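The mtermvectors API retrieves term vectors for multiple documents in a single request. The documents are normally identified by index and id, but a document can also be supplied artificially in the request body itself. The response contains a docs array holding every retrieved term vector, and each element has the same structure that the termvectors API returns. For example: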
curl -XPOST "http://127.0.0.1:9200/_mtermvectors?pretty" -H "Content-Type:application/json" -d'
{
  "docs": [
    {
      "_index": "twitter",
      "_id": "2",
      "term_statistics": true
    },
    {
      "_index": "twitter",
      "_id": "1",
      "fields": [
        "message"
      ]
    }
  ]
}'
The response is:
{
  "docs" : [
    {
      "_index" : "twitter",
      "_type" : "_doc",
      "_id" : "2",
      "_version" : 1,
      "found" : true,
      "took" : 0,
      "term_vectors" : {
        "text" : {
          "field_statistics" : {
            "sum_doc_freq" : 6,
            "doc_count" : 2,
            "sum_ttf" : 8
          },
          "terms" : {
            "..." : {
              "doc_freq" : 1,
              "ttf" : 1,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 3,
                  "start_offset" : 21,
                  "end_offset" : 24,
                  "payload" : "d29yZA=="
                }
              ]
            },
            "another" : {
              "doc_freq" : 1,
              "ttf" : 1,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 0,
                  "start_offset" : 0,
                  "end_offset" : 7,
                  "payload" : "d29yZA=="
                }
              ]
            },
            "test" : {
              "doc_freq" : 2,
              "ttf" : 4,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 2,
                  "start_offset" : 16,
                  "end_offset" : 20,
                  "payload" : "d29yZA=="
                }
              ]
            },
            "twitter" : {
              "doc_freq" : 2,
              "ttf" : 2,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 1,
                  "start_offset" : 8,
                  "end_offset" : 15,
                  "payload" : "d29yZA=="
                }
              ]
            }
          }
        },
        "fullname" : {
          "field_statistics" : {
            "sum_doc_freq" : 4,
            "doc_count" : 2,
            "sum_ttf" : 4
          },
          "terms" : {
            "doe" : {
              "doc_freq" : 2,
              "ttf" : 2,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 1,
                  "start_offset" : 5,
                  "end_offset" : 8,
                  "payload" : "d29yZA=="
                }
              ]
            },
            "jane" : {
              "doc_freq" : 1,
              "ttf" : 1,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 0,
                  "start_offset" : 0,
                  "end_offset" : 4,
                  "payload" : "d29yZA=="
                }
              ]
            }
          }
        }
      }
    },
    {
      "_index" : "twitter",
      "_type" : "_doc",
      "_id" : "1",
      "_version" : 1,
      "found" : true,
      "took" : 0,
      "term_vectors" : { }
    }
  ]
}
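Because the response is deeply nested, it can help to flatten it before reading it. The following is a minimal Python sketch, not part of the API: the function name summarize_term_vectors and the trimmed sample are made up for illustration, and only the keys visible in the response above (docs, _id, term_vectors, terms, term_freq) are assumed.

```python
import json


def summarize_term_vectors(response):
    """Map each returned document to {field: {term: term_freq}}."""
    summary = {}
    for position, doc in enumerate(response.get("docs", [])):
        # Fall back to the array position when an entry carries no _id.
        key = doc.get("_id", f"#{position}")
        fields = {}
        for field, vector in doc.get("term_vectors", {}).items():
            fields[field] = {
                term: info["term_freq"]
                for term, info in vector.get("terms", {}).items()
            }
        summary[key] = fields
    return summary


# Trimmed sample in the shape of the response above (only two terms kept).
sample = {
    "docs": [
        {
            "_index": "twitter",
            "_id": "2",
            "found": True,
            "term_vectors": {
                "text": {
                    "terms": {
                        "test": {"term_freq": 1},
                        "twitter": {"term_freq": 1},
                    }
                }
            },
        },
        # The second document came back with an empty term_vectors object.
        {"_index": "twitter", "_id": "1", "found": True, "term_vectors": {}},
    ]
}

print(json.dumps(summarize_term_vectors(sample), indent=2))
```

A document whose term_vectors object is empty, like id 1 above, simply maps to an empty dictionary.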
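The index can also be specified in the URL, in which case _index can be omitted from each entry in docs. For example: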
curl -XPOST "http://127.0.0.1:9200/twitter/_mtermvectors?pretty" -H "Content-Type:application/json" -d'
{
  "docs": [
    {
      "_id": "2",
      "fields": [
        "message"
      ],
      "term_statistics": true
    },
    {
      "_id": "1"
    }
  ]
}'
The response is similar to the one above.
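The relationship between the two request styles shown so far can be sketched in Python. The helper scope_to_index below is hypothetical (it is not an Elasticsearch client function): when every entry in docs names the same _index, it moves that index into the URL path and strips it from the entries. Otherwise it leaves the body unchanged.

```python
def scope_to_index(body):
    """Return (path, body), moving a shared _index from the docs into the path."""
    indices = {doc.get("_index") for doc in body["docs"]}
    # Keep the global endpoint if the docs disagree or any entry lacks _index.
    if len(indices) != 1 or None in indices:
        return "/_mtermvectors", body
    (index,) = indices
    trimmed = [
        {key: value for key, value in doc.items() if key != "_index"}
        for doc in body["docs"]
    ]
    return f"/{index}/_mtermvectors", {"docs": trimmed}


# Body of the first request in this article.
first_request = {
    "docs": [
        {"_index": "twitter", "_id": "2", "term_statistics": True},
        {"_index": "twitter", "_id": "1", "fields": ["message"]},
    ]
}

path, scoped = scope_to_index(first_request)
print(path)
print(scoped)
```

The original body is not modified, so either form can still be sent afterwards.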
If all requested documents are in the same index and share the same parameters, the request can be simplified further. For example:
curl -XPOST "http://127.0.0.1:9200/twitter/_mtermvectors?pretty" -H "Content-Type:application/json" -d'
{
  "ids" : ["1", "2"],
  "parameters": {
    "fields": [
      "text"
    ],
    "term_statistics": true
  }
}'
The response is:
{
  "docs" : [
    {
      "_index" : "twitter",
      "_type" : "_doc",
      "_id" : "1",
      "_version" : 1,
      "found" : true,
      "took" : 0,
      "term_vectors" : {
        "text" : {
          "field_statistics" : {
            "sum_doc_freq" : 6,
            "doc_count" : 2,
            "sum_ttf" : 8
          },
          "terms" : {
            "test" : {
              "doc_freq" : 2,
              "ttf" : 4,
              "term_freq" : 3,
              "tokens" : [
                {
                  "position" : 1,
                  "start_offset" : 8,
                  "end_offset" : 12,
                  "payload" : "d29yZA=="
                },
                {
                  "position" : 2,
                  "start_offset" : 13,
                  "end_offset" : 17,
                  "payload" : "d29yZA=="
                },
                {
                  "position" : 3,
                  "start_offset" : 18,
                  "end_offset" : 22,
                  "payload" : "d29yZA=="
                }
              ]
            },
            "twitter" : {
              "doc_freq" : 2,
              "ttf" : 2,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 0,
                  "start_offset" : 0,
                  "end_offset" : 7,
                  "payload" : "d29yZA=="
                }
              ]
            }
          }
        }
      }
    },
    {
      "_index" : "twitter",
      "_type" : "_doc",
      "_id" : "2",
      "_version" : 1,
      "found" : true,
      "took" : 0,
      "term_vectors" : {
        "text" : {
          "field_statistics" : {
            "sum_doc_freq" : 6,
            "doc_count" : 2,
            "sum_ttf" : 8
          },
          "terms" : {
            "..." : {
              "doc_freq" : 1,
              "ttf" : 1,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 3,
                  "start_offset" : 21,
                  "end_offset" : 24,
                  "payload" : "d29yZA=="
                }
              ]
            },
            "another" : {
              "doc_freq" : 1,
              "ttf" : 1,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 0,
                  "start_offset" : 0,
                  "end_offset" : 7,
                  "payload" : "d29yZA=="
                }
              ]
            },
            "test" : {
              "doc_freq" : 2,
              "ttf" : 4,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 2,
                  "start_offset" : 16,
                  "end_offset" : 20,
                  "payload" : "d29yZA=="
                }
              ]
            },
            "twitter" : {
              "doc_freq" : 2,
              "ttf" : 2,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 1,
                  "start_offset" : 8,
                  "end_offset" : 15,
                  "payload" : "d29yZA=="
                }
              ]
            }
          }
        }
      }
    }
  ]
}
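The statistics in this response can be cross-checked by hand, which is a useful way to see what each number means. The Python sketch below copies the term_freq, ttf and doc_freq values for the text field from the response above and recomputes the totals. The equality holds here because doc_count is 2 and both documents were returned. On a larger or sharded index the reported statistics are not guaranteed to be exact, so treat this as an illustration rather than an invariant.

```python
from collections import Counter

# term_freq per document for the "text" field, copied from the response above.
doc_term_freqs = {
    "1": {"test": 3, "twitter": 1},
    "2": {"...": 1, "another": 1, "test": 1, "twitter": 1},
}

# ttf and doc_freq reported for each term in the same response.
reported_ttf = {"test": 4, "twitter": 2, "...": 1, "another": 1}
reported_doc_freq = {"test": 2, "twitter": 2, "...": 1, "another": 1}

total = Counter()            # occurrences of each term across all documents
docs_containing = Counter()  # number of documents that contain each term
for freqs in doc_term_freqs.values():
    total.update(freqs)                   # adds the term_freq values
    docs_containing.update(freqs.keys())  # adds 1 per term per document

print(dict(total) == reported_ttf)                 # True
print(dict(docs_containing) == reported_doc_freq)  # True
print(sum(total.values()))                         # 8, matches sum_ttf
print(sum(docs_containing.values()))               # 6, matches sum_doc_freq
```

In other words, ttf is the sum of term_freq over the documents, doc_freq counts the documents containing the term, and sum_ttf and sum_doc_freq are those two quantities summed over all terms of the field.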
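In addition, just as with the termvectors API, term vectors can be generated for documents supplied by the user in the request. The mapping used is determined by _index. For example: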
curl -XPOST "http://127.0.0.1:9200/_mtermvectors?pretty" -H "Content-Type:application/json" -d'
{
  "docs": [
    {
      "_index": "twitter",
      "doc" : {
        "text" : "John Doe",
        "message" : "twitter test test test"
      }
    },
    {
      "_index": "twitter",
      "doc" : {
        "text" : "Jane Doe",
        "message" : "Another twitter test ..."
      }
    }
  ]
}'
The response is:
{
  "docs" : [
    {
      "_index" : "twitter",
      "_type" : "_doc",
      "_version" : 0,
      "found" : true,
      "took" : 0,
      "term_vectors" : {
        "text" : {
          "field_statistics" : {
            "sum_doc_freq" : 6,
            "doc_count" : 2,
            "sum_ttf" : 8
          },
          "terms" : {
            "doe" : {
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 1,
                  "start_offset" : 5,
                  "end_offset" : 8
                }
              ]
            },
            "john" : {
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 0,
                  "start_offset" : 0,
                  "end_offset" : 4
                }
              ]
            }
          }
        }
      }
    },
    {
      "_index" : "twitter",
      "_type" : "_doc",
      "_version" : 0,
      "found" : true,
      "took" : 0,
      "term_vectors" : {
        "text" : {
          "field_statistics" : {
            "sum_doc_freq" : 6,
            "doc_count" : 2,
            "sum_ttf" : 8
          },
          "terms" : {
            "doe" : {
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 1,
                  "start_offset" : 5,
                  "end_offset" : 8
                }
              ]
            },
            "jane" : {
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 0,
                  "start_offset" : 0,
                  "end_offset" : 4
                }
              ]
            }
          }
        }
      }
    }
  ]
}
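For readers who prefer a script to curl, the artificial-document request can be assembled with Python's standard library. This is a sketch under assumptions: build_artificial_request is a made-up helper name, the base URL is the same local node used in the curl examples, and the request is only constructed here. Actually sending it requires a running Elasticsearch node, so that step is left as a comment.

```python
import json
import urllib.request


def build_artificial_request(base_url, index, documents):
    """Build a POST request asking for term vectors of user-supplied documents."""
    # Each entry names the index whose mapping should be used and embeds the document.
    body = {"docs": [{"_index": index, "doc": doc} for doc in documents]}
    return urllib.request.Request(
        f"{base_url}/_mtermvectors",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_artificial_request(
    "http://127.0.0.1:9200",
    "twitter",
    [
        {"text": "John Doe", "message": "twitter test test test"},
        {"text": "Jane Doe", "message": "Another twitter test ..."},
    ],
)

print(request.get_method(), request.full_url)

# With a node running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Note that the entries in the response above carry no _id, so results for artificial documents have to be matched to the request by their position in the docs array.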
This is an original article by the blog author and may not be reproduced without the author's permission.