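Truncate hits processor
The truncate_hits response processor discards any search hits returned after a specified number of hits has been reached. It is designed to work with the oversample request processor but can also be used on its own.
The target_size parameter, which specifies where truncation begins, is optional. If it is not specified, UDB-SX uses the original_size variable set by the oversample processor, if that variable is available.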
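The following is a common usage pattern:
1. Add the oversample processor to a request pipeline to fetch a larger set of results.
2. In the response pipeline, apply a reranking processor (which may promote results from beyond the originally requested top N) or the collapse processor (which may discard some results after deduplication).
3. Apply the truncate_hits processor to return (at most) the originally requested number of hits.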
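The three steps above can be sketched as a small in-memory simulation. This is only an illustration of the pattern, not the UDB-SX implementation: the `quality` field, the `INDEX` data, the helper names, and the use of rounding up when multiplying the size are all assumptions made for the example.

```python
import math
from typing import Callable, Dict, List

Hit = Dict[str, float]

# Hypothetical index: 10 documents with a made-up "quality" field.
QUALITY = [0.2, 0.9, 0.1, 0.4, 0.3, 0.95, 0.99, 0.5, 0.6, 0.7]
INDEX: List[Hit] = [{"_id": i, "quality": q} for i, q in enumerate(QUALITY, start=1)]


def oversample(size: int, sample_factor: float, context: Dict[str, int]) -> int:
    """Step 1: remember the requested size, then ask for a larger result set."""
    context["original_size"] = size
    # Assumption: the enlarged size is rounded up to a whole number of hits.
    return math.ceil(size * sample_factor)


def rerank(hits: List[Hit], score: Callable[[Hit], float]) -> List[Hit]:
    """Step 2: reorder hits; later hits can move ahead of the original top N."""
    return sorted(hits, key=score, reverse=True)


def truncate_hits(hits: List[Hit], context: Dict[str, int]) -> List[Hit]:
    """Step 3: return at most the originally requested number of hits."""
    return hits[: context["original_size"]]


context: Dict[str, int] = {}
fetch_size = oversample(size=2, sample_factor=3, context=context)
candidates = INDEX[:fetch_size]  # stand-in for running the query on the cluster
reranked = rerank(candidates, score=lambda hit: hit["quality"])
result_ids = [hit["_id"] for hit in truncate_hits(reranked, context)]
print(result_ids)  # [6, 2]
```

In this sketch, document 6 was outside the originally requested top 2 but inside the oversampled window of 6 candidates, so reranking can promote it. Document 7 scores higher still but is never fetched, which shows that the sample factor bounds what a reranker can recover.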
Request body fields
The following table lists all request fields.
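| Field | Data type | Description |
|---|---|---|
| target_size | Integer | The maximum number of search hits to return (>=0). If not specified, the processor tries to read the original_size variable and fails if that variable is unavailable. Optional. |
| context_prefix | String | Can be used to read the original_size variable from a specific scope in order to avoid collisions. Optional. |
| tag | String | The processor's identifier. Optional. |
| description | String | A description of the processor. Optional. |
| ignore_failure | Boolean | If set to true, UDB-SX ignores any failure of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is false. |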
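The interaction between target_size, context_prefix, and the original_size variable can be sketched as follows. This is a minimal model of the rules in the table, not the processor's actual code. In particular, the key format `<context_prefix>.original_size` and the exception types are assumptions chosen for illustration.

```python
from typing import Any, Dict, List, Optional

ORIGINAL_SIZE = "original_size"


def resolve_target_size(
    target_size: Optional[int],
    context: Dict[str, Any],
    context_prefix: Optional[str] = None,
) -> int:
    """Return the number of hits to keep."""
    if target_size is not None:
        if target_size < 0:
            raise ValueError("target_size must be >= 0")
        return target_size
    # Assumption: a scoped variable is stored as "<context_prefix>.original_size".
    key = f"{context_prefix}.{ORIGINAL_SIZE}" if context_prefix else ORIGINAL_SIZE
    if key not in context:
        # No explicit target_size and no original_size variable: the processor fails.
        raise KeyError(f"{key} is not set; specify target_size or run oversample first")
    return int(context[key])


def truncate_hits(
    hits: List[Any],
    target_size: Optional[int] = None,
    context: Optional[Dict[str, Any]] = None,
    context_prefix: Optional[str] = None,
) -> List[Any]:
    """Keep at most the resolved number of hits, preserving their order."""
    limit = resolve_target_size(target_size, context or {}, context_prefix)
    return hits[:limit]
```

For example, `truncate_hits(list(range(10)), target_size=5)` keeps the first five items, while calling it with neither a target_size nor an original_size entry in the context raises an error, mirroring the failure described in the table.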
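Example
The following example demonstrates using a search pipeline with a truncate_hits processor.
Setup
Create an index named my_index that contains multiple documents: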
POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "doc": { "title" : "document 1" }}
{ "create":{"_index":"my_index","_id":2}}
{ "doc": { "title" : "document 2" }}
{ "create":{"_index":"my_index","_id":3}}
{ "doc": { "title" : "document 3" }}
{ "create":{"_index":"my_index","_id":4}}
{ "doc": { "title" : "document 4" }}
{ "create":{"_index":"my_index","_id":5}}
{ "doc": { "title" : "document 5" }}
{ "create":{"_index":"my_index","_id":6}}
{ "doc": { "title" : "document 6" }}
{ "create":{"_index":"my_index","_id":7}}
{ "doc": { "title" : "document 7" }}
{ "create":{"_index":"my_index","_id":8}}
{ "doc": { "title" : "document 8" }}
{ "create":{"_index":"my_index","_id":9}}
{ "doc": { "title" : "document 9" }}
{ "create":{"_index":"my_index","_id":10}}
{ "doc": { "title" : "document 10" }}
Creating a search pipeline
The following request creates a search pipeline named my_pipeline with a truncate_hits response processor that discards hits after the first five:
PUT /_search/pipeline/my_pipeline
{
"response_processors": [
{
"truncate_hits" : {
"tag" : "truncate_1",
"description" : "This processor will discard results after the first 5.",
"target_size" : 5
}
}
]
}
Using a search pipeline
Search for documents in my_index without a search pipeline:
POST /my_index/_search
{
"size": 8
}
The response contains eight hits:
Response
{
"took" : 13,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 1"
}
}
},
{
"_index" : "my_index",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 2"
}
}
},
{
"_index" : "my_index",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 3"
}
}
},
{
"_index" : "my_index",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 4"
}
}
},
{
"_index" : "my_index",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 5"
}
}
},
{
"_index" : "my_index",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 6"
}
}
},
{
"_index" : "my_index",
"_id" : "7",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 7"
}
}
},
{
"_index" : "my_index",
"_id" : "8",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 8"
}
}
}
]
}
}
To search with a pipeline, specify the pipeline name in the search_pipeline query parameter:
POST /my_index/_search?search_pipeline=my_pipeline
{
"size": 8
}
The response contains only 5 hits, even though 8 were requested and 10 are available:
Response
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 1"
}
}
},
{
"_index" : "my_index",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 2"
}
}
},
{
"_index" : "my_index",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 3"
}
}
},
{
"_index" : "my_index",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 4"
}
}
},
{
"_index" : "my_index",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 5"
}
}
}
]
}
}
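Oversample, collapse, and truncate hits
The following is a more realistic example in which you use oversample to request many candidate documents, use collapse to remove documents that duplicate a particular field (to get more diverse results), and then use truncate_hits to return the originally requested number of documents (to avoid returning a huge result payload from the cluster).
Setup
Create many documents containing a field that you'll use for collapsing: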
POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "title" : "document 1", "color":"blue" }
{ "create":{"_index":"my_index","_id":2}}
{ "title" : "document 2", "color":"blue" }
{ "create":{"_index":"my_index","_id":3}}
{ "title" : "document 3", "color":"red" }
{ "create":{"_index":"my_index","_id":4}}
{ "title" : "document 4", "color":"red" }
{ "create":{"_index":"my_index","_id":5}}
{ "title" : "document 5", "color":"yellow" }
{ "create":{"_index":"my_index","_id":6}}
{ "title" : "document 6", "color":"yellow" }
{ "create":{"_index":"my_index","_id":7}}
{ "title" : "document 7", "color":"orange" }
{ "create":{"_index":"my_index","_id":8}}
{ "title" : "document 8", "color":"orange" }
{ "create":{"_index":"my_index","_id":9}}
{ "title" : "document 9", "color":"green" }
{ "create":{"_index":"my_index","_id":10}}
{ "title" : "document 10", "color":"green" }
Create a pipeline that only collapses on the color field:
PUT /_search/pipeline/collapse_pipeline
{
"response_processors": [
{
"collapse" : {
"field": "color"
}
}
]
}
Create another pipeline that oversamples, collapses, and then truncates results:
PUT /_search/pipeline/oversampling_collapse_pipeline
{
"request_processors": [
{
"oversample": {
"sample_factor": 3
}
}
],
"response_processors": [
{
"collapse" : {
"field": "color"
}
},
{
"truncate_hits": {
"description": "Truncates back to the original size before oversample increased it."
}
}
]
}
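Collapse without oversample
In this example, you request the top three documents before collapsing on the color field. Because the first two documents have the same color, the second one is discarded, and the request returns only the first and third documents: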
POST /my_index/_search?search_pipeline=collapse_pipeline
{
"size": 3
}
Response
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"title" : "document 1",
"color" : "blue"
}
},
{
"_index" : "my_index",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"title" : "document 3",
"color" : "red"
}
}
]
},
"profile" : {
"shards" : [ ]
}
}
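Oversample, collapse, and truncate
Now you will use the oversampling_collapse_pipeline, which requests the top 9 documents (multiplying the size by 3), deduplicates by color, and then returns the top 3 hits: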
POST /my_index/_search?search_pipeline=oversampling_collapse_pipeline
{
"size": 3
}
Response
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"title" : "document 1",
"color" : "blue"
}
},
{
"_index" : "my_index",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"title" : "document 3",
"color" : "red"
}
},
{
"_index" : "my_index",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"title" : "document 5",
"color" : "yellow"
}
}
]
},
"profile" : {
"shards" : [ ]
}
}
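To see why the two pipelines return different numbers of hits, the following sketch replays both requests against an in-memory copy of the ten example documents. It is a simplified model and makes some assumptions: hits come back in document ID order, collapse keeps the first hit for each field value, and the helper names (`fetch`, `collapse`, `run_collapse_pipeline`, `run_oversampling_collapse_pipeline`) are hypothetical.

```python
import math
from typing import Dict, List

Hit = Dict[str, str]

COLORS = ["blue", "blue", "red", "red", "yellow", "yellow",
          "orange", "orange", "green", "green"]
INDEX: List[Hit] = [
    {"_id": str(i), "title": f"document {i}", "color": color}
    for i, color in enumerate(COLORS, start=1)
]


def fetch(size: int) -> List[Hit]:
    """Stand-in for the query phase: return the first `size` documents."""
    return INDEX[:size]


def collapse(hits: List[Hit], field: str) -> List[Hit]:
    """Keep only the first hit seen for each distinct value of `field`."""
    seen = set()
    kept = []
    for hit in hits:
        if hit[field] not in seen:
            seen.add(hit[field])
            kept.append(hit)
    return kept


def run_collapse_pipeline(size: int) -> List[Hit]:
    """Models collapse_pipeline: collapse only."""
    return collapse(fetch(size), "color")


def run_oversampling_collapse_pipeline(size: int, sample_factor: float = 3) -> List[Hit]:
    """Models oversampling_collapse_pipeline: oversample, collapse, truncate."""
    original_size = size  # what oversample records for truncate_hits
    hits = fetch(math.ceil(size * sample_factor))
    hits = collapse(hits, "color")
    return hits[:original_size]


collapsed_ids = [hit["_id"] for hit in run_collapse_pipeline(3)]
oversampled_ids = [hit["_id"] for hit in run_oversampling_collapse_pipeline(3)]
print(collapsed_ids)    # ['1', '3']
print(oversampled_ids)  # ['1', '3', '5']
```

Without oversampling, collapse can only shrink the three fetched hits, so the request returns two. With a sample factor of 3, nine candidates are fetched, collapse leaves five distinct colors, and truncation brings the result back to the three hits that were requested.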