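Sort results
Sorting lets your users arrange results in the way that is most meaningful to them. By default, full-text queries sort results by relevance score. To sort by any field value instead, set the order parameter to asc (ascending) or desc (descending).
For example, to sort results by line_id in descending order, use the following query: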
GET shakespeare/_search
{
"query": {
"term": {
"play_name": {
"value": "Henry IV"
}
}
},
"sort": [
{
"line_id": {
"order": "desc"
}
}
]
}
The results are sorted by line_id in descending order:
{
"took" : 24,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3205,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "shakespeare",
"_id" : "3204",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3205,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "",
"speaker" : "KING HENRY IV",
"text_entry" : "Exeunt"
},
"sort" : [
3205
]
},
{
"_index" : "shakespeare",
"_id" : "3203",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3204,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "5.5.45",
"speaker" : "KING HENRY IV",
"text_entry" : "Let us not leave till all our own be won."
},
"sort" : [
3204
]
},
{
"_index" : "shakespeare",
"_id" : "3202",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3203,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "5.5.44",
"speaker" : "KING HENRY IV",
"text_entry" : "And since this business so fair is done,"
},
"sort" : [
3203
]
},
{
"_index" : "shakespeare",
"_id" : "3201",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3202,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "5.5.43",
"speaker" : "KING HENRY IV",
"text_entry" : "Meeting the cheque of such another day:"
},
"sort" : [
3202
]
},
{
"_index" : "shakespeare",
"_id" : "3200",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3201,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "5.5.42",
"speaker" : "KING HENRY IV",
"text_entry" : "Rebellion in this land shall lose his sway,"
},
"sort" : [
3201
]
},
{
"_index" : "shakespeare",
"_id" : "3199",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3200,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "5.5.41",
"speaker" : "KING HENRY IV",
"text_entry" : "To fight with Glendower and the Earl of March."
},
"sort" : [
3200
]
},
{
"_index" : "shakespeare",
"_id" : "3198",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3199,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "5.5.40",
"speaker" : "KING HENRY IV",
"text_entry" : "Myself and you, son Harry, will towards Wales,"
},
"sort" : [
3199
]
},
{
"_index" : "shakespeare",
"_id" : "3197",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3198,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "5.5.39",
"speaker" : "KING HENRY IV",
"text_entry" : "Who, as we hear, are busily in arms:"
},
"sort" : [
3198
]
},
{
"_index" : "shakespeare",
"_id" : "3196",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3197,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "5.5.38",
"speaker" : "KING HENRY IV",
"text_entry" : "To meet Northumberland and the prelate Scroop,"
},
"sort" : [
3197
]
},
{
"_index" : "shakespeare",
"_id" : "3195",
"_score" : null,
"_source" : {
"type" : "line",
"line_id" : 3196,
"play_name" : "Henry IV",
"speech_number" : 8,
"line_number" : "5.5.37",
"speaker" : "KING HENRY IV",
"text_entry" : "Towards York shall bend you with your dearest speed,"
},
"sort" : [
3196
]
}
]
}
}
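The sort parameter is an array, so you can specify multiple field values in order of priority.
If two documents have the same line_id value, UDB-SX uses speech_number as the secondary sort key: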
GET shakespeare/_search
{
"query": {
"term": {
"play_name": {
"value": "Henry IV"
}
}
},
"sort": [
{
"line_id": {
"order": "desc"
}
},
{
"speech_number": {
"order": "desc"
}
}
]
}
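The priority rule can be illustrated outside the search engine. The following Python sketch is an analogy only, not how UDB-SX implements sorting: it orders a few made-up records by line_id and uses speech_number only to break ties. The sample records and the sort_hits helper are hypothetical.

```python
# Hypothetical in-memory stand-ins for search hits; not real index data.
hits = [
    {"line_id": 10, "speech_number": 2},
    {"line_id": 12, "speech_number": 1},
    {"line_id": 10, "speech_number": 5},
]

def sort_hits(docs):
    """Order docs by line_id, then by speech_number, both descending."""
    # The tuple key compares line_id first; speech_number only breaks ties.
    # reverse=True mirrors "order": "desc" on both sort clauses.
    return sorted(
        docs,
        key=lambda d: (d["line_id"], d["speech_number"]),
        reverse=True,
    )

ordered = sort_hits(hits)
print([(d["line_id"], d["speech_number"]) for d in ordered])
```

The two records that share line_id 10 are ordered by speech_number, while the record with line_id 12 stays first regardless of its speech_number.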
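You can continue to sort by any number of field values to get the results in the right order. Sort values do not have to be numeric; you can also sort by date or timestamp fields: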
"sort": [
{
"date": {
"order": "desc"
}
}
]
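An analyzed text field cannot be used to sort documents because the inverted index contains only the individual tokenized terms, not the entire string. So, for example, you cannot sort by play_name.
To work around this limitation, use the raw version of the text field, mapped as a keyword type. In the following example, play_name.keyword is not analyzed, so you have a copy of the full original string for sorting: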
GET shakespeare/_search
{
"query": {
"term": {
"play_name": {
"value": "Henry IV"
}
}
},
"sort": [
{
"play_name.keyword": {
"order": "desc"
}
}
]
}
The results are sorted by the play_name field in descending alphabetical order.
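Use sort together with the search_after parameter for more efficient scrolling. The results start with the document that comes after the sort values you specify in the search_after array.
Make sure the search_after array contains the same number of values as the sort array, in the same order. In this case, you request results starting after the document with line_id = 3202 and speech_number = 8: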
GET shakespeare/_search
{
"query": {
"term": {
"play_name": {
"value": "Henry IV"
}
}
},
"sort": [
{
"line_id": {
"order": "desc"
}
},
{
"speech_number": {
"order": "desc"
}
}
],
"search_after": [
"3202",
"8"
]
}
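When paging through results, the search_after values usually come from the sort array of the last hit on the previous page, because that array already holds one value per sort clause in the right order. The following Python sketch builds the next request body this way. It only manipulates dictionaries and sends nothing; the next_page_body helper is hypothetical and not part of any client library.

```python
import copy

def next_page_body(body, last_hit):
    """Return a copy of body that resumes after last_hit.

    The "sort" array attached to each hit has one value per sort clause,
    in the same order, so it can be reused as search_after unchanged.
    """
    if len(last_hit["sort"]) != len(body["sort"]):
        raise ValueError("search_after needs one value per sort clause")
    page = copy.deepcopy(body)  # leave the caller's request untouched
    page["search_after"] = list(last_hit["sort"])
    return page

body = {
    "query": {"term": {"play_name": {"value": "Henry IV"}}},
    "sort": [
        {"line_id": {"order": "desc"}},
        {"speech_number": {"order": "desc"}},
    ],
}
# Trimmed stand-in for the last hit returned on the previous page.
last_hit = {"_id": "3201", "sort": [3202, 8]}

print(next_page_body(body, last_hit)["search_after"])
```

The length check catches the most common mistake, a search_after array that does not line up with the sort array.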
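Sort mode
The sort mode applies to sorting by arrays or multivalued fields. It specifies which array value is used to sort the document. For numeric fields that contain an array of numbers, you can sort by the avg, sum, or median mode. To sort by the minimum or maximum value, use the min or max mode, which work for both numeric and string data types.
The default mode is min for ascending sort order and max for descending sort order.
The following example shows how to sort by an array field using the sort mode.
Consider an index that stores student grades. Index two documents into the index: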
PUT students/_doc/1
{
"name": "John Doe",
"grades": [70, 90]
}
PUT students/_doc/2
{
"name": "Mary Major",
"grades": [80, 100]
}
Sort all students by average grade, highest first, using the avg mode:
GET students/_search
{
"query" : {
"match_all": {}
},
"sort" : [
{"grades" : {"order" : "desc", "mode" : "avg"}}
]
}
The response contains the students sorted by average grade in descending order:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "students",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "Mary Major",
"grades" : [
80,
100
]
},
"sort" : [
90
]
},
{
"_index" : "students",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "John Doe",
"grades" : [
70,
90
]
},
"sort" : [
80
]
}
]
}
}
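The sort values in this response (90 and 80) can be reproduced by hand. The following Python sketch reduces each grades array to a single value according to the mode and then ranks the students. It is an illustration of the arithmetic only; the MODES table and the sort_value helper are hypothetical names, and the engine's own rounding for other inputs is not modeled.

```python
import statistics

# One reducer per sort mode named in this section.
MODES = {
    "min": min,
    "max": max,
    "sum": sum,
    "avg": statistics.mean,
    "median": statistics.median,
}

def sort_value(values, mode):
    """Collapse a multivalued field into the single value used for sorting."""
    return MODES[mode](values)

students = {"John Doe": [70, 90], "Mary Major": [80, 100]}

# "order": "desc" with "mode": "avg"
ranked = sorted(
    students,
    key=lambda name: sort_value(students[name], "avg"),
    reverse=True,
)
print(ranked)
print([sort_value(students[name], "avg") for name in ranked])
```

Switching the mode to min or max changes the sort values but, for this data, not the ranking, because both of Mary Major's grades are higher than the corresponding grades of John Doe.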
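Sorting nested objects
When sorting nested objects, provide the path parameter, which specifies the path to the field to sort on.
For example, in the students index, map the first_sem field as nested: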
PUT students
{
"mappings" : {
"properties": {
"first_sem": {
"type" : "nested"
}
}
}
}
Index two documents that contain the nested field:
PUT students/_doc/1
{
"name": "John Doe",
"first_sem" : {
"grades": [70, 90]
}
}
PUT students/_doc/2
{
"name": "Mary Major",
"first_sem": {
"grades": [80, 100]
}
}
When sorting by average grade, provide the path to the nested field:
GET students/_search
{
"query" : {
"match_all": {}
},
"sort" : [
{"first_sem.grades": {
"order" : "desc",
"mode" : "avg",
"nested": {
"path": "first_sem"
}
}
}
]
}
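Handling missing values
The missing parameter specifies how to handle missing values. The built-in valid values are _last (list documents with a missing value last) and _first (list documents with a missing value first). The default is _last. You can also specify a custom value to use as the sort value for documents that are missing the field.
For example, you can index one document with an average field and another document without it: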
PUT students/_doc/1
{
"name": "John Doe",
"average": 80
}
PUT students/_doc/2
{
"name": "Mary Major"
}
Sort the documents, listing the document with the missing field first:
GET students/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"average": {
"order": "desc",
"missing": "_first"
}
}
]
}
The response lists document 2 first:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "students",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "Mary Major"
},
"sort" : [
9223372036854775807
]
},
{
"_index" : "students",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "John Doe",
"average" : 80
},
"sort" : [
80
]
}
]
}
}
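The sort value 9223372036854775807 reported for document 2 is the largest 64-bit signed integer: in a descending sort, a document placed first needs a value larger than any real one. The following Python sketch models that substitution. It is inferred from the responses shown in this document for descending sorts on a long field; the ascending cases are an assumption by symmetry, and the missing_sort_value helper is hypothetical.

```python
LONG_MAX = 2**63 - 1   # 9223372036854775807
LONG_MIN = -(2**63)    # -9223372036854775808

def missing_sort_value(order, missing="_last"):
    """Sort value assumed for a document that lacks the sort field."""
    if missing not in ("_first", "_last"):
        return missing  # a custom value is used directly
    # Descending order lists large values first; ascending lists small
    # values first. Pick the extreme that lands where "missing" asks.
    wants_large = (order == "desc") == (missing == "_first")
    return LONG_MAX if wants_large else LONG_MIN

docs = [{"name": "John Doe", "average": 80}, {"name": "Mary Major"}]

def key(doc):
    return doc.get("average", missing_sort_value("desc", "_first"))

# "order": "desc" with "missing": "_first"
print([d["name"] for d in sorted(docs, key=key, reverse=True)])
```

With the default _last and a descending order, the same helper returns the smallest 64-bit value, which pushes the document to the end of the list.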
Ignoring unmapped fields
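If a field is not mapped, a search request that sorts by this field fails by default. To avoid this, use the unmapped_type parameter, which tells UDB-SX to ignore the field. For example, if you set unmapped_type to long, the field is treated as if it were mapped as long. Additionally, all documents in an index where the field is unmapped are treated as having no value in this field, so they are not sorted by it.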
For example, consider two indexes. Index a document that contains an average field into the first index:
PUT students/_doc/1
{
"name": "John Doe",
"average": 80
}
Index a document that does not contain an average field into the second index:
PUT students_no_map/_doc/2
{
"name": "Mary Major"
}
Search all documents in both indexes and sort them by the average field:
GET students*/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"average": {
"order": "desc"
}
}
]
}
By default, the second index produces an error because the average field is not mapped:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 1,
"skipped" : 0,
"failed" : 1,
"failures" : [
{
"shard" : 0,
"index" : "students_no_map",
"node" : "cam9NWqVSV-jUIkQ3tRubw",
"reason" : {
"type" : "query_shard_exception",
"reason" : "No mapping found for [average] in order to sort on",
"index" : "students_no_map",
"index_uuid" : "JgfRkypKSUSpyU-ZXr9kKA"
}
}
]
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "students",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "John Doe",
"average" : 80
},
"sort" : [
80
]
}
]
}
}
You can specify the unmapped_type parameter so that the unmapped field is ignored:
GET students*/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"average": {
"order": "desc",
"unmapped_type": "long"
}
}
]
}
The response contains both documents:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "students",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "John Doe",
"average" : 80
},
"sort" : [
80
]
},
{
"_index" : "students_no_map",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "Mary Major"
},
"sort" : [
-9223372036854775808
]
}
]
}
}
Tracking scores
By default, scores are not computed when sorting by a field. Set track_scores to true to compute and track scores:
GET students/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"average": {
"order": "desc"
}
}
],
"track_scores": true
}
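Sorting by geo distance
You can sort documents by _geo_distance. The following parameters are supported.
| Parameter | Description |
|---|---|
| distance_type | Specifies the method of computing the distance. Valid values are arc and plane. The plane method is faster but less accurate for long distances or close to the poles. Default is arc. |
| mode | Specifies how to handle a field with several geopoints. By default, documents are sorted by the shortest distance when the sort order is ascending and by the longest distance when the sort order is descending. Valid values are min, max, median, and avg. |
| unit | Specifies the unit used to compute sort values. Default is meters (m). |
| ignore_unmapped | Specifies how to handle an unmapped field. Set ignore_unmapped to true to ignore unmapped fields. Default is false (produce an error when encountering an unmapped field). |
_geo_distance does not support missing_values. When a document does not contain the field used to compute the distance, the distance is always treated as infinity.
For example, index two documents that contain geopoints: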
PUT testindex1/_doc/1
{
"point": [74.00, 40.71]
}
PUT testindex1/_doc/2
{
"point": [73.77, -69.63]
}
Search all documents and sort them by their distance from the given point:
GET testindex1/_search
{
"sort": [
{
"_geo_distance": {
"point": [59, -54],
"order": "asc",
"unit": "km",
"distance_type": "arc",
"mode": "min",
"ignore_unmapped": true
}
}
],
"query": {
"match_all": {}
}
}
The response contains the sorted documents:
{
"took" : 864,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "testindex1",
"_id" : "2",
"_score" : null,
"_source" : {
"point" : [
73.77,
-69.63
]
},
"sort" : [
1891.2667493895767
]
},
{
"_index" : "testindex1",
"_id" : "1",
"_score" : null,
"_source" : {
"point" : [
74.0,
40.71
]
},
"sort" : [
10628.402240213345
]
}
]
}
}
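You can provide coordinates in any format supported by the geopoint field type. For a description of all formats, see the geopoint field type documentation.
To pass multiple geopoints to _geo_distance, use an array: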
GET testindex1/_search
{
"sort": [
{
"_geo_distance": {
"point": [[59, -54], [60, -53]],
"order": "asc",
"unit": "km",
"distance_type": "arc",
"mode": "min",
"ignore_unmapped": true
}
}
],
"query": {
"match_all": {}
}
}
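For each document, the sort distance is computed as the minimum, maximum, median, or average (as specified by mode) of the distances from all points provided in the search to all points in the document.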
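To make this computation concrete, the following Python sketch measures every query point against every document point and then reduces the distances according to the mode. It assumes the haversine formula on a sphere with a mean radius of 6371.0088 km, which approximates but may not exactly match the engine's arc calculation. Points are [longitude, latitude] arrays, as in the examples above; haversine_km and sort_distance are hypothetical helper names.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch

def haversine_km(p, q):
    """Great-circle distance in km between two [lon, lat] points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

AGGREGATORS = {
    "min": min,
    "max": max,
    "avg": statistics.mean,
    "median": statistics.median,
}

def sort_distance(query_points, doc_points, mode="min"):
    """Reduce all query-to-document distances to one sort value."""
    distances = [haversine_km(q, d) for q in query_points for d in doc_points]
    return AGGREGATORS[mode](distances)

# Single query point, as in the first _geo_distance request.
print(round(sort_distance([[59, -54]], [[73.77, -69.63]])))
print(round(sort_distance([[59, -54]], [[74.00, 40.71]])))

# Two query points, as in the array request; "min" keeps the closer one.
print(round(sort_distance([[59, -54], [60, -53]], [[73.77, -69.63]], "min")))
```

Under these assumptions, the first two values should land within a few kilometers of the 1891 km and 10628 km sort values in the earlier response.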
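Performance considerations
Sorted field values are loaded into memory for sorting. To minimize overhead, we recommend mapping numeric types to the smallest acceptable type, such as short, integer, or float. String fields used for sorting should not be analyzed or tokenized.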