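Retrieve specific fields

When you run a basic search in UDB-SX, by default, the original JSON object that was used for indexing is also returned in the _source object of each hit in the response. This can cause large amounts of data to be transferred over the network, increasing latency and cost. There are several ways to limit the response to only the information you need.

Disabling _source

You can set _source to false in a search request to exclude the _source field from the response: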

GET /index1/_search
{
    "_source": false,
    "query": {
        "match_all": {}
    }
}

Because no fields were selected in the preceding search, the retrieved hits contain only the _index, _id, and _score of each matching document:

{
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "index1",
        "_id" : "41",
        "_score" : 1.0
      },
      {
        "_index" : "index1",
        "_id" : "51",
        "_score" : 1.0
      }
    ]
  }
}

_source can also be disabled in the index mappings by using the following configuration:

"mappings": {
  "_source": {
    "enabled": false
  }
}

If _source is disabled in the index mappings, searching with docvalue_fields and searching with stored_fields become especially useful.

Specifying the fields to retrieve

You can list the fields you want to retrieve in the fields parameter. Wildcard patterns are also accepted:

GET /index1/_search
{
    "_source": false,
    "fields": ["age", "nam*"],
    "query": {
        "match_all": {}
    }
}

The response contains the name and age fields:

{
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "index1",
        "_id" : "41",
        "_score" : 1.0,
        "fields" : {
          "name" : [
            "John Doe"
          ],
          "age" : [
            30
          ]
        }
      },
      {
        "_index" : "index1",
        "_id" : "51",
        "_score" : 1.0,
        "fields" : {
          "name" : [
            "Jane Smith"
          ],
          "age" : [
            25
          ]
        }
      }
    ]
  }
}
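
To build an intuition for how a wildcard pattern such as nam* selects fields, the following Python sketch matches patterns against a list of field names locally. It uses the standard fnmatch module as a stand-in; the actual matching happens inside UDB-SX and may differ in details. The field names address and nickname are hypothetical and are not part of the example index.

```python
from fnmatch import fnmatchcase

def select_fields(field_names, patterns):
    """Return the field names that match at least one pattern, in original order."""
    return [
        name for name in field_names
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]

# Hypothetical field names; only name and age appear in the example above.
mapped_fields = ["name", "age", "address", "nickname"]

selected = select_fields(mapped_fields, ["age", "nam*"])
print(selected)  # ['name', 'age']
```

The pattern nam* selects name but not nickname because the match is anchored at the start of the field name.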

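Extracting fields with a custom format

You can also use object notation to apply a custom format to a selected field.

Suppose you have the following document: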

{
  "_index": "my_index",
  "_type": "_doc",
  "_id": "1",
  "_source": {
    "title": "Document 1",
    "date": "2023-07-04T12:34:56Z"
  }
}

You can then query it using the fields parameter and a custom format:

GET /my_index/_search
{
  "query": {
    "match_all": {}
  },
  "fields": [
    {
      "field": "date",
      "format": "yyyy-MM-dd"
    }
  ],
  "_source": false
}
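
The effect of the yyyy-MM-dd format on the sample document's date value can be reproduced locally. The sketch below uses Python's datetime module, where %Y-%m-%d is assumed to be the equivalent of yyyy-MM-dd. It only illustrates the expected shape of the formatted value and does not call UDB-SX.

```python
from datetime import datetime, timezone

def format_date(iso_value):
    """Parse a UTC timestamp such as 2023-07-04T12:34:56Z and render it as yyyy-MM-dd."""
    parsed = datetime.strptime(iso_value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc).strftime("%Y-%m-%d")

# The date value from the sample document above.
print(format_date("2023-07-04T12:34:56Z"))  # 2023-07-04
```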

In addition, you can use multi-fields and field aliases in the fields parameter because it queries both the document's _source and the index's _mappings.

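Searching with docvalue_fields

To retrieve specific fields from the index, you can also use the docvalue_fields parameter. This parameter works slightly differently from the fields parameter: it retrieves information from doc values rather than from the _source field, which is more efficient for non-analyzed fields such as keyword, date, and numeric fields. Doc values use a columnar storage format that is optimized for efficient sorting and aggregations and that stores values on disk in a way that is easy to read. When you use docvalue_fields, UDB-SX reads the values directly from this optimized storage format. This is useful for retrieving the values of fields that are primarily used for sorting, aggregations, and scripts.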
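
The difference between a row-oriented _source and column-oriented doc values can be pictured with a small Python sketch. This is a conceptual model only, not the actual on-disk format; the to_columns helper is hypothetical and the sample values are borrowed from the documents used later in this section.

```python
# Row-oriented view: each document is kept whole, similar to _source.
documents = [
    {"title": "UDB-SX Basics", "author": "John Doe", "price": 29.99},
    {"title": "Advanced UDB-SX", "author": "Jane Smith", "price": 39.99},
]

def to_columns(docs, field_names):
    """Build a column-oriented view: one list of values per field."""
    return {name: [doc.get(name) for doc in docs] for name in field_names}

columns = to_columns(documents, ["author", "price"])

# Reading one field touches a single column instead of every whole document.
print(columns["author"])      # ['John Doe', 'Jane Smith']
print(max(columns["price"]))  # 39.99
```

In this model, sorting or aggregating on price only scans the price column, which is the intuition behind using doc values for those operations.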

The following example demonstrates how to use the docvalue_fields parameter.

  1. Create an index with the following mappings:

    PUT /my_index
    {
      "mappings": {
        "properties": {
          "title": { "type": "text" },
          "author": { "type": "keyword" },
          "publication_date": { "type": "date" },
          "price": { "type": "double" }
        }
      }
    }
    
  2. Index the following documents into the newly created index:

    POST /my_index/_doc/1
    {
      "title": "UDB-SX Basics",
      "author": "John Doe",
      "publication_date": "2021-01-01",
      "price": 29.99
    }
    
    POST /my_index/_doc/2
    {
      "title": "Advanced UDB-SX",
      "author": "Jane Smith",
      "publication_date": "2022-01-01",
      "price": 39.99
    }
    
  3. Retrieve only the author and publication_date fields using docvalue_fields:

    POST /my_index/_search
    {
      "_source": false,
      "docvalue_fields": ["author", "publication_date"],
      "query": {
        "match_all": {}
      }
    }
    

The response contains the author and publication_date fields:

{
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1.0,
        "fields": {
          "author": ["John Doe"],
          "publication_date": ["2021-01-01T00:00:00.000Z"]
        }
      },
      {
        "_index": "my_index",
        "_id": "2",
        "_score": 1.0,
        "fields": {
          "author": ["Jane Smith"],
          "publication_date": ["2022-01-01T00:00:00.000Z"]
        }
      }
    ]
  }
}
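
In the responses shown above, every value under fields is wrapped in an array. A client usually wants flat rows instead, so the following Python sketch unwraps single-value arrays. The rows_from_hits helper is hypothetical, and the response dictionary is an abbreviated copy of the response above.

```python
def rows_from_hits(response):
    """Turn each hit's "fields" object into a flat dict, unwrapping single-value arrays."""
    rows = []
    for hit in response["hits"]["hits"]:
        row = {"_id": hit["_id"]}
        for name, values in hit.get("fields", {}).items():
            # Keep multi-valued fields as lists; unwrap the common single-value case.
            row[name] = values[0] if len(values) == 1 else values
        rows.append(row)
    return rows

# Abbreviated copy of the response shown above.
response = {
    "hits": {
        "hits": [
            {"_id": "1", "fields": {"author": ["John Doe"],
                                    "publication_date": ["2021-01-01T00:00:00.000Z"]}},
            {"_id": "2", "fields": {"author": ["Jane Smith"],
                                    "publication_date": ["2022-01-01T00:00:00.000Z"]}},
        ]
    }
}

for row in rows_from_hits(response):
    print(row)
```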

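Using docvalue_fields with nested objects

In UDB-SX, if you want to retrieve doc values for nested objects, you cannot use the docvalue_fields parameter directly because it returns an empty array. Instead, use the inner_hits parameter with its own docvalue_fields property, as shown in the following example.

  1. Define the index mappings: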

    PUT /my_index
    {
      "mappings": {
        "properties": {
          "title": { "type": "text" },
          "author": { "type": "keyword" },
          "comments": {
            "type": "nested",
            "properties": {
              "username": { "type": "keyword" },
              "content": { "type": "text" },
              "created_at": { "type": "date" }
            }
          }
        }
      }
    }
    
  2. Index the data:

    POST /my_index/_doc/1
    {
      "title": "UDB-SX Basics",
      "author": "John Doe",
      "comments": [
        {
          "username": "alice",
          "content": "Great article!",
          "created_at": "2023-01-01T12:00:00Z"
        },
        {
          "username": "bob",
          "content": "Very informative.",
          "created_at": "2023-01-02T12:00:00Z"
        }
      ]
    }
    
  3. Perform a search with inner_hits and docvalue_fields:

    POST /my_index/_search
    {
      "query": {
        "nested": {
          "path": "comments",
          "query": {
            "match_all": {}
          },
          "inner_hits": {
            "docvalue_fields": ["comments.username", "comments.created_at"]
          }
        }
      }
    }
    

The following is the expected response:

{
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "title": "UDB-SX Basics",
          "author": "John Doe",
          "comments": [
            {
              "username": "alice",
              "content": "Great article!",
              "created_at": "2023-01-01T12:00:00Z"
            },
            {
              "username": "bob",
              "content": "Very informative.",
              "created_at": "2023-01-02T12:00:00Z"
            }
          ]
        },
        "inner_hits": {
          "comments": {
            "hits": {
              "total": {
                "value": 2,
                "relation" : "eq"
              },
              "max_score": 1.0,
              "hits": [
                {
                  "_index": "my_index",
                  "_id": "1",
                  "_nested": {
                    "field": "comments",
                    "offset": 0
                  },
                  "fields": {
                    "comments.username": ["alice"],
                    "comments.created_at": ["2023-01-01T12:00:00.000Z"]
                  }
                },
                {
                  "_index": "my_index",
                  "_id": "1",
                  "_nested": {
                    "field": "comments",
                    "offset": 1
                  },
                  "fields": {
                    "comments.username": ["bob"],
                    "comments.created_at": ["2023-01-02T12:00:00.000Z"]
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

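Searching with stored_fields

By default, UDB-SX stores the entire document in the _source field and uses it to return document contents in search results. However, you may also want to store certain fields separately for more efficient retrieval. You can use stored_fields to explicitly store and retrieve specific document fields independently of the _source field.

Unlike _source, stored_fields must be explicitly defined in the mappings for each field that you want to store separately. This is useful if you often need to retrieve only a small subset of fields and want to avoid retrieving the entire _source field. The following example demonstrates how to use the stored_fields parameter.

  1. Create an index with the following mappings: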

    PUT /my_index
    {
      "mappings": {
        "properties": {
          "title": {
            "type": "text",
            "store": true  // Store the title field separately
          },
          "author": {
            "type": "keyword",
            "store": true  // Store the author field separately
          },
          "publication_date": {
            "type": "date"
          },
          "price": {
            "type": "double"
          }
        }
      }
    }
    
  2. Index the data:

    POST /my_index/_doc/1
    {
      "title": "UDB-SX Basics",
      "author": "John Doe",
      "publication_date": "2022-01-01",
      "price": 29.99
    }
    
    POST /my_index/_doc/2
    {
      "title": "Advanced UDB-SX",
      "author": "Jane Smith",
      "publication_date": "2023-01-01",
      "price": 39.99
    }
    
  3. Perform a search with stored_fields:

    POST /my_index/_search
    {
      "_source": false,
      "stored_fields": ["title", "author"],
      "query": {
        "match_all": {}
      }
    }
    

The following is the expected response:

{
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1.0,
        "fields": {
          "title": ["UDB-SX Basics"],
          "author": ["John Doe"]
        }
      },
      {
        "_index": "my_index",
        "_id": "2",
        "_score": 1.0,
        "fields": {
          "title": ["Advanced UDB-SX"],
          "author": ["Jane Smith"]
        }
      }
    ]
  }
}

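You can disable stored_fields entirely by setting stored_fields to _none_.

Searching stored_fields with nested objects

In UDB-SX, if you want to retrieve stored_fields for nested objects, you cannot use the stored_fields parameter directly because no data is returned. Instead, use the inner_hits parameter with its own stored_fields property, as shown in the following example.

  1. Create an index with the following mappings: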

    PUT /my_index
    {
      "mappings": {
        "properties": {
          "title": { "type": "text" },
          "author": { "type": "keyword" },
          "comments": {
            "type": "nested",
            "properties": {
              "username": { "type": "keyword", "store": true },
              "content": { "type": "text", "store": true },
              "created_at": { "type": "date", "store": true }
            }
          }
        }
      }
    }
    
  2. Index the data:

    POST /my_index/_doc/1
    {
      "title": "UDB-SX Basics",
      "author": "John Doe",
      "comments": [
        {
          "username": "alice",
          "content": "Great article!",
          "created_at": "2023-01-01T12:00:00Z"
        },
        {
          "username": "bob",
          "content": "Very informative.",
          "created_at": "2023-01-02T12:00:00Z"
        }
      ]
    }
    
  3. Perform a search with inner_hits and stored_fields:

    POST /my_index/_search
    {
      "_source": false,
      "query": {
        "nested": {
          "path": "comments",
          "query": {
            "match_all": {}
          },
          "inner_hits": {
            "stored_fields": ["comments.username", "comments.content", "comments.created_at"]
          }
        }
      }
    }
    

The following is the expected response:

{
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1.0,
        "inner_hits": {
          "comments": {
            "hits": {
              "total": {
                "value": 2,
                "relation": "eq"
              },
              "max_score": 1.0,
              "hits": [
                {
                  "_index": "my_index",
                  "_id": "1",
                  "_nested": {
                    "field": "comments",
                    "offset": 0
                  },
                  "fields": {
                    "comments.username": ["alice"],
                    "comments.content": ["Great article!"],
                    "comments.created_at": ["2023-01-01T12:00:00.000Z"]
                  }
                },
                {
                  "_index": "my_index",
                  "_id": "1",
                  "_nested": {
                    "field": "comments",
                    "offset": 1
                  },
                  "fields": {
                    "comments.username": ["bob"],
                    "comments.content": ["Very informative."],
                    "comments.created_at": ["2023-01-02T12:00:00.000Z"]
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}
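
Because the nested values sit several levels deep under inner_hits, client code typically needs a small traversal to reach them. The following Python sketch walks a response of the shape shown above. The nested_rows helper is hypothetical, and the response dictionary is an abbreviated copy that keeps only the keys the helper reads.

```python
def nested_rows(response, path):
    """Yield (parent_id, offset, fields) for every inner hit under the given nested path."""
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(path, {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            yield hit["_id"], inner_hit["_nested"]["offset"], inner_hit.get("fields", {})

# Abbreviated copy of the response shown above.
response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "inner_hits": {
                    "comments": {
                        "hits": {
                            "hits": [
                                {"_nested": {"field": "comments", "offset": 0},
                                 "fields": {"comments.username": ["alice"]}},
                                {"_nested": {"field": "comments", "offset": 1},
                                 "fields": {"comments.username": ["bob"]}},
                            ]
                        }
                    }
                },
            }
        ]
    }
}

for parent_id, offset, fields in nested_rows(response, "comments"):
    print(parent_id, offset, fields["comments.username"][0])
```

The offset identifies the position of the nested object within the parent document's comments array.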

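Using source filtering

Source filtering is a way to control which parts of the _source field are included in the search response. Including only the necessary fields in the response helps reduce the amount of data transferred over the network and improves performance.

You can include or exclude specific fields from the _source field in the search response by using full field names or simple wildcard patterns. The following example demonstrates how to include specific fields.

  1. Index the data: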

    PUT /my_index/_doc/1
    {
      "title": "UDB-SX Basics",
      "author": "John Doe",
      "publication_date": "2021-01-01",
      "price": 29.99
    }
    
  2. Perform a search with source filtering:

    POST /my_index/_search
    {
      "_source": ["title", "author"],
      "query": {
        "match_all": {}
      }
    }
    

The following is the expected response:

{
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "title": "UDB-SX Basics",
          "author": "John Doe"
        }
      }
    ]
  }
}

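Excluding fields with source filtering

You can choose to exclude fields by using the "excludes" parameter in the search request, as shown in the following example: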

POST /my_index/_search
{
  "_source": {
    "excludes": ["price"]
  },
  "query": {
    "match_all": {}
  }
}

The following is the expected response:

{
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "title": "UDB-SX Basics",
          "author": "John Doe",
          "publication_date": "2021-01-01"
        }
      }
    ]
  }
}

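Including and excluding fields in the same search

In some cases, both the includes and excludes parameters may be needed. The following example demonstrates how to include and exclude fields in the same search.

Consider a products index containing the following document: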

{
  "product_id": "123",
  "name": "Smartphone",
  "category": "Electronics",
  "price": 699.99,
  "description": "A powerful smartphone with a sleek design.",
  "reviews": [
    {
      "user": "john_doe",
      "rating": 5,
      "comment": "Great phone!",
      "date": "2023-01-01"
    },
    {
      "user": "jane_doe",
      "rating": 4,
      "comment": "Good value for money.",
      "date": "2023-02-15"
    }
  ],
  "supplier": {
    "name": "TechCorp",
    "contact_email": "support@techcorp.com",
    "address": {
      "street": "123 Tech St",
      "city": "Techville",
      "zipcode": "12345"
    }
  },
  "inventory": {
    "stock": 50,
    "warehouse_location": "A1"
  }
}

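To search this index while including only the name, price, reviews, and supplier fields in the response and excluding the contact_email field from the supplier object and the comment field from the reviews object, run the following search: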

GET /products/_search
{
  "_source": {
    "includes": ["name", "price", "reviews.*", "supplier.*"],
    "excludes": ["reviews.comment", "supplier.contact_email"]
  },
  "query": {
    "match": {
      "category": "Electronics"
    }
  }
}

The following is the expected response:

{
  "hits": {
    "hits": [
      {
        "_source": {
          "name": "Smartphone",
          "price": 699.99,
          "reviews": [
            {
              "user": "john_doe",
              "rating": 5,
              "date": "2023-01-01"
            },
            {
              "user": "jane_doe",
              "rating": 4,
              "date": "2023-02-15"
            }
          ],
          "supplier": {
            "name": "TechCorp",
            "address": {
              "street": "123 Tech St",
              "city": "Techville",
              "zipcode": "12345"
            }
          }
        }
      }
    ]
  }
}
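
The way includes and excludes combine can be modeled locally. The following Python sketch is a simplified approximation, not the engine's implementation: it assumes that excludes take precedence over includes, that patterns are matched against dotted field paths, and that fnmatch wildcards behave closely enough to the real ones. The filter_source helper is hypothetical, the document is an abbreviated copy of the one above, and edge cases may differ from UDB-SX.

```python
from fnmatch import fnmatchcase

def _matches(path, patterns):
    """True if the dotted path or any of its parent paths matches a pattern."""
    parts = path.split(".")
    prefixes = [".".join(parts[: i + 1]) for i in range(len(parts))]
    return any(fnmatchcase(p, pattern) for p in prefixes for pattern in patterns)

def filter_source(source, includes=(), excludes=(), path=""):
    """Simplified local model of _source includes/excludes filtering."""
    result = {}
    for key, child in source.items():
        child_path = f"{path}.{key}" if path else key
        if _matches(child_path, excludes):
            continue  # assumption: excludes win over includes
        if isinstance(child, dict):
            kept = filter_source(child, includes, excludes, child_path)
            if kept:
                result[key] = kept
        elif isinstance(child, list) and child and all(isinstance(i, dict) for i in child):
            kept = [filter_source(i, includes, excludes, child_path) for i in child]
            kept = [i for i in kept if i]
            if kept:
                result[key] = kept
        elif not includes or _matches(child_path, includes):
            result[key] = child
    return result

# Abbreviated copy of the products document shown above.
product = {
    "product_id": "123",
    "name": "Smartphone",
    "category": "Electronics",
    "price": 699.99,
    "reviews": [
        {"user": "john_doe", "rating": 5, "comment": "Great phone!", "date": "2023-01-01"},
        {"user": "jane_doe", "rating": 4, "comment": "Good value for money.", "date": "2023-02-15"},
    ],
    "supplier": {
        "name": "TechCorp",
        "contact_email": "support@techcorp.com",
        "address": {"city": "Techville"},
    },
    "inventory": {"stock": 50},
}

filtered = filter_source(
    product,
    includes=["name", "price", "reviews.*", "supplier.*"],
    excludes=["reviews.comment", "supplier.contact_email"],
)
print(sorted(filtered))  # ['name', 'price', 'reviews', 'supplier']
```

Calling filter_source with only excludes keeps every field that is not excluded, which mirrors the excludes-only request shown earlier.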

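Using script fields

The script_fields parameter lets you include custom fields whose values are computed by a script in the search results. This is useful for calculating values dynamically from document data. You can also retrieve derived fields using a similar approach. For more information, see Retrieving fields.

Suppose you have a products index in which each product document contains price and discount_percentage fields. You can use the script_fields parameter to include a custom field in the search results that shows each product's discounted price. The following example demonstrates how to use the script_fields parameter:

  1. Index the data: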

    PUT /products/_doc/123
    {
      "product_id": "123",
      "name": "Smartphone",
      "price": 699.99,
      "discount_percentage": 10,
      "category": "Electronics",
      "description": "A powerful smartphone with a sleek design."
    }
    
  2. Use the script_fields parameter to include a custom field named discounted_price in the search results. The field is computed by a script from the price and discount_percentage fields:

    GET /products/_search
    {
      "_source": ["product_id", "name", "price", "discount_percentage"],
      "query": {
        "match": {
          "category": "Electronics"
        }
      },
      "script_fields": {
        "discounted_price": {
          "script": {
            "lang": "painless",
            "source": "doc[\"price\"].value * (1 - doc[\"discount_percentage\"].value / 100.0)"
          }
        }
      }
    }
    

You should receive the following response:

{
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "products",
        "_id": "123",
        "_score": 1.0,
        "_source": {
          "product_id": "123",
          "name": "Smartphone",
          "price": 699.99,
          "discount_percentage": 10
        },
        "fields": {
          "discounted_price": [629.991]
        }
      }
    ]
  }
}
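
The discounted_price value can be checked by hand with the same formula. The following Python sketch reproduces the arithmetic and also models a pitfall: Painless follows Java-like numeric rules, so if discount_percentage is mapped as an integer type, dividing it by an integer literal may truncate to 0. The second function imitates that with Python's // operator; both function names are hypothetical.

```python
def discounted_price(price, discount_percentage):
    """Apply a percentage discount using floating-point division."""
    return price * (1 - discount_percentage / 100.0)

def discounted_price_truncating(price, discount_percentage):
    """Same formula, but with truncating integer division (Python's //)."""
    return price * (1 - discount_percentage // 100)

# Values from the sample product document above.
print(round(discounted_price(699.99, 10), 3))   # 629.991
print(discounted_price_truncating(699.99, 10))  # 699.99
```

With truncating division, 10 // 100 is 0, so no discount is applied. Dividing by a floating-point value avoids this.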