Searching data in UDB-SX

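UDB-SX offers several ways to search data:

Query domain-specific language (DSL): The primary UDB-SX query language, used to create complex, fully customizable queries.
Query string query language: A scaled-down query language that you can use in the query parameter of a search request or in UDB-SX Dashboards.
SQL: A traditional query language that bridges traditional relational database concepts and the flexibility of UDB-SX's document-oriented data storage.
Piped Processing Language (PPL): The primary language for observability use cases in UDB-SX. PPL uses a pipe syntax that chains commands into a query.
Dashboards Query Language (DQL): A simple text-based query language for filtering data in UDB-SX Dashboards.

This tutorial briefly introduces searching with query string queries and with query DSL.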

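Prepare the data

This tutorial requires indexed student data. If you haven't indexed it yet, you can first delete the students index (DELETE /students) and then send the following bulk request: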

POST _bulk
{ "create": { "_index": "students", "_id": "1" } }
{ "name": "John Doe", "gpa": 3.89, "grad_year": 2022}
{ "create": { "_index": "students", "_id": "2" } }
{ "name": "Jonathan Powers", "gpa": 3.85, "grad_year": 2025 }
{ "create": { "_index": "students", "_id": "3" } }
{ "name": "Jane Doe", "gpa": 3.52, "grad_year": 2024 }

Retrieve all documents in an index

To retrieve all documents in an index, send the following request:

GET /students/_search

The preceding request is equivalent to a match_all query, which matches all documents in an index:

GET /students/_search
{
  "query": {
    "match_all": {}
  }
}

UDB-SX returns the matching documents:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "students",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "John Doe",
          "gpa": 3.89,
          "grad_year": 2022
        }
      },
      {
        "_index": "students",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "Jonathan Powers",
          "gpa": 3.85,
          "grad_year": 2025
        }
      },
      {
        "_index": "students",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "Jane Doe",
          "gpa": 3.52,
          "grad_year": 2024
        }
      }
    ]
  }
}

Response body fields

The preceding response contains the following fields:

took

The took field contains the amount of time the query took to run, in milliseconds.

timed_out

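This field indicates whether the request timed out. If a request times out, UDB-SX returns the results gathered before the timeout. You can set the desired timeout by providing the timeout query parameter: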

GET /students/_search?timeout=20ms

_shards

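The _shards object specifies the total number of shards on which the query ran, as well as the number of shards that succeeded, were skipped, or failed. A shard counts as failed if the shard itself and all of its replicas are unavailable. Even if some shards fail, UDB-SX continues to run the query on the remaining shards.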

hits

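The hits object contains the total number of matching documents and the documents themselves (listed in the hits array). Each matching document contains the _index (index name) and _id (document ID) fields, as well as the _source field, which holds the complete document as it was originally indexed.

Each document is given a relevance score in the _score field. Because you ran a match_all search, every document has a score of 1 (there is no difference in relevance). The max_score field contains the highest score among all matching documents.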
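As a rough illustration of how a client might read these fields, the following Python sketch pulls them out of an already parsed response. The summarize helper and the abbreviated sample dictionary are hypothetical and exist only for this example; the field names are the ones shown in the response above.

```python
def summarize(response):
    """Collect the response fields described above into a flat dictionary."""
    shards = response["_shards"]
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "failed_shards": shards["failed"],
        "total": hits["total"]["value"],
        "max_score": hits["max_score"],
        "names": [hit["_source"]["name"] for hit in hits["hits"]],
    }

# Abbreviated copy of the match_all response (only the fields used here).
sample = {
    "took": 1,
    "timed_out": False,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 3, "relation": "eq"},
        "max_score": 1,
        "hits": [
            {"_index": "students", "_id": "1", "_score": 1,
             "_source": {"name": "John Doe", "gpa": 3.89, "grad_year": 2022}},
            {"_index": "students", "_id": "2", "_score": 1,
             "_source": {"name": "Jonathan Powers", "gpa": 3.85, "grad_year": 2025}},
            {"_index": "students", "_id": "3", "_score": 1,
             "_source": {"name": "Jane Doe", "gpa": 3.52, "grad_year": 2024}},
        ],
    },
}

summary = summarize(sample)
print(summary["total"], summary["names"])
```

Checking timed_out and failed_shards before trusting total is a reasonable habit, since a timed-out or partially failed request may return incomplete results.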

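Query string queries

Query string queries are lightweight but powerful. You can send a query string query in the q query parameter. For example, the following query searches for students with the name john: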

GET /students/_search?q=name:john

UDB-SX returns the matching document:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.9808291,
    "hits": [
      {
        "_index": "students",
        "_id": "1",
        "_score": 0.9808291,
        "_source": {
          "name": "John Doe",
          "gpa": 3.89,
          "grad_year": 2022
        }
      }
    ]
  }
}
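When a query string request is built in code rather than typed into a console, the parameter values need URL encoding. The following Python sketch shows one way to do that with the standard library. The search_url helper and the http://localhost:9200 address are assumptions made for this example, not part of UDB-SX.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def search_url(base_url, index, **params):
    """Build a _search URL; params become URL-encoded query parameters."""
    url = f"{base_url.rstrip('/')}/{index}/_search"
    return f"{url}?{urlencode(params)}" if params else url

# Hypothetical endpoint; replace with the address of your own cluster.
url = search_url("http://localhost:9200", "students", q="name:john", timeout="20ms")
print(url)

# Decoding the query part recovers the original parameter values.
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```

The colon in name:john is percent-encoded as %3A in the generated URL and decodes back to the same query string on the receiving side.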

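Query DSL

With query DSL, you can create more complex and customized queries.

Full-text search

You can run a full-text search on fields mapped as text. By default, text fields are processed by the default analyzer, which splits the text into terms and converts them to lowercase.

For example, the following query searches for students with the name john: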

GET /students/_search
{
  "query": {
    "match": {
      "name": "john"
    }
  }
}

The response contains the matching document:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.9808291,
    "hits": [
      {
        "_index": "students",
        "_id": "1",
        "_score": 0.9808291,
        "_source": {
          "name": "John Doe",
          "gpa": 3.89,
          "grad_year": 2022
        }
      }
    ]
  }
}

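Notice that the query text is lowercase while the text in the field is not, yet the query still returns the matching document. This works because the analyzed terms, not the original text, are compared.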
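The effect of analysis can be imitated in a few lines of Python. This is a deliberately simplified stand-in, not the analyzer UDB-SX uses: the hypothetical analyze function only splits on non-alphanumeric characters and lowercases the pieces, which is the behavior described in this section.

```python
import re

def analyze(text):
    """Simplified analyzer stand-in: split into terms, then lowercase them."""
    return [term.lower() for term in re.findall(r"[A-Za-z0-9]+", text)]

indexed_terms = analyze("John Doe")  # terms kept for the text field
query_terms = analyze("john")        # the query text is analyzed the same way

# A document matches when at least one query term appears among its terms.
is_match = any(term in indexed_terms for term in query_terms)
print(indexed_terms, query_terms, is_match)
```

Because both sides pass through the same function, john, John, and JOHN all produce the same term and therefore match the same documents in this sketch.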

You can reorder the terms in the search string. For example, the following query searches for doe john:

GET /students/_search
{
  "query": {
    "match": {
      "name": "doe john"
    }
  }
}

The response contains two matching documents:

{
  "took": 45,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1.4508327,
    "hits": [
      {
        "_index": "students",
        "_id": "1",
        "_score": 1.4508327,
        "_source": {
          "name": "John Doe",
          "gpa": 3.89,
          "grad_year": 2022
        }
      },
      {
        "_index": "students",
        "_id": "3",
        "_score": 0.4700036,
        "_source": {
          "name": "Jane Doe",
          "gpa": 3.52,
          "grad_year": 2024
        }
      }
    ]
  }
}

The match query type uses OR as its default operator, so this query is effectively doe OR john. Both John Doe and Jane Doe match the term doe, but John Doe scores higher because it also matches john.
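The OR behavior can be sketched in Python over the three sample names. Everything here is illustrative: the analyze and match functions are hypothetical, the number of overlapping terms is only a crude stand-in for the real relevance score, and the "and" branch is included purely as a contrast rather than as a statement about UDB-SX request syntax.

```python
import re

STUDENTS = {"1": "John Doe", "2": "Jonathan Powers", "3": "Jane Doe"}

def analyze(text):
    """Simplified analyzer stand-in: lowercase terms as a set."""
    return {term.lower() for term in re.findall(r"[A-Za-z0-9]+", text)}

def match(query, operator="or"):
    """Return (doc_id, overlap) pairs, best overlap first."""
    query_terms = analyze(query)
    results = []
    for doc_id, name in STUDENTS.items():
        overlap = len(query_terms & analyze(name))
        if operator == "and":
            matched = overlap == len(query_terms)  # every term required
        else:
            matched = overlap > 0                  # any term is enough
        if matched:
            results.append((doc_id, overlap))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

or_hits = match("doe john")
and_hits = match("doe john", operator="and")
print(or_hits)
print(and_hits)
```

With OR, documents 1 and 3 are returned and document 1 ranks first; requiring every term leaves only document 1. Jonathan Powers never matches because jonathan is a different term from john.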

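Keyword search

The name field contains a name.keyword subfield, which UDB-SX adds automatically. Suppose you search the name.keyword field in the same way as in the previous request: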

GET /students/_search
{
  "query": {
    "match": {
      "name.keyword": "john"
    }
  }
}

The request returns no hits because keyword fields require an exact match.

However, if you search for the exact text John Doe:

GET /students/_search
{
  "query": {
    "match": {
      "name.keyword": "John Doe"
    }
  }
}

UDB-SX returns the matching document:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.9808291,
    "hits": [
      {
        "_index": "students",
        "_id": "1",
        "_score": 0.9808291,
        "_source": {
          "name": "John Doe",
          "gpa": 3.89,
          "grad_year": 2022
        }
      }
    ]
  }
}

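Filters

Using a Boolean query, you can add a filter clause for fields that hold exact values.

Term filters match a specific term. For example, the following Boolean query searches for students whose graduation year is 2022: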

GET /students/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "grad_year": 2022 }}
      ]
    }
  }
}

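With range filters, you can specify a range of values. For example, the following Boolean query searches for students whose GPA is greater than 3.6: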

GET /students/_search
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "gpa": { "gt": 3.6 }}}
      ]
    }
  }
}
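To make the expected results of the two filter requests concrete, the following Python sketch applies the same conditions to the sample data in memory. The term_filter and range_filter helpers are hypothetical and only imitate the semantics described above; they do not call UDB-SX.

```python
STUDENTS = [
    {"name": "John Doe", "gpa": 3.89, "grad_year": 2022},
    {"name": "Jonathan Powers", "gpa": 3.85, "grad_year": 2025},
    {"name": "Jane Doe", "gpa": 3.52, "grad_year": 2024},
]

def term_filter(docs, field, value):
    """Keep documents whose field equals the given value exactly."""
    return [doc for doc in docs if doc.get(field) == value]

def range_filter(docs, field, gt=None, gte=None, lte=None):
    """Keep documents whose field satisfies every bound that is set."""
    def in_range(value):
        return ((gt is None or value > gt)
                and (gte is None or value >= gte)
                and (lte is None or value <= lte))
    return [doc for doc in docs if field in doc and in_range(doc[field])]

grads_2022 = [doc["name"] for doc in term_filter(STUDENTS, "grad_year", 2022)]
high_gpa = [doc["name"] for doc in range_filter(STUDENTS, "gpa", gt=3.6)]
print(grads_2022)
print(high_gpa)
```

On the sample data, the term condition keeps only John Doe, and the range condition keeps John Doe and Jonathan Powers, since Jane Doe's GPA of 3.52 is below the bound.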

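Compound queries

A compound query lets you combine multiple query or filter clauses. A Boolean query is one example of a compound query.

For example, to search for students whose name matches doe and to restrict the results by graduation year and GPA, use the following request: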

GET /students/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "doe"
          }
        },
        { "range": { "gpa": { "gte": 3.6, "lte": 3.9 } } },
        { "term":  { "grad_year": 2022 }}
      ]
    }
  }
}
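The request above places all three clauses under must. A possible variant keeps the match clause under must and moves the exact-value conditions into a filter clause, like the filter examples earlier in this tutorial. The Python sketch below only assembles such a request body; the bool_query helper is hypothetical, and combining must and filter in one bool query is an assumption that should be verified against your UDB-SX version.

```python
import json

def bool_query(must=None, filters=None):
    """Assemble a bool query body from optional must and filter clauses."""
    clauses = {}
    if must:
        clauses["must"] = list(must)
    if filters:
        clauses["filter"] = list(filters)
    return {"query": {"bool": clauses}}

body = bool_query(
    must=[{"match": {"name": "doe"}}],
    filters=[
        {"range": {"gpa": {"gte": 3.6, "lte": 3.9}}},
        {"term": {"grad_year": 2022}},
    ],
)

# This JSON would be sent as the body of GET /students/_search.
print(json.dumps(body, indent=2))
```

Building the body as a dictionary and serializing it at the end avoids hand-written JSON and makes it easy to add or remove clauses conditionally.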