Ignore above

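The ignore_above mapping parameter limits the maximum number of characters for an indexed string. If a string's length exceeds the specified threshold, the value is stored with the document but is not indexed. This helps prevent the index from bloating with unusually long values and keeps queries efficient.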

By default, if you do not specify ignore_above, all string values are fully indexed.
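The threshold rule can be sketched in a few lines of Python. This is an illustration only, not the engine's implementation: the helper name is_indexed is hypothetical, and it assumes that Python's len() is a close enough stand-in for the character count (the engine's own counting may differ for characters outside the Basic Multilingual Plane).

```python
from typing import Optional


def is_indexed(value: str, ignore_above: Optional[int] = None) -> bool:
    """Return True if a keyword value would be indexed under the given threshold.

    Hypothetical helper for illustration. Assumes len() approximates the
    character count used for the ignore_above comparison.
    """
    if ignore_above is None:
        # No threshold configured: every value is indexed.
        return True
    # Only values strictly longer than the threshold are skipped.
    return len(value) <= ignore_above


sentence = "text longer than 10 characters"
print(len(sentence))             # 30
print(is_indexed(sentence))      # True
print(is_indexed(sentence, 10))  # False
print(is_indexed("short", 10))   # True
```

A value whose length equals the threshold is still indexed in this sketch; only longer values are ignored.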

Example: Without ignore_above

Create an index with a keyword field without specifying the ignore_above parameter:

PUT /test-no-ignore
{
  "mappings": {
    "properties": {
      "sentence": {
        "type": "keyword"
      }
    }
  }
}

Index a document containing a long string value:

PUT /test-no-ignore/_doc/1
{
  "sentence": "text longer than 10 characters"
}

Run a term query for the full string:

POST /test-no-ignore/_search
{
  "query": {
    "term": {
      "sentence": "text longer than 10 characters"
    }
  }
}

The document is returned because the sentence field was indexed:

{
  ...
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.13353139,
    "hits": [
      {
        "_index": "test-no-ignore",
        "_id": "1",
        "_score": 0.13353139,
        "_source": {
          "sentence": "text longer than 10 characters"
        }
      }
    ]
  }
}

Example: With ignore_above

Create an index with the same field and set the ignore_above parameter to 10:

PUT /test-ignore
{
  "mappings": {
    "properties": {
      "sentence": {
        "type": "keyword",
        "ignore_above": 10
      }
    }
  }
}

Index a document with the same long string value:

PUT /test-ignore/_doc/1
{
  "sentence": "text longer than 10 characters"
}

Run a term query for the full string:

POST /test-ignore/_search
{
  "query": {
    "term": {
      "sentence": "text longer than 10 characters"
    }
  }
}

No results are returned because the string in the sentence field exceeds the ignore_above threshold and was not indexed:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}

However, the document still exists, which you can confirm with the following request:

GET /test-ignore/_search

The returned hits include the document:

{
  ...
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "test-ignore",
        "_id": "1",
        "_score": 1,
        "_source": {
          "sentence": "text longer than 10 characters"
        }
      }
    ]
  }
}
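The difference between a stored value and an indexed value in the two examples can be modeled with a small in-memory sketch. The ToyKeywordIndex class and its methods are hypothetical and exist only for illustration; they mimic the observable behavior described here (the document is always kept, but an exact-match term lookup finds only values that were indexed) and say nothing about how the engine is built internally.

```python
from typing import Dict, List, Optional, Set


class ToyKeywordIndex:
    """Toy model of one keyword field: stored source plus an exact-match term index."""

    def __init__(self, ignore_above: Optional[int] = None) -> None:
        self.ignore_above = ignore_above
        self.source: Dict[str, str] = {}      # doc id -> original value (always kept)
        self.terms: Dict[str, Set[str]] = {}  # exact term -> ids of matching docs

    def index(self, doc_id: str, value: str) -> None:
        # The original value is always stored with the document.
        self.source[doc_id] = value
        if self.ignore_above is not None and len(value) > self.ignore_above:
            return  # too long: stored, but not added to the term index
        self.terms.setdefault(value, set()).add(doc_id)

    def term_query(self, value: str) -> List[str]:
        # Exact-match lookup, analogous to a term query on a keyword field.
        return sorted(self.terms.get(value, set()))

    def match_all(self) -> List[str]:
        # Analogous to a search with no query body: every stored document.
        return sorted(self.source)


sentence = "text longer than 10 characters"
no_limit = ToyKeywordIndex()
with_limit = ToyKeywordIndex(ignore_above=10)
for idx in (no_limit, with_limit):
    idx.index("1", sentence)

print(no_limit.term_query(sentence))    # ['1']
print(with_limit.term_query(sentence))  # []
print(with_limit.match_all())           # ['1']
```

As in the walkthrough, the term lookup on the limited index comes back empty while the unfiltered listing still returns document 1.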