Geohex grid aggregations

The Hexagonal Hierarchical Geospatial Indexing System (H3) partitions the Earth's surface into identifiable hexagon-shaped cells.

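The H3 grid system works well for proximity applications because it overcomes the limitations of Geohash's non-uniform partitions. Geohash encodes latitude and longitude pairs, so its partitions are significantly smaller near the poles than near the equator, where a degree of longitude spans the greatest distance. In contrast, the H3 grid system's distortions are low and confined to a small number of cells: 12 of its 122 base cells are pentagons rather than hexagons. These pentagonal cells are placed in low-use areas (for example, in the middle of the ocean), so the areas that matter most are unaffected. Thus, grouping documents based on the H3 grid system provides a better aggregation than the Geohash grid.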
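The shrinking of latitude-longitude partitions toward the poles can be checked with a little arithmetic. The following Python sketch assumes a spherical Earth with a mean radius of 6,371 km (an approximation introduced here for illustration) and computes the ground distance covered by one degree of longitude at several latitudes. The helper name longitude_degree_km is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth


def longitude_degree_km(latitude_deg):
    """Approximate east-west ground distance of one degree of longitude."""
    # The circle of latitude shrinks with the cosine of the latitude.
    circumference_km = 2 * math.pi * EARTH_RADIUS_KM * math.cos(math.radians(latitude_deg))
    return circumference_km / 360.0


for lat in (0, 30, 60, 85):
    width = longitude_degree_km(lat)
    print(f"latitude {lat:>2}: {width:7.2f} km per degree of longitude")
```

Because a grid defined in degrees keeps the same angular width everywhere, its cells cover less ground as the latitude increases, which is the non-uniformity that H3 avoids.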

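The geohex grid aggregation groups geopoints into grid cells for geographical analysis. Each grid cell corresponds to an H3 cell and is identified using the H3Index representation.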
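To make the H3Index representation more concrete, the following Python sketch unpacks a bucket key into its fields. It assumes the 64-bit layout described in the H3 documentation (4 mode bits, 4 resolution bits, 7 base-cell bits, and fifteen 3-bit digits, with the value 7 marking an unused digit). The function name decode_h3_key is hypothetical, and the keys are taken from the responses in this section.

```python
def decode_h3_key(key):
    """Split a hexadecimal H3 cell key into its bit fields (assumed layout)."""
    value = int(key, 16)
    resolution = (value >> 52) & 0xF
    return {
        "mode": (value >> 59) & 0xF,        # 1 denotes a cell index
        "resolution": resolution,            # matches the requested precision
        "base_cell": (value >> 45) & 0x7F,   # one of the 122 base cells
        # Digit r (1-based) is stored 3 * (15 - r) bits above the lowest bit.
        "digits": [(value >> (3 * (15 - r))) & 0x7 for r in range(1, resolution + 1)],
    }


for key in ("8129bffffffffff", "8629ab6dfffffff"):
    print(key, decode_h3_key(key))
```

The resolution field is what ties a bucket key back to the precision value used in the request.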

Precision

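The precision parameter controls the level of granularity that determines the grid cell size. The lower the precision, the larger the grid cells.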
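As a rough illustration of how precision affects cell size, the following Python sketch estimates the average cell area at several precision levels. It relies on the H3 property that resolution r contains 2 + 120 · 7^r cells and assumes an Earth surface area of about 510.1 million km². The function names are hypothetical, and the results are global averages rather than the exact size of any particular cell.

```python
EARTH_SURFACE_KM2 = 510_072_000  # assumption: approximate surface area of the Earth


def h3_cell_count(precision):
    """Number of H3 cells covering the globe at the given precision."""
    if not 0 <= precision <= 15:
        raise ValueError("precision must be in the range [0, 15]")
    return 2 + 120 * 7 ** precision


def average_cell_area_km2(precision):
    """Average cell area, obtained by dividing the surface evenly among cells."""
    return EARTH_SURFACE_KM2 / h3_cell_count(precision)


for precision in (0, 1, 5, 6, 15):
    count = h3_cell_count(precision)
    area = average_cell_area_km2(precision)
    print(f"precision {precision:>2}: {count:>20,} cells, about {area:.6g} km^2 each")
```

Each additional precision level divides the average cell area by roughly seven, so small changes in precision have a large effect on the number of buckets.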

The following examples illustrate low-precision and high-precision aggregation requests.

To start, create an index and map the location field as a geo_point:

PUT national_parks
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_point"
      }
    }
  }
}

Index the following documents into the sample index:

PUT national_parks/_doc/1
{
  "name": "Yellowstone National Park",
  "location": "44.42, -110.59" 
}

PUT national_parks/_doc/2
{
  "name": "Yosemite National Park",
  "location": "37.87, -119.53" 
}

PUT national_parks/_doc/3
{
  "name": "Death Valley National Park",
  "location": "36.53, -116.93" 
}

You can index geopoints in several formats.
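The formats differ mainly in the order of latitude and longitude, which is easy to get wrong. The following Python sketch normalizes four of the formats used in this section into a (latitude, longitude) pair. The function name parse_geopoint is hypothetical, the object form is assumed to use the keys lat and lon, and geohash strings are deliberately not handled.

```python
import re

# WKT points list the longitude first: "POINT (longitude latitude)".
_WKT_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_geopoint(value):
    """Return (lat, lon) for an object, array, string, or WKT geopoint."""
    if isinstance(value, dict):
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, (list, tuple)):
        lon, lat = value  # arrays are [longitude, latitude]
        return float(lat), float(lon)
    if isinstance(value, str):
        match = _WKT_POINT.match(value)
        if match:
            lon, lat = match.groups()
            return float(lat), float(lon)
        lat, lon = value.split(",")  # strings are "latitude, longitude"
        return float(lat), float(lon)
    raise TypeError(f"unsupported geopoint value: {value!r}")


for candidate in ("44.42, -110.59", [-110.59, 44.42],
                  {"lat": 44.42, "lon": -110.59}, "POINT (-110.59 44.42)"):
    print(candidate, "->", parse_geopoint(candidate))
```

All four inputs describe the same location, even though the array and WKT forms reverse the coordinate order used by the string form.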

Low-precision requests

Run a low-precision request that buckets nearby documents together:

GET national_parks/_search
{
  "aggregations": {
    "grouped": {
      "geohex_grid": {
        "field": "location",
        "precision": 1
      }
    }
  }
}

Note: You can use either the GET or POST HTTP method for geohex grid aggregation queries.

The response groups documents 2 and 3 together because they are close enough to be bucketed in one grid cell:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "national_parks",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "Yellowstone National Park",
          "location" : "44.42, -110.59"
        }
      },
      {
        "_index" : "national_parks",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "Yosemite National Park",
          "location" : "37.87, -119.53"
        }
      },
      {
        "_index" : "national_parks",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "name" : "Death Valley National Park",
          "location" : "36.53, -116.93"
        }
      }
    ]
  },
  "aggregations" : {
    "grouped" : {
      "buckets" : [
        {
          "key" : "8129bffffffffff",
          "doc_count" : 2
        },
        {
          "key" : "8128bffffffffff",
          "doc_count" : 1
        }
      ]
    }
  }
}

High-precision requests

Now run a high-precision request:

GET national_parks/_search
{
  "aggregations": {
    "grouped": {
      "geohex_grid": {
        "field": "location",
        "precision": 6
      }
    }
  }
}

Because of the higher granularity, all three documents are bucketed separately:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "national_parks",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "Yellowstone National Park",
          "location" : "44.42, -110.59"
        }
      },
      {
        "_index" : "national_parks",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "Yosemite National Park",
          "location" : "37.87, -119.53"
        }
      },
      {
        "_index" : "national_parks",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "name" : "Death Valley National Park",
          "location" : "36.53, -116.93"
        }
      }
    ]
  },
  "aggregations" : {
    "grouped" : {
      "buckets" : [
        {
          "key" : "8629ab6dfffffff",
          "doc_count" : 1
        },
        {
          "key" : "8629857a7ffffff",
          "doc_count" : 1
        },
        {
          "key" : "862896017ffffff",
          "doc_count" : 1
        }
      ]
    }
  }
}
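The high-precision keys are consistent with the low-precision buckets shown earlier because H3 cells are hierarchical. The following Python sketch, which assumes the H3 index bit layout from the H3 documentation and uses a hypothetical helper named to_parent_key, derives a coarser key by rewriting the resolution field and marking the finer digits as unused.

```python
from collections import Counter


def to_parent_key(key, parent_precision):
    """Return the key of the ancestor cell at a coarser precision."""
    value = int(key, 16)
    precision = (value >> 52) & 0xF
    if not 0 <= parent_precision <= precision:
        raise ValueError("parent precision must not exceed the cell's own precision")
    # Overwrite the 4-bit resolution field with the coarser precision.
    value = (value & ~(0xF << 52)) | (parent_precision << 52)
    # Set every digit below the parent precision to 7, the "unused" marker.
    value |= (1 << (3 * (15 - parent_precision))) - 1
    return format(value, "x")


high_precision_keys = ["8629ab6dfffffff", "8629857a7ffffff", "862896017ffffff"]
rollup = Counter(to_parent_key(key, 1) for key in high_precision_keys)
print(dict(rollup))
```

Rolling the three precision 6 keys up to precision 1 yields one parent shared by two cells and another parent with a single cell, matching the document counts in the low-precision response.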

Filtering requests

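High-precision requests are resource intensive, so we recommend using a filter such as geo_bounding_box to limit the geographical area. For example, the following query applies a filter to limit the search area: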

GET national_parks/_search
{
  "size" : 0,  
  "aggregations": {
    "filtered": {
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": "38, -120",
            "bottom_right": "36, -116"
          }
        }
      },
      "aggregations": {
        "grouped": {
          "geohex_grid": {
            "field": "location",
            "precision": 6
          }
        }
      }
    }
  }
}

The response contains the two documents that fall within the geo_bounding_box bounds:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "filtered" : {
      "doc_count" : 2,
      "grouped" : {
        "buckets" : [
          {
            "key" : "8629ab6dfffffff",
            "doc_count" : 1
          },
          {
            "key" : "8629857a7ffffff",
            "doc_count" : 1
          }
        ]
      }
    }
  }
}
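To see why only two documents remain, the bounding box test can be reproduced by hand. The following Python sketch checks each sample location against the box corners from the request. The function name in_bounding_box is hypothetical, and the check assumes a box that does not cross the antimeridian.

```python
def in_bounding_box(lat, lon, top_left, bottom_right):
    """Return True if (lat, lon) lies inside the box given by two corners."""
    top, left = top_left          # corners are (latitude, longitude) pairs
    bottom, right = bottom_right
    return bottom <= lat <= top and left <= lon <= right


parks = {
    "Yellowstone National Park": (44.42, -110.59),
    "Yosemite National Park": (37.87, -119.53),
    "Death Valley National Park": (36.53, -116.93),
}

inside = [name for name, (lat, lon) in parks.items()
          if in_bounding_box(lat, lon, top_left=(38, -120), bottom_right=(36, -116))]
print(inside)
```

Yellowstone lies north of the box's upper edge at latitude 38, so it is excluded before the documents are bucketed.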

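You can also restrict the geographical area by providing the coordinates of the bounding envelope in the bounds parameter. Both bounds and geo_bounding_box coordinates can be specified in any of the geopoint formats. The following query uses the well-known text (WKT) "POINT (longitude latitude)" format for the bounds parameter: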

GET national_parks/_search
{
  "size": 0,
  "aggregations": {
    "grouped": {
      "geohex_grid": {
        "field": "location",
        "precision": 6,
        "bounds": {
            "top_left": "POINT (-120 38)",
            "bottom_right": "POINT (-116 36)"
        }
      }
    }
  }
}

The response contains only the two results that fall within the specified bounds:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "grouped" : {
      "buckets" : [
        {
          "key" : "8629ab6dfffffff",
          "doc_count" : 1
        },
        {
          "key" : "8629857a7ffffff",
          "doc_count" : 1
        }
      ]
    }
  }
}

The bounds parameter can be used with or without the geo_bounding_box filter. The two are independent of each other and can have any spatial relationship.
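As an illustration of combining the two, the following Python sketch assembles a request body that optionally includes a bounds envelope, a geo_bounding_box filter, or both. The helper name build_geohex_request is hypothetical, and the aggregation names grouped and filtered simply follow the examples in this section. The sketch only builds the JSON body and does not send it anywhere.

```python
import json


def build_geohex_request(field, precision, bounds=None, filter_box=None):
    """Build a search body for a geohex_grid aggregation."""
    grid = {"field": field, "precision": precision}
    if bounds is not None:
        grid["bounds"] = bounds  # restricts the cells considered by the grid
    grouped = {"grouped": {"geohex_grid": grid}}
    if filter_box is None:
        aggregations = grouped
    else:
        # Wrap the grid in a filter aggregation that narrows the documents first.
        aggregations = {
            "filtered": {
                "filter": {"geo_bounding_box": {field: filter_box}},
                "aggregations": grouped,
            }
        }
    return {"size": 0, "aggregations": aggregations}


body = build_geohex_request(
    "location",
    6,
    bounds={"top_left": "POINT (-120 38)", "bottom_right": "POINT (-116 36)"},
    filter_box={"top_left": "38, -120", "bottom_right": "36, -116"},
)
print(json.dumps(body, indent=2))
```

Keeping the two options as separate arguments reflects their independence: either can be omitted without changing how the other is expressed.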

Supported parameters

Geohex grid aggregation requests support the following parameters.

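Parameter	Data type	Description
field	String	The field that contains the geopoints. This field must be mapped as a `geo_point` field. If the field contains an array, all array values are aggregated. Required.
precision	Integer	The granularity level used to determine grid cells for bucketing results. Cells cannot exceed the specified size (diagonal) of the required precision. Valid values are in the [0, 15] range. Optional. Default is 5.
bounds	Object	The bounding box used to filter geopoints. The bounding box is defined by its upper-left and lower-right vertices. Each vertex is specified as a geopoint in one of the following formats: an object with a latitude and longitude; an array in the [`longitude`, `latitude`] format; a string in the "`latitude`,`longitude`" format; a geohash; WKT. Optional.
size	Integer	The maximum number of buckets to return. When there are more buckets than `size`, UDB-SX returns the buckets that contain the most documents. Optional. Default is 10,000.
shard_size	Integer	The maximum number of buckets to return from each shard. Optional. Default is max(10, `size` · number of shards), which provides a more accurate count of the higher-prioritized buckets.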
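The interaction between size and shard_size can be sketched in a few lines of Python. The example below computes the default shard_size from the formula stated in the table and mimics how the buckets with the most documents are kept when there are more buckets than size. The function names and the sample bucket keys are hypothetical, and the sketch illustrates the described behavior rather than the actual server implementation.

```python
def default_shard_size(size, number_of_shards):
    """Default shard_size as described above: max(10, size * number of shards)."""
    return max(10, size * number_of_shards)


def top_buckets(buckets, size):
    """Keep the `size` buckets with the highest document counts."""
    ranked = sorted(buckets, key=lambda bucket: bucket["doc_count"], reverse=True)
    return ranked[:size]


sample_buckets = [
    {"key": "cell_a", "doc_count": 3},   # placeholder keys, not real H3 indexes
    {"key": "cell_b", "doc_count": 12},
    {"key": "cell_c", "doc_count": 7},
]

print(default_shard_size(size=2, number_of_shards=3))
print(top_buckets(sample_buckets, size=2))
```

Requesting more buckets per shard than the final size reduces the chance that a bucket is dropped on one shard even though it would rank highly once all shards are combined.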