子查询

子查询(也称为子搜索)允许您在一个查询中使用另一个查询的结果。UDB-SX 管道处理语言(PPL)支持四种类型的子查询命令:

前三种子查询命令(inexistsscalar)是表达式,可用于 where 命令(where <布尔表达式>)和搜索过滤器(search source=* <布尔表达式>)。relation 子查询命令是一个语句,可用于 join 操作。

in

in 子查询允许您检查某个字段的值是否存在于另一个查询的结果中。当您希望基于另一个索引或查询的数据来过滤结果时,这非常有用。

语法

where <field> [not] in [ search source=... | ... | ... ]

用法

source = outer | where a in [ source = inner | fields b ]
source = outer | where (a) in [ source = inner | fields b ]
source = outer | where (a,b,c) in [ source = inner | fields d,e,f ]
source = outer | where a not in [ source = inner | fields b ]
source = outer | where (a) not in [ source = inner | fields b ]
source = outer | where (a,b,c) not in [ source = inner | fields d,e,f ]
source = outer a in [ source = inner | fields b ]
source = outer a not in [ source = inner | fields b ]
source = outer | where a in [ source = inner1 | where b not in [ source = inner2 | fields c ] | fields b ] // 嵌套
source = table1 | inner join left = l right = r on l.a = r.a AND r.a in [ source = inner | fields d ] | fields l.a, r.a, b, c // 作为连接过滤器

exists

exists 子查询检查子搜索查询是否返回任何结果。这对于相关子查询特别有用,当您想要检查相关记录是否存在时。

语法

where [not] exists [ search source=... | ... | ... ]

用法

以下示例演示了实现 exists 子查询的不同方式,从简单的聚合比较到复杂的嵌套计算。

创建这些示例时假设:

  • ab 是表 outer 的字段。

  • cd 是表 inner 的字段。

  • ef 是表 nested 的字段。

相关子查询

在以下示例中,内部查询引用了外部查询的字段(例如当 a = c 时),从而在查询之间创建了依赖关系。子查询会为外部查询的每一行评估一次:

source = outer | where exists [ source = inner | where a = c ]
source = outer | where not exists [ source = inner | where a = c ]
source = outer | where exists [ source = inner | where a = c and b = d ]
source = outer | where not exists [ source = inner | where a = c and b = d ]
source = outer exists [ source = inner | where a = c ]
source = outer not exists [ source = inner | where a = c ]
source = table as t1 exists [ source = table as t2 | where t1.a = t2.a ]

非相关子查询

在以下示例中,子查询独立于外部查询。内部查询不引用外部查询的任何字段,因此无论外部查询有多少行,内部查询都只评估一次:

source = outer | where exists [ source = inner | where c > 10 ]
source = outer | where not exists [ source = inner | where c > 10 ]

嵌套子查询

以下示例演示了如何将一个子查询嵌套在另一个子查询中,创建多层次的查询复杂度。这种方法适用于需要来自不同数据源的多个条件的复杂过滤场景:

source = outer | where exists [ source = inner1 | where a = c and exists [ source = nested | where c = e ] ]
source = outer | where exists [ source = inner1 | where a = c | where exists [ source = nested | where c = e ] ]

scalar

scalar 子查询返回一个单一值,您可以在比较或计算中使用。当您需要将一个字段与另一个查询的聚合值进行比较时,这非常有用。

语法

where <field> = [ search source=... | ... | ... ]

用法

以下示例演示了实现 scalar 子查询的不同方式,从简单的聚合比较到复杂的嵌套计算。

非相关子查询

在以下示例中,scalar 子查询独立于外部查询。这些子查询检索一个单一值,可用于计算或比较:

source = outer | eval m = [ source = inner | stats max(c) ] | fields m, a
source = outer | eval m = [ source = inner | stats max(c) ] + b | fields m, a
source = outer | where a > [ source = inner | stats min(c) ] | fields a
source = outer a > [ source = inner | stats min(c) ] | fields a

相关子查询

在以下示例中,scalar 子查询引用了外部查询的字段,创建了一种依赖关系,其中内部查询的结果取决于外部查询的每一行:

source = outer | eval m = [ source = inner | where outer.b = inner.d | stats max(c) ] | fields m, a
source = outer | eval m = [ source = inner | where b = d | stats max(c) ] | fields m, a
source = outer | eval m = [ source = inner | where outer.b > inner.d | stats max(c) ] | fields m, a
source = outer | where a = [ source = inner | where outer.b = inner.d | stats max(c) ]
source = outer | where a = [ source = inner | where b = d | stats max(c) ]
source = outer | where [ source = inner | where outer.b = inner.d OR inner.d = 1 | stats count() ] > 0 | fields a
source = outer a = [ source = inner | where b = d | stats max(c) ]
source = outer [ source = inner | where outer.b = inner.d OR inner.d = 1 | stats count() ] > 0 | fields a

嵌套子查询

以下示例演示了如何嵌套多个 scalar 子查询,以创建复杂的比较或将一个子查询的结果用于另一个子查询:

source = outer | where a = [ source = inner | stats max(c) | sort c ] OR b = [ source = inner | where c = 1 | stats min(d) | sort d ]
source = outer | where a = [ source = inner | where c =  [ source = nested | stats max(e) by f | sort f ] | stats max(d) by c | sort c | head 1 ]

relation

relation 子查询允许您将查询结果用作连接操作中的数据集。当您需要与过滤或转换后的数据集进行连接,而不是直接与静态索引连接时,这非常有用。

语法

join on <condition> [ search source=... | ... | ... ] [as alias]

用法

以下示例演示了如何在连接操作中使用 relation 子查询。第一个示例展示了如何与过滤后的数据集进行连接,第二个示例展示了如何将一个 relation 子查询嵌套在另一个查询中:

source = table1 | join left = l right = r on condition [ source = table2 | where d > 10 | head 5 ] // 子查询在连接右侧
source = [ source = table1 | join left = l right = r [ source = table2 | where d > 10 | head 5 ] | stats count(a) by b ] as outer | head 1

示例

以下示例展示了不同类型的子查询如何在查询场景中协同工作,例如多级查询或嵌套多种子查询类型。

复杂查询示例

以下示例演示了如何在复杂查询中组合不同类型的子查询。

示例 1:包含 inscalar 子查询的查询

以下查询同时使用 inscalar 子查询来查找来自加拿大、供应名称以 “forest” 开头的零件,并且可用数量大于 1994 年订购总量的供应商:

source = supplier
| join ON s_nationkey = n_nationkey nation
| where n_name = 'CANADA'
   and s_suppkey in [ /* in 子查询 */
     source = partsupp
     | where ps_partkey in [ /* 嵌套的 in 子查询 */
         source = part
         | where like(p_name, 'forest%')
         | fields p_partkey
       ]
       and ps_availqty > [ /* scalar 子查询 */
         source = lineitem
         | where l_partkey = ps_partkey
           and l_suppkey = ps_suppkey
           and l_shipdate >= date('1994-01-01')
           and l_shipdate < date_add(date('1994-01-01'), interval 1 year)
         | stats sum(l_quantity) as sum_l_quantity
         | eval half_sum_l_quantity = 0.5 * sum_l_quantity
         | fields half_sum_l_quantity
       ]
     | fields ps_suppkey

示例 2:包含 relationscalarexists 子查询的查询

以下查询使用 relationscalarexists 子查询来查找来自特定国家代码、账户余额高于平均水平且未下过任何订单的客户:

source = [  /* relation 子查询 */
  source = customer
    | where substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17')
      and c_acctbal > [ /* scalar 子查询 */
          source = customer
          | where c_acctbal > 0.00
            and substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17')
          | stats avg(c_acctbal)
        ]
      and not exists [ /* 相关 exists 子查询 */
          source = orders
          | where o_custkey = c_custkey
        ]
    | eval cntrycode = substring(c_phone, 1, 2)
    | fields cntrycode, c_acctbal
  ] as custsale
| stats count() as numcust, sum(c_acctbal) as totacctbal by cntrycode
| sort cntrycode

限制

PPL 子查询仅在 plugins.calcite.enabled 设置为 true 时有效。