子查询
子查询(也称为子搜索)允许您在一个查询中使用另一个查询的结果。UDB-SX 管道处理语言(PPL)支持四种类型的子查询命令:
前三种子查询命令(in、exists 和 scalar)是表达式,可用于 where 命令(where <布尔表达式>)和搜索过滤器(search source=* <布尔表达式>)。relation 子查询命令是一个语句,可用于 join 操作。
in
in 子查询允许您检查某个字段的值是否存在于另一个查询的结果中。当您希望基于另一个索引或查询的数据来过滤结果时,这非常有用。
语法
where <field> [not] in [ search source=... | ... | ... ]
用法
source = outer | where a in [ source = inner | fields b ]
source = outer | where (a) in [ source = inner | fields b ]
source = outer | where (a,b,c) in [ source = inner | fields d,e,f ]
source = outer | where a not in [ source = inner | fields b ]
source = outer | where (a) not in [ source = inner | fields b ]
source = outer | where (a,b,c) not in [ source = inner | fields d,e,f ]
source = outer a in [ source = inner | fields b ]
source = outer a not in [ source = inner | fields b ]
source = outer | where a in [ source = inner1 | where b not in [ source = inner2 | fields c ] | fields b ] // 嵌套
source = table1 | inner join left = l right = r on l.a = r.a AND r.a in [ source = inner | fields d ] | fields l.a, r.a, b, c // 作为连接过滤器
exists
exists 子查询检查子搜索查询是否返回任何结果。这对于相关子查询特别有用,当您想要检查相关记录是否存在时。
语法
where [not] exists [ search source=... | ... | ... ]
用法
以下示例演示了实现 exists 子查询的不同方式,从简单的聚合比较到复杂的嵌套计算。
创建这些示例时假设:
a和b是表 outer 的字段。c和d是表 inner 的字段。e和f是表 nested 的字段。
相关子查询
在以下示例中,内部查询引用了外部查询的字段(例如当 a = c 时),从而在查询之间创建了依赖关系。子查询会为外部查询的每一行评估一次:
source = outer | where exists [ source = inner | where a = c ]
source = outer | where not exists [ source = inner | where a = c ]
source = outer | where exists [ source = inner | where a = c and b = d ]
source = outer | where not exists [ source = inner | where a = c and b = d ]
source = outer exists [ source = inner | where a = c ]
source = outer not exists [ source = inner | where a = c ]
source = table as t1 exists [ source = table as t2 | where t1.a = t2.a ]
非相关子查询
在以下示例中,子查询独立于外部查询。内部查询不引用外部查询的任何字段,因此无论外部查询有多少行,内部查询都只评估一次:
source = outer | where exists [ source = inner | where c > 10 ]
source = outer | where not exists [ source = inner | where c > 10 ]
嵌套子查询
以下示例演示了如何将一个子查询嵌套在另一个子查询中,创建多层次的查询复杂度。这种方法适用于需要来自不同数据源的多个条件的复杂过滤场景:
source = outer | where exists [ source = inner1 | where a = c and exists [ source = nested | where c = e ] ]
source = outer | where exists [ source = inner1 | where a = c | where exists [ source = nested | where c = e ] ]
scalar
scalar 子查询返回一个单一值,您可以在比较或计算中使用。当您需要将一个字段与另一个查询的聚合值进行比较时,这非常有用。
语法
where <field> = [ search source=... | ... | ... ]
用法
以下示例演示了实现 scalar 子查询的不同方式,从简单的聚合比较到复杂的嵌套计算。
非相关子查询
在以下示例中,scalar 子查询独立于外部查询。这些子查询检索一个单一值,可用于计算或比较:
source = outer | eval m = [ source = inner | stats max(c) ] | fields m, a
source = outer | eval m = [ source = inner | stats max(c) ] + b | fields m, a
source = outer | where a > [ source = inner | stats min(c) ] | fields a
source = outer a > [ source = inner | stats min(c) ] | fields a
相关子查询
在以下示例中,scalar 子查询引用了外部查询的字段,创建了一种依赖关系,其中内部查询的结果取决于外部查询的每一行:
source = outer | eval m = [ source = inner | where outer.b = inner.d | stats max(c) ] | fields m, a
source = outer | eval m = [ source = inner | where b = d | stats max(c) ] | fields m, a
source = outer | eval m = [ source = inner | where outer.b > inner.d | stats max(c) ] | fields m, a
source = outer | where a = [ source = inner | where outer.b = inner.d | stats max(c) ]
source = outer | where a = [ source = inner | where b = d | stats max(c) ]
source = outer | where [ source = inner | where outer.b = inner.d OR inner.d = 1 | stats count() ] > 0 | fields a
source = outer a = [ source = inner | where b = d | stats max(c) ]
source = outer [ source = inner | where outer.b = inner.d OR inner.d = 1 | stats count() ] > 0 | fields a
嵌套子查询
以下示例演示了如何嵌套多个 scalar 子查询,以创建复杂的比较或将一个子查询的结果用于另一个子查询:
source = outer | where a = [ source = inner | stats max(c) | sort c ] OR b = [ source = inner | where c = 1 | stats min(d) | sort d ]
source = outer | where a = [ source = inner | where c = [ source = nested | stats max(e) by f | sort f ] | stats max(d) by c | sort c | head 1 ]
relation
relation 子查询允许您将查询结果用作连接操作中的数据集。当您需要与过滤或转换后的数据集进行连接,而不是直接与静态索引连接时,这非常有用。
语法
join on <condition> [ search source=... | ... | ... ] [as alias]
用法
以下示例演示了如何在连接操作中使用 relation 子查询。第一个示例展示了如何与过滤后的数据集进行连接,第二个示例展示了如何将一个 relation 子查询嵌套在另一个查询中:
source = table1 | join left = l right = r on condition [ source = table2 | where d > 10 | head 5 ] // 子查询在连接右侧
source = [ source = table1 | join left = l right = r [ source = table2 | where d > 10 | head 5 ] | stats count(a) by b ] as outer | head 1
示例
以下示例展示了不同类型的子查询如何在查询场景中协同工作,例如多级查询或嵌套多种子查询类型。
复杂查询示例
以下示例演示了如何在复杂查询中组合不同类型的子查询。
示例 1:包含 in 和 scalar 子查询的查询
以下查询同时使用 in 和 scalar 子查询来查找来自加拿大、供应名称以 “forest” 开头的零件,并且可用数量大于 1994 年订购总量的供应商:
source = supplier
| join ON s_nationkey = n_nationkey nation
| where n_name = 'CANADA'
and s_suppkey in [ /* in 子查询 */
source = partsupp
| where ps_partkey in [ /* 嵌套的 in 子查询 */
source = part
| where like(p_name, 'forest%')
| fields p_partkey
]
and ps_availqty > [ /* scalar 子查询 */
source = lineitem
| where l_partkey = ps_partkey
and l_suppkey = ps_suppkey
and l_shipdate >= date('1994-01-01')
and l_shipdate < date_add(date('1994-01-01'), interval 1 year)
| stats sum(l_quantity) as sum_l_quantity
| eval half_sum_l_quantity = 0.5 * sum_l_quantity
| fields half_sum_l_quantity
]
| fields ps_suppkey
示例 2:包含 relation、scalar 和 exists 子查询的查询
以下查询使用 relation、scalar 和 exists 子查询来查找来自特定国家代码、账户余额高于平均水平且未下过任何订单的客户:
source = [ /* relation 子查询 */
source = customer
| where substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17')
and c_acctbal > [ /* scalar 子查询 */
source = customer
| where c_acctbal > 0.00
and substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17')
| stats avg(c_acctbal)
]
and not exists [ /* 相关 exists 子查询 */
source = orders
| where o_custkey = c_custkey
]
| eval cntrycode = substring(c_phone, 1, 2)
| fields cntrycode, c_acctbal
] as custsale
| stats count() as numcust, sum(c_acctbal) as totacctbal by cntrycode
| sort cntrycode
限制
PPL 子查询仅在 plugins.calcite.enabled 设置为 true 时有效。