
I have a subquery, and I want to return some default value if the subquery is empty. I.e., I need something like

coalesce((select top 1 Id from ...),0)

The problem is that in Hive subqueries are only supported in the FROM clause, so such code gives me a compile error. I am not sure about Spark.

Is there any way to implement this simple logic so that it works in both Hive and Spark SQL?


2 Answers


Hive 2.3.0+ has some limited support for scalar subqueries in the SELECT clause. Please see the Jira ticket HIVE-16091 and the wiki for details.

Spark supports scalar subqueries in the SELECT clause from version 2.0 onwards. There are some good examples here: Subqueries in Apache Spark 2.0
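
With that support, a query close to yours should work in both engines. A minimal sketch, assuming hypothetical tables t1 and t2; MAX stands in for "top 1" (which neither dialect supports) and also guarantees the subquery returns exactly one row:

-- COALESCE supplies the default when the subquery evaluates to NULL
select t1.col1,
       coalesce((select max(id) from t2), 0) as id_or_default
from t1;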

If you are using an earlier version and want the query to work in both Hive and Spark, the only option is to rewrite it using joins and then apply COALESCE in the column expressions, as sketched below. Please see this: Procedurally transform subquery into join
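
A minimal sketch of that rewrite, using the same hypothetical tables: pre-aggregate the subquery to a single row and cross join it to the main table. The aggregate always yields exactly one row (NULL when t2 is empty), so the cross join never drops rows and COALESCE supplies the default:

select t1.col1,
       coalesce(sub.max_id, 0) as id_or_default
from t1
cross join (select max(id) as max_id from t2) sub;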

Spark also internally rewrites scalar subqueries into joins: subquery.scala


I think you can LEFT JOIN the subquery and the main table to do this.
Pseudocode below:

Select t1.col1,
       coalesce(t2.dummyid, 0) as coalesceid   -- 1 when a match exists, 0 otherwise; coalesce(t2.id, 0) would default the actual Id
From table1 t1
Left join (select tab.*, 1 as dummyid from tab) t2 on t2.key = t1.key

Now, if your subquery doesn't have a key column to join on, use the dummy column to join instead.

Select t1.col1,
       coalesce(t2.dummyid, 0) as coalesceid
From table1 t1
Left join (select tab.*, 1 as dummyid from tab limit 1) t2 on t2.dummyid = 1   -- limit 1 keeps the join from duplicating rows of table1