
Suppose I have something like

df1 = sqlContext.sql("select count(1) as ct1 from big_table_1")
df2 = sqlContext.sql("select count(1) as ct2 from big_table_2")
df1.show()
df2.show()

Within each table (whether a Hive table or a temporary view), the rows will be counted in parallel across the worker nodes, assuming the underlying DataFrame has multiple partitions.

Is there also a way to get the two tables counted in parallel, one job per table? Is this even possible in PySpark?
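One approach I have seen suggested is to trigger each action from its own driver thread, since Spark's scheduler accepts concurrent jobs submitted from separate threads within a single application. Below is a minimal sketch of that idea (the ThreadPoolExecutor wrapper and the run_count helper are my own assumption, reusing sqlContext and the table names from above), not something I have verified:

from concurrent.futures import ThreadPoolExecutor

def run_count(query):
    # collect() is the action that actually launches the Spark job
    return sqlContext.sql(query).collect()[0][0]

queries = [
    "select count(1) as ct1 from big_table_1",
    "select count(1) as ct2 from big_table_2",
]

# Each thread submits its own job; the driver can schedule both
# concurrently as long as the cluster has free executor slots.
with ThreadPoolExecutor(max_workers=2) as pool:
    ct1, ct2 = pool.map(run_count, queries)

print(ct1, ct2)

Is something along these lines the right direction, or is there a more idiomatic way to do it in PySpark?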

wrschneider
