
Environment:
- Spark 1.5.1
- Hive 1.2.1
- Hadoop Yarn 2.7.2

In my code, I create a HiveContext to run SQL statements that insert data from an RDD into a Hive table.

To get better performance, I run multiple SQL statements in different threads using the same HiveContext. However, jstack output shows that most of the threads are blocked on the HiveContext and only a few are actually running. So I tried to create a separate HiveContext inside each thread, but unfortunately I got an error. After googling, I learned that a single JVM can hold only one HiveContext.
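To illustrate, this is roughly the threading pattern I use (a minimal sketch: the table names and the `runQuery` stand-in are illustrative, not my actual code; in my real code `runQuery` calls `hiveContext.sql(...)` on the shared HiveContext):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelInserts {

    // Stand-in for the shared HiveContext: in the real code this would be
    // hiveContext.sql(query), which is where the threads end up blocking.
    static String runQuery(String query) {
        return "done: " + query;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative insert statements; the real ones load RDD-backed data.
        List<String> queries = new ArrayList<>();
        queries.add("INSERT INTO TABLE t1 SELECT * FROM staging1");
        queries.add("INSERT INTO TABLE t2 SELECT * FROM staging2");

        // Submit each statement from its own worker thread.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<String>> futures = new ArrayList<>();
        for (String q : queries) {
            futures.add(pool.submit(() -> runQuery(q)));
        }

        // Wait for each statement to finish.
        for (Future<String> f : futures) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}
```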

I'm stuck here. Can anyone offer a suggestion?

Pencilcap
    There is nothing native within Spark to handle running queries in parallel. Instead you can take a look at Java concurrency and in particular Futures which will allow you to start queries in parallel and check status later. – Anurag Sharma Mar 31 '18 at 09:49
  • Can you please show us the code (reduced to the minimum)? At first glance it sounds wrong to start new threads on your own just to increase insert parallelism. – TobiSH Mar 31 '18 at 11:15
  • see e.g. https://stackoverflow.com/a/48845764/1138523 – Raphael Roth Mar 31 '18 at 19:33
  • Raphael Roth, thanks. Nice answer! – Pencilcap Apr 02 '18 at 13:21

0 Answers