
I want to run multiple Spark SQL queries in parallel on a Spark cluster, so that I can make use of the complete cluster resources. I'm using sqlContext.sql(query).

EDIT:

I saw some sample code like the following:

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

val parallelism = 10
val executor = Executors.newFixedThreadPool(parallelism)
implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(executor)
val tasks: Seq[String] = ??? // the SQL queries to run
val results: Seq[Future[Int]] = tasks.map(query => {
  Future {
    // spark stuff here, e.g. sqlContext.sql(query)
    0
  }(ec)
})
val allDone: Future[Seq[Int]] = Future.sequence(results)
// wait for all results
Await.result(allDone, Duration.Inf)
executor.shutdown() // otherwise the JVM will probably not exit

As I understand it, the ExecutionContext figures out the number of available cores on the machine (using a ForkJoinPool) and parallelizes accordingly. But what happens when we consider the whole Spark cluster rather than a single machine, and how can this guarantee full utilization of the cluster's resources?

E.g., if I have a 10-node cluster with 4 cores per node, how does the above code guarantee that all 40 cores will be utilized?
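
To make this concrete, here is a minimal sketch of how I picture the two levels fitting together (the spark-submit flags, table names, and FAIR-scheduler setting below are my own assumptions, not taken from the sample above). The driver-side thread pool only controls how many queries are submitted concurrently; the executors requested from the cluster (e.g. 10 executors with 4 cores each) determine how many of the 40 cores the resulting jobs can actually occupy.

// Submitted with something like (my assumption about the deployment):
//   spark-submit --master yarn --num-executors 10 --executor-cores 4 ...
// so the cluster offers 40 cores in total.
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import org.apache.spark.sql.SparkSession

object ParallelQueries {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parallel-sql")
      // FAIR scheduling lets concurrently submitted jobs share executor cores
      // instead of queuing strictly behind each other (FIFO is the default).
      .config("spark.scheduler.mode", "FAIR")
      .getOrCreate()

    // Placeholder queries -- in my case these would be the real SQL strings.
    val queries = Seq(
      "SELECT count(*) FROM table_a",
      "SELECT count(*) FROM table_b"
    )

    // One driver thread per concurrently running query; these threads only
    // submit jobs -- the actual work still runs on the executors' cores.
    val pool = Executors.newFixedThreadPool(queries.size)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)

    val futures = queries.map { q =>
      Future {
        spark.sql(q).count() // each call turns into one or more Spark jobs
      }
    }

    val counts = Await.result(Future.sequence(futures), Duration.Inf)
    counts.foreach(println)

    pool.shutdown()
    spark.stop()
  }
}

If I understand the docs correctly, with the default FIFO scheduler a large first job can occupy all executor cores while the others wait, whereas FAIR scheduling lets the concurrently submitted jobs share the executors more evenly. Is that the right way to think about it?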

Devas
  • Thanks for your help, but that question is unanswered. Also, sample code would be more helpful. – Devas May 02 '18 at 12:05
  • Here you are: [How can I parallelize different SparkSQL execution efficiently?](https://stackoverflow.com/a/50058194/9613318) – Alper t. Turker May 02 '18 at 12:07
  • Wow, that's very helpful. Thank you, will try the same. :) – Devas May 02 '18 at 12:10
  • If you search, you'll find a bunch of other similar questions, some probably with better answers. For example, there is [Processing multiple files as independent RDD's in parallel](https://stackoverflow.com/q/31912858/9613318). My search skills are just not top-notch today :) – Alper t. Turker May 02 '18 at 12:12

0 Answers