
In my mapPartitions code there is multi-threaded work to do: I use a thread pool and want to run tasks in parallel. But I cannot distinguish these two parameters. I guess I can set --executor-cores to 5 and run 4 threads in my task. Is this right?
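For concreteness, here is a minimal sketch of that setup, assuming a 4-thread pool per task; the `process` function, the data, and the app name are placeholders:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import org.apache.spark.{SparkConf, SparkContext}

object ThreadedPartitions {
  // Hypothetical stand-in for the real per-record work.
  def process(x: Int): Int = x * 2

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("threaded-partitions"))
    val result = sc
      .parallelize(1 to 1000, 8)
      .mapPartitions { iter =>
        // One fixed pool of 4 threads inside each task, as described above.
        val pool = Executors.newFixedThreadPool(4)
        implicit val ec: ExecutionContext =
          ExecutionContext.fromExecutorService(pool)
        val futures = iter.map(x => Future(process(x))).toList // submit all records
        val out = futures.map(f => Await.result(f, Duration.Inf))
        pool.shutdown()
        out.iterator
      }
      .collect()
    println(s"processed ${result.length} records")
    sc.stop()
  }
}
```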

cstur4

1 Answer


spark.task.cpus is the number of cores to allocate to each task, while --executor-cores specifies the number of cores per executor.
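A minimal sketch of how the two settings combine (the app name and the numbers are illustrative): each executor can run spark.executor.cores / spark.task.cpus tasks concurrently, so 5 cores with the default spark.task.cpus=1 means 5 concurrent tasks, while spark.task.cpus=4 would leave room for only 1.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative numbers: an executor with 5 cores where every task
// reserves 1 core can run 5 / 1 = 5 tasks at the same time.
val conf = new SparkConf()
  .setAppName("cores-demo")            // hypothetical app name
  .set("spark.executor.cores", "5")    // same setting as --executor-cores
  .set("spark.task.cpus", "1")         // cores the scheduler reserves per task
val sc = new SparkContext(conf)
```

Note that spark.task.cpus only affects scheduling: it reserves cores for a task but does not start any threads; the 4-thread pool in the question is entirely the task code's own responsibility.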

There is a small difference between executors and tasks, as explained here.

To learn how many threads you can run per core, go through this post.

As per the links:

When you create the SparkContext, each worker starts an executor. This is a separate process (a JVM). The executors connect back to your driver program. Now the driver can send them commands, like flatMap, map, and reduceByKey; these commands are tasks.
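To make that concrete, here is a minimal word-count sketch (the input path and app name are hypothetical) in which the driver issues exactly those commands and the executors carry them out as tasks:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// The driver below only *describes* the computation; Spark ships the
// flatMap/map/reduceByKey work to the executors as tasks, one task per
// partition per stage.
val sc = new SparkContext(new SparkConf().setAppName("wordcount-demo"))
sc.textFile("input.txt")               // hypothetical input path
  .flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .collect()
  .foreach(println)
```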

To find the number of threads your CPU supports per core, run lscpu and check the value of `Thread(s) per core`.

Amit Kumar
  • I set **spark.task.cpus**, **--executor-cores**, and **--num-executors**, and I expected to get **--executor-cores** * **--num-executors** cores, but the cluster information shows otherwise. – cstur4 Jun 01 '16 at 09:17
  • How are you getting the cluster information? Are you confusing SPARK_WORKER_CORES with SPARK_EXECUTOR_CORES? – Amit Kumar Jun 01 '16 at 16:22
  • What is the CLI argument for `spark.task.cpus`? – rjurney Jul 24 '19 at 02:07