In my mapPartitions step there is multi-threaded work to do, so I use a thread pool and want to run subtasks in parallel. But I cannot distinguish these two parameters. My guess is that I can set --executor-cores to 5 and run 4 threads within my task. Is this right?
1 Answer
`spark.task.cpus` is the number of cores to allocate to each task, while `--executor-cores` specifies the number of cores per executor.
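For instance, here is a minimal sketch of setting the two together programmatically (the application name and resource numbers are illustrative, not recommendations):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values; tune them to your cluster.
val conf = new SparkConf()
  .setAppName("cores-demo")          // hypothetical application name
  .set("spark.executor.cores", "5")  // programmatic equivalent of --executor-cores 5
  .set("spark.task.cpus", "4")       // cores the scheduler reserves for each task
val sc = new SparkContext(conf)
// Each executor can now run floor(5 / 4) = 1 task at a time.
```

On the spark-submit command line the same pair would be `--executor-cores 5 --conf spark.task.cpus=4`; `spark.task.cpus` has no dedicated flag of its own.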
There is a small difference between executors and tasks, as explained here. To find out how many threads you can run per core, go through this post.
As per the links:
When you create the SparkContext, each worker starts an executor. This is a separate process (a JVM). The executors connect back to your driver program. Now the driver can send them commands like flatMap, map, and reduceByKey; these commands are tasks.
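As a concrete illustration of such commands, the classic word count chains several of them (the input path is made up):

```scala
// Each transformation below is shipped to the executors as tasks.
val counts = sc.textFile("hdfs:///input.txt") // hypothetical path
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
```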
To find out the number of threads your CPU supports per core, run `lscpu` and check the value of `Thread(s) per core:`.
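Tying this back to the question, here is a minimal sketch of a fixed-size thread pool inside mapPartitions, assuming `spark.task.cpus` is set to 4 so the scheduler reserves four cores for each task (the RDD and the per-element doubling are stand-ins for real work):

```scala
import java.util.concurrent.{Callable, Executors}
import scala.collection.JavaConverters._

val rdd = sc.parallelize(1 to 100) // stand-in data
val processed = rdd.mapPartitions { iter =>
  val pool = Executors.newFixedThreadPool(4) // matches spark.task.cpus = 4
  try {
    val callables = iter.map { x =>
      new Callable[Int] { override def call(): Int = x * 2 }
    }.toList // materializes the partition up front
    // invokeAll blocks until every callable has finished.
    pool.invokeAll(callables.asJava).asScala.map(_.get()).iterator
  } finally {
    pool.shutdown()
  }
}
```

Note that `spark.task.cpus` only affects scheduling; nothing stops the pool from spawning more threads than the task's allocation, so you have to keep the two numbers in sync yourself.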

– Amit Kumar
- I set **spark.task.cpus**, **--executor-cores**, **--num-executors**, and I expect to get **--executor-cores** * **--num-executors** cores in total. But the cluster information shows I am wrong. – cstur4 Jun 01 '16 at 09:17
- How are you getting the cluster information? Are you confusing SPARK_WORKER_CORES with SPARK_EXECUTOR_CORES? – Amit Kumar Jun 01 '16 at 16:22
- What is the CLI argument for `spark.task.cpus`? – rjurney Jul 24 '19 at 02:07