In Spark, how many tasks are executed in parallel at a time? Related discussions can be found in How are stages split into tasks in Spark? and How DAG works under the covers in RDD?, but I could not find a clear conclusion there.
Consider the following scenarios (assume `spark.task.cpus = 1`, and ignore the vcore concept for simplicity):
- 10 executors (2 cores/executor), 10 partitions => I think the number of tasks running concurrently is 10
- 10 executors (2 cores/executor), 2 partitions => I think the number of tasks running concurrently is 2
- 10 executors (2 cores/executor), 20 partitions => I think the number of tasks running concurrently is 20
- 10 executors (1 core/executor), 20 partitions => I think the number of tasks running concurrently is 10
Am I correct? Regarding the 3rd case, will it really be 20, given that one executor can run tasks in multiple threads (i.e. 2 threads because there are 2 cores)? The sketch after this paragraph shows how I am reasoning about the slot count.
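To make the arithmetic explicit, here is a minimal, self-contained sketch of how I understand the slot count (the helper `concurrentTasks` is my own name, not a Spark API; `spark.task.cpus = 1` is assumed as above):

```scala
object SlotArithmetic {
  // My understanding: total slots = executors * cores per executor / spark.task.cpus,
  // and a stage can never run more tasks at once than it has partitions.
  def concurrentTasks(numExecutors: Int,
                      coresPerExecutor: Int,
                      numPartitions: Int,
                      taskCpus: Int = 1): Int = {
    val totalSlots = numExecutors * coresPerExecutor / taskCpus
    math.min(totalSlots, numPartitions)
  }

  def main(args: Array[String]): Unit = {
    println(concurrentTasks(10, 2, 10)) // scenario 1 -> 10
    println(concurrentTasks(10, 2, 2))  // scenario 2 -> 2
    println(concurrentTasks(10, 2, 20)) // scenario 3 -> 20 (if one executor really runs 2 task threads)
    println(concurrentTasks(10, 1, 20)) // scenario 4 -> 10
  }
}
```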
UPDATE1
If the 3rd case is correct, it means:
- when idle cores are available inside an executor, Spark automatically runs multiple task threads in that executor
- when an executor has only one core, only a single task thread runs in that executor.
If this is true, isn't the behavior of Spark inside an executor somewhat non-deterministic (single-threaded vs. multi-threaded)? The sketch below shows how I would try to observe it.
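As an experiment, I could log the thread name from inside each task. This is a minimal sketch under the setup of the 3rd case (10 executors with 2 cores each, 20 partitions); only standard RDD calls are used, and the output format is my own:

```scala
import org.apache.spark.sql.SparkSession

object ThreadObservation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("thread-observation").getOrCreate()
    val sc = spark.sparkContext

    // 20 partitions so that, with 10 executors x 2 cores, every slot could be busy.
    val observed = sc.parallelize(1 to 20, numSlices = 20).mapPartitions { iter =>
      val host = java.net.InetAddress.getLocalHost.getHostName
      val thread = Thread.currentThread().getName
      Iterator(s"host=$host thread=$thread elements=${iter.size}")
    }.collect()

    observed.foreach(println)
    spark.stop()
  }
}
```

If two output lines share the same host but show different thread names, that would confirm that one executor runs several task threads at the same time.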
Note that the code that is shipped from the driver to the executors may not have been written with thread safety in mind, e.g. it may not use the synchronized keyword where shared state is updated.
How is this handled by Spark?
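To make it concrete, below is a hypothetical example of the kind of shared state I am worried about; the Counter object is my own illustration, not anything from Spark. Being a Scala object, it is a per-JVM singleton, so all task threads inside one executor share it:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical shared state: one instance per JVM, shared by all task
// threads running in that executor. Nothing here is synchronized or atomic.
object Counter {
  var total: Long = 0L
}

object SharedStateQuestion {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("shared-state-question").getOrCreate()
    val sc = spark.sparkContext

    // If two tasks run in two threads of the same executor, both closures
    // increment Counter.total concurrently, and updates can be lost because
    // the increment is not atomic.
    sc.parallelize(1 to 1000000, numSlices = 20).foreach { _ =>
      Counter.total += 1
    }

    spark.stop()
  }
}
```

In other words: is it entirely up to my code to make such shared state thread-safe, or does Spark guard against this in some way?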