I have set up a Spark on YARN cluster on my laptop and am having problems running multiple concurrent jobs in Spark using Python multiprocessing. I am running in yarn-client mode. I have tried two ways to achieve this:
- Set up a single SparkContext and create multiple processes to submit jobs. This method does not work and the program crashes. I guess a single SparkContext does not support multiple Python processes.
- For each process, set up a separate SparkContext and submit the job (roughly as sketched below). In this case, the jobs are submitted to YARN successfully, but they run serially: only one job runs at a time while the rest wait in the queue. Is it possible to start multiple jobs concurrently?
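For reference, the second approach looks roughly like this (a minimal sketch, not my exact code; the workload inside `run_job` is just a placeholder):

```python
from multiprocessing import Process

from pyspark import SparkConf, SparkContext


def run_job(job_id):
    # Each process creates its own SparkContext, so each submission shows up
    # as a separate application on YARN.
    conf = SparkConf().setMaster("yarn-client").setAppName("job-%d" % job_id)
    sc = SparkContext(conf=conf)
    try:
        # Placeholder workload; the real job logic goes here.
        total = sc.parallelize(range(1000)).map(lambda x: x * x).sum()
        print("job %d result: %s" % (job_id, total))
    finally:
        sc.stop()


if __name__ == "__main__":
    procs = [Process(target=run_job, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```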
Update on the settings
YARN:
- yarn.nodemanager.resource.cpu-vcores 8
- yarn.nodemanager.resource.memory-mb 11264
- yarn.scheduler.maximum-allocation-vcores 1
Spark:
- SPARK_EXECUTOR_CORES=1
- SPARK_EXECUTOR_INSTANCES=2
- SPARK_DRIVER_MEMORY=1G
- spark.scheduler.mode = FAIR
- spark.dynamicAllocation.enabled = true
- spark.shuffle.service.enabled = true
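In case it matters, here is how I understand these Spark settings map onto the standard configuration properties if they were set programmatically when each context is built (a sketch for reference only):

```python
from pyspark import SparkConf, SparkContext

# Sketch: the same settings expressed as standard Spark configuration
# properties, applied when each per-process SparkContext is created.
conf = (SparkConf()
        .setMaster("yarn-client")
        .set("spark.executor.cores", "1")
        .set("spark.executor.instances", "2")
        # Note: spark.driver.memory only takes effect if set before the driver
        # JVM starts (e.g. via spark-defaults.conf or spark-submit); shown here
        # for completeness.
        .set("spark.driver.memory", "1g")
        .set("spark.scheduler.mode", "FAIR")
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.shuffle.service.enabled", "true"))
sc = SparkContext(conf=conf)
```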
YARN will only run one job at a time, using 3 containers, 3 vcores, and 3 GB of RAM. So there are ample vcores and RAM available for the other jobs, but they are not running.