
I am running Spark using YARN (Hadoop 2.6) as the cluster manager. YARN is running in pseudo-distributed mode. I started the Spark shell with 6 executors and was expecting the same:

spark-shell --master yarn --num-executors 6

However, in the Spark web UI I see only 4 executors:

[screenshot: Spark web UI executors page showing 4 executors]

Any reason for this?

PS: I ran the nproc command on my Ubuntu (14.04) machine and the result is given below. I believe this means my system has 8 cores.

mountain@mountain:~$ nproc
8
Raj
  • Maybe there weren't enough hardware resources to start all 6 executors. How much memory have you reserved for the YARN cluster? Check the YARN Resource Manager web UI. – vanekjar Jun 21 '15 at 09:32
  • @vanekjar From the Resource Manager UI: Total Memory -> 8 GB, VCores Total -> 8. Any limitation here? – Raj Jun 21 '15 at 09:56
  • Check this link: http://stackoverflow.com/questions/24622108/apache-spark-the-number-of-cores-vs-the-number-of-executors?rq=1 – Abhishek Choudhary Jun 21 '15 at 11:10

2 Answers


Did you take into account spark.yarn.executor.memoryOverhead? It can create a hidden memory requirement, so in the end YARN cannot provide all the requested resources. Also note that YARN rounds the container size up according to yarn.scheduler.increment-allocation-mb. All the details are here: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
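
As a back-of-the-envelope illustration (every number below is an assumed default, not a value given in the question), the container sizing could work out roughly like this:

spark.executor.memory (default)           = 1024 MB
spark.yarn.executor.memoryOverhead        = max(384 MB, 10% of 1024 MB) = 384 MB
requested container size                  = 1024 MB + 384 MB = 1408 MB
rounded up to the 1024 MB allocation step = 2048 MB per executor

With 8 GB of cluster memory, one extra container for the ApplicationMaster plus 2048 MB per executor leaves room for only three or four executors, well short of the six requested.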

Mijatovic

This happens when there are not enough resources on your cluster to start more executors. The following things are taken into account:

  1. A Spark executor runs inside a YARN container. The size of this container depends on yarn.scheduler.minimum-allocation-mb in yarn-site.xml (requests are rounded up to at least this value), so check this property. If your existing containers consume all the available memory, no memory is left for new containers and no new executors will be started (see the example command after this list).

  2. The Storage Memory column in the UI displays the amount of memory available for execution and RDD storage. By default this equals (HEAP_SPACE - 300 MB) * 75%. The rest of the memory is used for internal metadata, user data structures and other things. Reference: Spark on YARN: Less executor memory than set via spark-submit.
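
As a rough illustration of point 1 (the flags and sizes below are assumptions, not values taken from the question), on an 8 GB / 8 vcore cluster you could request smaller executors so that six containers plus the ApplicationMaster still fit:

spark-shell --master yarn --num-executors 6 --executor-memory 512m --executor-cores 1

For point 2, with a 1 GB executor heap the UI would show roughly (1024 MB - 300 MB) * 0.75 ≈ 543 MB of storage memory; the real figure is a bit lower because the usable JVM heap is slightly smaller than the configured -Xmx.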

I hope this helps.

Harjeet Kumar