
I am running Spark using YARN (Hadoop 2.6) as the cluster manager. YARN is running in pseudo-distributed mode. I started the Spark shell with 6 executors and was expecting the same:

spark-shell --master yarn --num-executors 6

However, in the Spark web UI I see only 4 executors:

[screenshot: Spark web UI executors page showing 4 executors]

Any reason for this?

PS: I ran the nproc command on my Ubuntu (14.04) machine and the result is given below. I believe this means my system has 8 cores.

mountain@mountain:~$ nproc
8
Raj
  • Maybe there weren't enough hardware resources to start all 6 executors. How much memory have you reserved for the YARN cluster? Check the YARN Resource Manager web UI. – vanekjar Jun 21 '15 at 09:32
  • @vanekjar From the Resource Manager UI: Total Memory -> 8 GB, VCores Total -> 8. Any limitation here? – Raj Jun 21 '15 at 09:56
  • Check this link: http://stackoverflow.com/questions/24622108/apache-spark-the-number-of-cores-vs-the-number-of-executors?rq=1 – Abhishek Choudhary Jun 21 '15 at 11:10

2 Answers


Did you take into account spark.yarn.executor.memoryOverhead? It can create a hidden memory requirement, so in the end YARN cannot provide all the requested resources. Also note that YARN rounds the container size up according to yarn.scheduler.increment-allocation-mb. All the details are here: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
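
As a back-of-the-envelope illustration (every number below is an assumed default, not a value given in the question), the container sizing could work out roughly like this:

spark.executor.memory (default)           = 1024 MB
spark.yarn.executor.memoryOverhead        = max(384 MB, 10% of 1024 MB) = 384 MB
requested container size                  = 1024 MB + 384 MB = 1408 MB
rounded up to the 1024 MB allocation step = 2048 MB per executor

With 8 GB of cluster memory, one extra container for the ApplicationMaster plus 2048 MB per executor leaves room for only three or four executors, well short of the six requested.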

Mijatovic

This happens when there are not enough resources on your cluster to start more executors. The following things are taken into account:

  1. A Spark executor runs inside a YARN container. The size of this container depends on yarn.scheduler.minimum-allocation-mb in yarn-site.xml (requests are rounded up to at least this value), so check this property. If your existing containers consume all the available memory, no memory is left for new containers and no new executors will be started (see the example command after this list).

  2. The Storage Memory column in the UI displays the amount of memory available for execution and RDD storage. By default this equals (HEAP_SPACE - 300 MB) * 75%. The rest of the memory is used for internal metadata, user data structures and other things. Reference: Spark on YARN: Less executor memory than set via spark-submit.
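
As a rough illustration of point 1 (the flags and sizes below are assumptions, not values taken from the question), on an 8 GB / 8 vcore cluster you could request smaller executors so that six containers plus the ApplicationMaster still fit:

spark-shell --master yarn --num-executors 6 --executor-memory 512m --executor-cores 1

For point 2, with a 1 GB executor heap the UI would show roughly (1024 MB - 300 MB) * 0.75 ≈ 543 MB of storage memory; the real figure is a bit lower because the usable JVM heap is slightly smaller than the configured -Xmx.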

I hope this helps.

Harjeet Kumar