
TL;DR

The Spark UI shows a different number of cores and a different amount of memory than what I request when using spark-submit.

more details:

I'm running Spark 1.6 in standalone mode. When I run spark-submit I pass it 1 executor instance with 1 core for the executor, plus 1 core for the driver. What I would expect is that my application runs with 2 cores total. When I check the environment tab on the UI I see that it received the correct parameters, but it still seems to be using a different number of cores. You can see it here:

(Spark UI screenshot)

This is my spark-defaults.conf that I'm using:

spark.executor.memory 5g
spark.executor.cores 1
spark.executor.instances 1
spark.driver.cores 1
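
The spark-submit command itself only specifies the master and the main class; everything else comes from the spark-defaults.conf above. Roughly something like this, with placeholder names and paths:

spark-submit \
  --master spark://my-master:7077 \
  --class com.example.MyApp \
  /path/to/my-app.jar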

Checking the environment tab on the Spark UI confirms that these are indeed the parameters it received, yet the UI still shows something else.

Does anyone have any idea what might cause Spark to use a different number of cores than what I pass it? I obviously tried Googling it but didn't find anything useful on the topic.

Thanks in advance

Gideon
  • How are you running Spark? In cluster or client mode? With YARN (based on the use of executor.instances)? – Jonathan Taws Jun 13 '16 at 08:39
  • Standalone (it's at the beginning of the question), not YARN. I thought about adding the spark-submit line, but it's just the master and the main class; the rest is given through spark-defaults.conf – Gideon Jun 13 '16 at 09:18
  • Then this makes sense: in standalone mode, a greedy strategy is used and as many executors will be created as there are cores and memory available. In your case, you specified 1 core per executor, so Spark will try to create 8 executors as there are 8 cores available. However, as there is only 30GB of RAM available, only 6 can be created (6 executors with 5GB of RAM each). You end up with 6 executors. `spark.executor.instances` is a YARN-only configuration. Your best bet is to set the total number of cores to 2 using `spark.cores.max`; tell me if this is any better. – Jonathan Taws Jun 13 '16 at 09:26
  • Ah, cool. You should write it as the answer and I'll mark it as accepted. Thanks! – Gideon Jun 13 '16 at 09:28

1 Answer


TL;DR

Use spark.cores.max instead to define the total number of cores available, and thus limit the number of executors.


In standalone mode, a greedy strategy is used and as many executors will be created as there are cores and memory available on your worker.

In your case, you specified 1 core and 5GB of memory per executor. The following will be calculated by Spark:

  • As there are 8 cores available, it will try to create 8 executors.
  • However, as there is only 30GB of memory available, it can only create 6 executors: each executor will have 5GB of memory, which adds up to 30GB.
  • Therefore, 6 executors will be created, and a total of 6 cores will be used with 30GB of memory.
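
In other words, the number of executors is the minimum of what the cores and the memory allow. A rough back-of-the-envelope version of the same calculation, using the 8 cores and 30GB available on your worker:

  8 cores  / 1 core per executor = 8 executors (core limit)
  30GB RAM / 5GB per executor    = 6 executors (memory limit)
  min(8, 6) = 6 executors -> 6 cores and 30GB in use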

Spark basically fulfilled what you asked for. In order to achieve what you want, you can make use of the spark.cores.max option documented here and specify the exact number of cores you need.
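
For example, here is a sketch of what the relevant spark-defaults.conf entries could look like for the 2 cores you described (values are illustrative):

spark.cores.max       2
spark.executor.cores  1
spark.executor.memory 5g

# With a 2-core cap and 1 core per executor, Spark starts at most
# 2 executors (2 x 5g = 10GB of executor memory) instead of greedily
# filling the worker.

If you prefer to pass it on the command line, the same cap can, if I recall correctly, be set with spark-submit's --total-executor-cores option in standalone mode.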

A few side notes:

  • spark.executor.instances is a YARN-only configuration
  • spark.driver.cores already defaults to 1

I am also working on making the number of executors in standalone mode easier to reason about; this might get integrated into a future release of Spark and will hopefully help you figure out exactly how many executors you are going to get, without having to calculate it on the fly.

Jonathan Taws
  • I am getting an error while submitting a job to the master that says "Initial job not accepted" - http://stackoverflow.com/questions/38359801/spark-job-submitted-waiting-taskschedulerimpl-initial-job-not-accepted - any relation to the number of cores/memory to be assigned? I'm using a POST API call to submit the application to Spark, as given in my question – Chaitanya Bapat Jul 15 '16 at 12:09