
When configuring a Spark job, I have sometimes seen people suggest that the number of cores per executor be greater than the total number of cores divided by the number of executors.

Notably, in one such example, the following is suggested by @0x0FFF:

    --num-executors 4 --executor-memory 12g --executor-cores 4

If we compute the total number of executor cores, we get 4 cores per executor * 4 executors = 16 cores in total.
However, the question that answer addresses begins: "I have one NameNode and two DataNode with 30GB of RAM each, 4 cores each". So the total number of cores is 2 worker nodes * 4 cores each = 8 cores.
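
By my count, a configuration that actually fits this hardware would use at most one 4-core executor per DataNode, along these lines (a sketch; it assumes YARN, that all 4 cores per node are available to containers, and `my-app.jar` is just a placeholder):

    # 2 executors * 4 cores = 8 cores, matching 2 DataNodes * 4 cores each
    spark-submit --master yarn \
      --num-executors 2 \
      --executor-cores 4 \
      --executor-memory 12g \
      my-app.jar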

Is it possible to have 16 cores utilized by 4 executors with this hardware? If so, how?

makansij
  • It is more a question of how CPU cores relate to threads and what the best configuration is. See here, this may help you: http://stackoverflow.com/questions/13834692/threads-configuration-based-on-no-of-cpu-cores – Sumit Dec 13 '15 at 03:27
  • I have no background in `HW`, so it's difficult for me to understand the question you linked. I'm only interested in how it relates to the configuration of `spark` jobs. How is a `thread` related to any of these: "cores", "executors", "nodes"? – makansij Dec 13 '15 at 06:12
  • As far as I know, Spark will spin up one thread per core. So if an executor is given 2 cores, that executor will spin up 2 threads and run 2 tasks in parallel. Assigning more cores to an executor than are available will not fly – at least not on YARN. I suggest you read this excellent blog post from Cloudera: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/ – Glennie Helles Sindholt Dec 13 '15 at 12:59
  • Thanks for your comment. Yeah, I've read that post, and that's part of my confusion. AFAIK `Mesos` doesn't allow that either. – makansij Dec 13 '15 at 19:31
  • @GlennieHellesSindholt I think you should make it an answer. `core` == `thread` is pretty much all there is to it here. Whether it makes sense or is allowed by a given manager is a completely different story. – zero323 Dec 17 '15 at 00:45
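
Following up on the thread-per-core point in the comments above, the core-to-task-slot relationship can be observed directly (a sketch; it assumes a running YARN cluster, and `sc` is the SparkContext that spark-shell creates):

    # ask for 2 executors with 2 cores each -> 4 task slots in total
    spark-shell --master yarn --num-executors 2 --executor-cores 2

    scala> sc.defaultParallelism
    res0: Int = 4    // typically one task slot (thread) per granted core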

1 Answer


So, as I wrote in a comment, Spark will spin up one thread per core, and on YARN you cannot assign more cores to an executor than the node has available. If you do, it simply won't launch those executors. This is also described in more detail in the blog post from Cloudera linked in the comments above (http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/).
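
The cap comes from YARN's vcore accounting. An illustrative sketch of the relevant knobs (the two property names below are standard yarn-site.xml settings; the values are assumed for the question's 4-core nodes, and `my-app.jar` is a placeholder):

    # Assumed yarn-site.xml limits on this hardware:
    #   yarn.nodemanager.resource.cpu-vcores     = 4   (vcores each DataNode offers)
    #   yarn.scheduler.maximum-allocation-vcores = 4   (max vcores per container)
    #
    # --executor-cores 4 is exactly at the cap, so each container can be granted,
    # but two 4-core nodes can host at most 2 such executors between them:
    spark-submit --master yarn --num-executors 4 --executor-cores 4 \
      --executor-memory 12g my-app.jar
    # Asking for --executor-cores 5 would exceed the per-container cap, and those
    # executors would simply never launch.

Note that YARN counts configured vcores rather than physical cores, so "available" here means whatever the administrator has advertised per node, which is usually, but not necessarily, the physical core count.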

Glennie Helles Sindholt