
Suppose I'm working with a cluster of 2 i3.metal instances, each of which has 512 GiB of memory and 72 vCPU cores. If I want to use all of the cores, I need some configuration of executors and cores per executor that gives me 144 cores. There seem to be many options for this; for example, I could have 72 executors with 2 cores each, or I could have 36 executors with 4 cores each. Either way, I end up with the same number of cores and the same amount of memory per core.
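For concreteness, here is the arithmetic behind those two options (plain Python, nothing Spark-specific, just to show they come out the same):

```python
# Quick arithmetic check of the two candidate layouts
# (figures from the cluster above; illustration only, not a Spark API call).
cluster_cores = 2 * 72        # two i3.metal instances, 72 vCPUs each
cluster_mem_gib = 2 * 512     # 512 GiB per instance

for executors, cores_per_executor in [(72, 2), (36, 4)]:
    total_cores = executors * cores_per_executor
    mem_per_core = cluster_mem_gib / total_cores
    print(f"{executors} executors x {cores_per_executor} cores each: "
          f"{total_cores} cores total, ~{mem_per_core:.1f} GiB per core")
```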

How do I choose between these two configurations, or the many more that are available? Is there any functional difference between the two?

I have read Cloudera's blog post about parameter tuning for Spark jobs, but it didn't answer this question. I have also searched SO for related posts but, again, didn't find an answer.

The comments on the top answer in this post indicate that there isn't a single answer and it should be tuned for each job. If this is the case, I would appreciate any "general wisdom" that's out there!

2 Answers


Indeed, there is no absolute answer for all use cases. Each job is different.

When I want to execute a new job, the general wisdom I am using is to start with a default configuration somewhere in the middle between thin and fat executors: several cores per executor, and several executors per machine.

I typically take the square root of the number of cores per machine as the number of cores per executor. Then I fine-tune these parameters for the job, comparing performance and watching for hardware bottlenecks (memory? cores? disk? network?). If the job fails, starting with a subset of the dataset and then scaling up helps, too.

So with this configuration, I would intuitively start with 18 executors (9 per machine) with 8 cores each, but 36 executors with 4 cores would also sound reasonable to me as an initial configuration.
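As a rough sketch of that heuristic (plain Python; the variable names are mine, not Spark configuration keys):

```python
import math

# Rough sketch of the "square root" starting point described above.
machines = 2
cores_per_machine = 72

cores_per_executor = round(math.sqrt(cores_per_machine))         # ~8
executors_per_machine = cores_per_machine // cores_per_executor  # 9
total_executors = machines * executors_per_machine               # 18

print(cores_per_executor, executors_per_machine, total_executors)
```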

Going for one core per (thin) executor, or one (fat) executor per node taking all cores of the machine, tends to be inefficient: with thin executors you lose the benefit of running several tasks in the same JVM (for example, broadcast variables get replicated once per executor), while a single fat executor per node tends to suffer from garbage-collection overhead and I/O contention.

Also, Spark's default executor memory (spark.executor.memory) is small (1g by default). If there are few executors with lots of cores, they will under-utilize the machine's memory unless you allocate more explicitly.
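For example, with few fat executors you would raise the executor memory explicitly. A hedged PySpark sketch; the 48g/5g figures are assumptions for this cluster, not recommendations:

```python
from pyspark.sql import SparkSession

# Illustrative only: explicitly raising executor memory for a "fat executor"
# layout, since spark.executor.memory defaults to a small value (1g).
spark = (
    SparkSession.builder
    .config("spark.executor.instances", "18")
    .config("spark.executor.cores", "8")
    .config("spark.executor.memory", "48g")         # assumption, tune per job
    .config("spark.executor.memoryOverhead", "5g")  # assumption
    .getOrCreate()
)
```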

I hope this helps!

– Ghislain Fourny

I would say 5 cores per executor is a sweet spot that avoids putting too much I/O burden on your input data sources. Having said that, also make sure you don't end up with too little memory per core; ideally, don't go below 8 GB per executor.

Again, as Ghislain mentioned, it depends on your operations, but that's where I would start.
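For reference, a back-of-the-envelope for that starting point on the cluster in the question (plain Python; the 8 GB floor is the figure above):

```python
# "5 cores per executor" on 2 x i3.metal (144 cores, 1024 GiB total);
# purely illustrative arithmetic.
total_cores = 2 * 72
cores_per_executor = 5
executors = total_cores // cores_per_executor   # 28 executors (4 cores spare)
mem_per_executor_gib = (2 * 512) // executors   # ~36 GiB available per executor

assert mem_per_executor_gib >= 8                # comfortably above the 8 GB floor
print(executors, mem_per_executor_gib)
```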

  • This is where I'm confused - why does the number of cores per executor create IO burden? If I have 72 executors with 1 core each, or 36 executors with 2 cores each, I'm still performing 72 IO tasks. Why does the grouping by executor affect anything? – Austin Jordan Nov 20 '19 at 02:26
  • Well, the I/O burden would be on the HDFS end if you are using one. The HDFS client doesn't like too many parallel threads hitting it, but it can vary for different data sources. – Pranav Sawant Nov 20 '19 at 16:36
  • Now, if you are using 36 executors with 2 cores each, it will give you greater performance if all you do is map tasks. However, if you introduce a shuffle, you run into memory implications related to shuffle file blocks and the like, where memory per core plays a bigger role. So if all you do is mappers, I would recommend going ballistic on the number of cores with the least amount of memory; if you are shuffle-heavy, you may want to increase memory per core (sketched below). – Pranav Sawant Nov 20 '19 at 16:39
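As a hedged illustration of that trade-off, two configuration sketches for the same 144 cores, one biased toward core count for map-only work and one toward memory per core for shuffle-heavy work (all figures are assumptions for this 2 x i3.metal cluster, not tested values):

```python
# Same 144 cores, different bias; figures are assumptions, not tested values.
map_heavy = {
    "spark.executor.instances": "72",
    "spark.executor.cores": "2",
    "spark.executor.memory": "12g",   # less memory per core, more parallelism
}
shuffle_heavy = {
    "spark.executor.instances": "24",
    "spark.executor.cores": "6",
    "spark.executor.memory": "38g",   # more memory per core for shuffle blocks
}
```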