Suppose I'm working with a cluster of 2 i3.metal instances, each of which has 512 GiB of memory and 72 vCPU cores (source). If I want to use all of the cores, I need some configuration of executors and cores per executor that gives me 144 cores in total. There seem to be many options for this; for example, I could have 72 executors with 2 cores each, or I could have 36 executors with 4 cores each. Either way, I end up with the same total number of cores and the same amount of memory per core.
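For concreteness, here's roughly how I'd express the two options when building the session (a minimal sketch; the app name and the executor-memory value are just illustrative and ignore OS/overhead headroom):

```scala
import org.apache.spark.sql.SparkSession

// Option A: 72 executors x 2 cores each
// Option B: 36 executors x 4 cores each  (shown below)
// Both use all 144 cores; executor memory would be scaled so that
// memory per core stays roughly the same in either case.
val spark = SparkSession.builder()
  .appName("executor-sizing-example")          // placeholder name
  .config("spark.executor.instances", "36")    // or "72" for Option A
  .config("spark.executor.cores", "4")         // or "2" for Option A
  .config("spark.executor.memory", "26g")      // illustrative only; scale with cores
  .getOrCreate()
```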
How do I choose between these two configurations, or the many more that are available? Is there any functional difference between the two?
I have read Cloudera's blog post about parameter tuning for Spark jobs, but it didn't answer this question. I have also searched SO for related posts, but didn't find an answer there either.
The comments on the top answer in this post indicate that there isn't a single answer and it should be tuned for each job. If this is the case, I would appreciate any "general wisdom" that's out there!