There are several threads with significant votes that I am having difficulty interpreting, perhaps because the jargon of 2016 differs from today's (or perhaps I am just not getting it):
Apache Spark: The number of cores vs. the number of executors
How to tune spark executor number, cores and executor memory?
Azure Databricks offers some best practices on cluster sizing: https://learn.microsoft.com/en-us/azure/databricks/clusters/cluster-config-best-practices
So for my workload, let's say I am interested in (using Databricks' current jargon; see the sketch after this list):
- 1 Driver: 64 GB of memory, 8 cores
- 1 Worker: 256 GB of memory, 64 cores
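For concreteness, this is roughly the cluster I would create. (This is a hypothetical Clusters API payload: the Azure node types and runtime version are examples I picked to match the specs above, not part of my actual setup.)

```python
# Hypothetical payload for the Databricks Clusters API (POST /api/2.0/clusters/create).
# Node types are Azure VM sizes I picked to match the specs above:
# Standard_E8s_v3 = 8 cores / 64 GB, Standard_D64s_v3 = 64 cores / 256 GB.
cluster_spec = {
    "cluster_name": "one-big-worker",
    "spark_version": "11.3.x-scala2.12",      # example runtime version
    "driver_node_type_id": "Standard_E8s_v3",  # 1 driver: 8 cores / 64 GB
    "node_type_id": "Standard_D64s_v3",        # worker type: 64 cores / 256 GB
    "num_workers": 1,
}
```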
Per the Microsoft link above, fewer (larger) workers should mean less shuffling, which is among the most costly Spark operations.
So, I have 1 driver and 1 worker. How do I translate these terms into the "nodes" and "executors" discussed here on SO?
Ultimately, I would like to set my Spark config "correctly" so that cores and memory are used as efficiently as possible.
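For example, in PySpark terms, this is the kind of thing I mean (a sketch with placeholder numbers; these are exactly the values I don't know how to choose):

```python
from pyspark.sql import SparkSession

# Placeholder numbers for the 1-worker (64-core / 256 GB) cluster above;
# I don't know whether one big executor or several smaller ones is right.
spark = (
    SparkSession.builder
    .appName("cluster-sizing-question")
    .config("spark.executor.cores", "8")      # cores per executor? (guess)
    .config("spark.executor.memory", "28g")   # heap per executor? (guess)
    .config("spark.executor.instances", "8")  # 8 x 8 = 64 cores total? (guess)
    .getOrCreate()
)
```

Is splitting the single large worker into several executors like this even the right mental model on Databricks, or does one worker always map to one executor?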