
There are several highly voted threads that I am having difficulty interpreting, perhaps because the jargon of 2016 differs from that of today (or perhaps I am just not getting it):

Apache Spark: The number of cores vs. the number of executors

How to tune spark executor number, cores and executor memory?

Azure/Databricks offers some best practices on cluster sizing: https://learn.microsoft.com/en-us/azure/databricks/clusters/cluster-config-best-practices

So for my workload, let's say I am interested in (using Databricks' current jargon):

  • 1 driver with 64 GB of memory and 8 cores
  • 1 worker with 256 GB of memory and 64 cores

Drawing on the Microsoft link above, fewer workers should in turn lead to less shuffle, which is among the most costly Spark operations.

So, I have 1 driver and 1 worker. How, then, do I translate these terms into the "nodes" and "executors" discussed here on SO?

Ultimately, I would like to set my Spark config "correctly" so that cores and memory are as optimized as possible.
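For concreteness, here is a minimal sketch of the kind of configuration I mean, in plain (non-Databricks) Spark terms. Splitting the single 64-core / 256 GB worker into 8 executors is purely illustrative, not a recommendation, and I understand that on Databricks these properties are normally set in the cluster's Spark config rather than in application code:

```python
from pyspark.sql import SparkSession

# Illustrative only: carve the single 64-core / 256 GB worker into
# 8 executors of 8 cores / 28 GB each (leaving headroom for overhead).
# These properties take effect at application launch; they have no
# effect on an already-running session.
spark = (
    SparkSession.builder
    .appName("cluster-sizing-example")
    .config("spark.executor.instances", "8")   # hypothetical split
    .config("spark.executor.cores", "8")
    .config("spark.executor.memory", "28g")
    .getOrCreate()
)
```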

John Stud
  • If you are only going to use one worker (executor), why are you using Spark? If shuffle is a problem, it should be solved using a more effective partitioning strategy, not by getting rid of parallelized execution. You should configure more executors and split the available memory and CPUs between them. Also, 64 GB and 8 cores is too much for the driver. – Z4-tier Dec 17 '22 at 01:41
  • As far as I can tell now, despite having 1 "worker", you can assign many executors to that single worker / VM. Any idea why the owners of Databricks / creators of Spark recommend 1 worker in several situations? https://learn.microsoft.com/en-us/azure/databricks/clusters/cluster-config-best-practices – John Stud Dec 17 '22 at 02:09
  • The jargon hasn't changed much. Check out the [official documentation](https://spark.apache.org/docs/latest/cluster-overview.html) first. – Hristo Iliev Dec 17 '22 at 11:37

0 Answers