
I am trying to run a Spark application on AWS EMR. I see that the driver always shares its host with executors. Is there a way to avoid that, so that the driver gets the host to itself?

I was able to do that, in a way, by setting the driver memory to a large number. But given that the instance type is not fixed, I have to keep changing that setting. Is there a way to indicate that the driver should use all available memory, without specifying a positive integer value? I tried setting spark.driver.memory to 0, but Spark rejects 0. Thanks.
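For reference, one way to avoid hard-coding the value is to compute it at submit time. Below is a minimal sketch; the 2048 MB headroom for the OS and YARN daemons is an assumption, and `my_app.py` is a hypothetical application script. Note that in cluster mode this reads the memory of the *submit* host, so it only matches the driver's host when instance types are uniform across the cluster:

```shell
# Sketch: derive spark.driver.memory from total RAM instead of hard-coding it.
# The 2048 MB headroom for the OS and YARN daemons is an assumption -- tune it.
TOTAL_MB=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
DRIVER_MB=$((TOTAL_MB - 2048))

spark-submit \
  --deploy-mode cluster \
  --conf spark.driver.memory="${DRIVER_MB}m" \
  my_app.py   # hypothetical application script
```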

nit
  • what do you mean by `host to itself`? what does `host` mean, is it master or worker node or something else? – Snigdhajyoti Sep 07 '21 at 09:49
  • whichever node hosts the driver. I understand one way would be to use client mode so master node is fully dedicated to the driver. but is it possible to achieve that in cluster mode? – nit Sep 09 '21 at 14:44
  • No. `Cluster` mode means you want your master node to be free all the time. Let's say you have thousands of jobs: if all of the jobs' drivers get created on the master node, you will see delays in starting subsequent jobs, and you can't scale the master node. You can read more on the difference here https://stackoverflow.com/q/41124428/7857701 – Snigdhajyoti Sep 09 '21 at 19:11
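One cluster-mode option not covered in the comments is YARN node labels: in cluster mode the driver runs inside the YARN application master, and Spark's `spark.yarn.am.nodeLabelExpression` / `spark.yarn.executor.nodeLabelExpression` settings can pin the AM and the executors to different labels. This is a config sketch, not a tested recipe — it assumes node labels named `driver` and `core` have already been configured in YARN on dedicated EMR instance groups, and `my_app.py` is a hypothetical application script:

```shell
# Sketch: pin the driver (the YARN AM in cluster mode) to a dedicated node
# label so no executors are scheduled on its host. Assumes the labels
# "driver" and "core" already exist in the YARN cluster configuration.
spark-submit \
  --deploy-mode cluster \
  --conf spark.yarn.am.nodeLabelExpression=driver \
  --conf spark.yarn.executor.nodeLabelExpression=core \
  my_app.py   # hypothetical application script
```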

0 Answers