Where do the driver and application master run on EMR 6.9 when using boto3.client('emr').run_job_flow(...), with respect to MASTER/CORE/TASK nodes?

This question is not about SSHing into the master node and running spark-submit, as described in this AWS blog post. In that case I think it is clear which process runs where.

The AWS documentation, probably for good reason, says the same thing the Spark documentation says about where the driver and application master run in client and cluster mode. EMR's default master is yarn, so this answer accurately describes how it works:

  • Client mode, driver will be running in the machine where application got submitted and the machine has to be available in the network till the application completes.
  • Cluster mode, driver will be running in application master(one per spark application) node and machine submitting the application need not to be in network after submission

Okay, but I am submitting via the boto3 API, so is the MASTER node where the driver and AM reside? I would have thought so, but this AWS documentation makes it sound to me like the AM could run on the CORE or TASK nodes in 6.x and later.
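For concreteness, here is a minimal sketch of how a Spark step might be described for run_job_flow, assuming the step is launched through command-runner.jar. The helper name build_spark_step and the S3 path are hypothetical placeholders, not taken from a real job. The only part relevant to this question is the --deploy-mode argument.

```python
def build_spark_step(script_uri, deploy_mode="cluster"):
    """Build one entry for the Steps list passed to run_job_flow.

    build_spark_step is a hypothetical helper; script_uri is a placeholder.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode}")
    return {
        "Name": f"spark-{deploy_mode}",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar runs the given command as an EMR step
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--master", "yarn",
                "--deploy-mode", deploy_mode,
                script_uri,
            ],
        },
    }


# Placeholder script location (assumption, not a real bucket)
step = build_spark_step("s3://my-bucket/job.py", deploy_mode="cluster")

# The dict would then be passed along with the rest of the cluster config:
# boto3.client("emr").run_job_flow(Name=..., Instances=..., Steps=[step], ...)
```

Switching deploy_mode between "client" and "cluster" is the knob the quoted answer is talking about.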

What I am trying to understand is this: I have an on-demand MASTER node of reasonable size and Spot TASK nodes that are very small. If either the driver or the AM runs on a TASK node, I would upgrade that instance type.
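A rough sketch of the kind of node layout described above, written as the InstanceGroups part of the Instances argument to run_job_flow. The instance types, counts, and the presence of a CORE group are illustrative assumptions, and build_instance_groups is a hypothetical helper.

```python
def build_instance_groups(master_type, core_type, task_type, task_count=2):
    """Return an InstanceGroups list: on-demand MASTER/CORE, Spot TASK.

    All instance types and counts passed in are placeholders.
    """
    return [
        {
            "Name": "master",
            "InstanceRole": "MASTER",
            "Market": "ON_DEMAND",
            "InstanceType": master_type,
            "InstanceCount": 1,
        },
        {
            # Assumed here for illustration; the question only mentions MASTER and TASK
            "Name": "core",
            "InstanceRole": "CORE",
            "Market": "ON_DEMAND",
            "InstanceType": core_type,
            "InstanceCount": 1,
        },
        {
            "Name": "task",
            "InstanceRole": "TASK",
            "Market": "SPOT",
            "InstanceType": task_type,
            "InstanceCount": task_count,
        },
    ]


# Placeholder instance types (assumptions, not a recommendation)
groups = build_instance_groups("m5.xlarge", "m5.xlarge", "m5.large")

# Would be passed as: run_job_flow(..., Instances={"InstanceGroups": groups, ...})
```

If the driver or AM can land on the TASK group, task_type is the value that would need to grow.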

vfrank66
