I am submitting Spark applications to my 3-node Hadoop cluster.

The ApplicationMaster is always hosted on the client machine, in both client and cluster mode.

Thanks for clarifying this.

  • A binding error shouldn't happen if the ports are available... – OneCricketeer Nov 11 '21 at 20:23
  • As long as the networking allows it, I can't imagine it would be an error. You'd have the driver and AppMaster JVM competing for resources, though. – OneCricketeer Nov 11 '21 at 20:26
  • You might be interested in the `spark.port.maxRetries` setting, as mentioned in a post here that I wrote after debugging my own networking issues. https://stackoverflow.com/a/56486271/2308683 – OneCricketeer Nov 11 '21 at 20:31
  • Right, so those `16` attempts come from `spark.port.maxRetries`. For example, if you update it to 32, it'll try to bind twice as many ports. – OneCricketeer Nov 11 '21 at 21:14
  • I don't have experience with these specific settings. It shouldn't affect your actual code from executing, though – OneCricketeer Nov 12 '21 at 07:10
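
To make the `spark.port.maxRetries` comments concrete, here is a minimal sketch of the retry pattern in plain Python. It is not Spark's code. It only mirrors the documented behaviour that each retry increments the previously attempted port by 1, with 16 as the default retry count. The function name `bind_with_retries` and its parameters are hypothetical, chosen for illustration.

```python
import socket


def bind_with_retries(start_port, max_retries=16, host="127.0.0.1"):
    """Try start_port, then start_port + 1, and so on, for up to max_retries retries.

    Illustration only: a simplified stand-in for the port-retry idea,
    not Spark's actual implementation.
    """
    for offset in range(max_retries + 1):  # first attempt plus max_retries retries
        port = start_port + offset
        if port > 65535:  # ran out of valid port numbers
            break
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock, port  # caller is responsible for closing the socket
        except OSError:
            sock.close()  # port unavailable, move on to the next one
    raise OSError(
        f"could not bind after {max_retries} retries starting at port {start_port}"
    )


if __name__ == "__main__":
    # Occupy one port, then ask for the same port again to force a retry.
    blocker = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    blocker.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    taken = blocker.getsockname()[1]

    sock, port = bind_with_retries(taken)
    print(f"port {taken} was busy, bound to {port} instead")

    sock.close()
    blocker.close()
```

Raising `max_retries` from 16 to 32 roughly doubles the range of ports tried, which is the effect described in the comment above. If every port in that range is blocked (by a firewall, say), a larger value only delays the failure.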

2 Answers

If this is just for you and this works, go for it.

You aren't following a "normal" Spark strategy for a YARN cluster. Is that "OK"? If you have a good reason, yes, it's OK.

Would I use this in production? No.

Are there simpler, more common ways of running a cluster? Yes.

You are mixing the strategies of running Spark Standalone and YARN. These are two fundamentally different architectures. If you can make the two work together, that's fun. But you may hit some weird problems, and because this is a custom set of settings, you may not find much support to help you.

Matt Andruff

No, it's not "OK".

One of the core ideas behind Spark is resilience. If you force one node to be the application master, you introduce a bottleneck and a single point of failure. You are using YARN, so there is no reason to specify a master.

Matt Andruff
  • I have added two conflicting answers instead of saying "It depends". I'll let the community decide which is more relevant. – Matt Andruff Nov 15 '21 at 15:53