
I have a Spark 2 application that uses gRPC so that client applications can connect to it.

However, I want the gRPC code to be started only on the driver node, not on the workers.

Is there a way in Spark 2 to check whether the node the code is currently running on is the driver node?

– navige

3 Answers


I don't like the hostname-based approach: you depend on matching the right network interface, and the same node can host both the driver and executors. Personally, I set an environment variable

spark.executorEnv.RUNNING_ON_EXECUTOR=yes

and then in my code (using Python here, but it should work in any other language):

import os

if "RUNNING_ON_EXECUTOR" in os.environ:
    pass  # run executor code here
else:
    pass  # run driver code here
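
For completeness, a minimal PySpark sketch of the whole approach; the application name and the one-element RDD used to exercise an executor are illustrative:

import os
from pyspark.sql import SparkSession

# Set the marker variable on executors only; the driver's own
# environment is left untouched, so the check distinguishes the two.
spark = (SparkSession.builder
         .appName("driver-check-example")
         .config("spark.executorEnv.RUNNING_ON_EXECUTOR", "yes")
         .getOrCreate())

def where(_):
    # Runs inside a task on an executor, where the variable is set.
    return "executor" if "RUNNING_ON_EXECUTOR" in os.environ else "driver"

print(spark.sparkContext.parallelize([1]).map(where).collect())  # ['executor']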
– BiS

You can get the driver's hostname with:

sc.getConf.get("spark.driver.host")
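
In PySpark, the same lookup with a naive hostname comparison might look like this sketch (see the comments below for why a plain equality check can misfire):

import socket
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

driver_host = spark.sparkContext.getConf().get("spark.driver.host")
# Naive check: compare the local hostname to the driver host. This can
# misfire on multi-homed machines, or when the driver and executors
# share a node (e.g. local mode), as the comments below point out.
on_driver = socket.gethostname() == driver_host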
– Assaf Mendelson
  • so checking for: `java.net.InetAddress.getLocalHost().getHostName() == mainContext.sc.getConf.get("spark.driver.host")` should allow me to do the check I guess – navige Feb 13 '17 at 10:47
  • Rather than checking equality, see http://stackoverflow.com/questions/2406341/how-to-check-if-an-ip-address-is-the-local-host-on-a-multi-homed-system – navige Feb 13 '17 at 15:59
  • I don't think this is safe at all: if you run your code in local mode or yarn-cluster mode, your executors can land on the same node as your driver. – BiS Sep 14 '17 at 07:42

Normally, we can tell the code is running on the driver when TaskContext.get() returns None, unless a TaskContext has been created explicitly on the driver for testing purposes.
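
A minimal sketch of that check in PySpark; the helper name where_am_i is illustrative:

from pyspark import TaskContext

def where_am_i():
    # TaskContext.get() returns None on the driver and a TaskContext
    # instance inside a task running on an executor.
    return "driver" if TaskContext.get() is None else "executor"

For example, sc.parallelize([1]).map(lambda _: where_am_i()).collect() returns ['executor'], while calling where_am_i() directly in the driver program returns 'driver'.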

– GraceMeng