
Normally, we run our PySpark code on our Spark cluster. This code is covered by automated tests: we run unit and integration tests locally and (in the pipeline) on the Spark cluster. The configuration differs slightly between the two situations, so I need to know whether a test is running locally or on the cluster.

How can I programmatically detect if a test is running on a local machine or on a Spark cluster?

AutomatedChaos
    Hey, have you checked [this](https://stackoverflow.com/questions/1854/what-os-am-i-running-on) ? This might help – Maël Pedretti Apr 10 '20 at 10:09
  • I think you're asking an XY problem question. What are you actually trying to do with your tests? You should probably be configuring spark as part of your tests rather than trying to detect anything. – Marcin Apr 10 '20 at 15:07

1 Answer


spark.submit.deployMode - the deploy mode of the Spark driver program, either "client" or "cluster", which means launching the driver program locally ("client") or remotely ("cluster") on one of the nodes inside the cluster.
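A minimal sketch of how you might read these settings programmatically from an active SparkSession; `is_running_locally` is a hypothetical helper name, and the exact master URL you see ("local[*]", "yarn", "spark://...") depends on your setup:

```python
from pyspark.sql import SparkSession

def is_running_locally(spark: SparkSession) -> bool:
    # "local" or "local[*]" indicates a local master; "yarn", "k8s://..." or
    # "spark://..." indicate a cluster. spark.submit.deployMode is "client"
    # or "cluster" and may be unset on a plain local run, hence the default.
    master = spark.sparkContext.master
    deploy_mode = spark.conf.get("spark.submit.deployMode", "client")
    return master.startswith("local") and deploy_mode == "client"

spark = SparkSession.builder.getOrCreate()
print(is_running_locally(spark))
```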

You can also check the deploy mode in the Web UI. Spark exposes three such UIs: the Master web UI, the Worker web UI, and the Application web UI.

To check the UI, visit (by default; the host and port can differ, e.g. for a YARN deployment): http://localhost:4040/api/v1/applications or http://10.0.2.15:4040.

What you need is the Environment tab and you might want to scrape it or use the REST API if you don't want to deal with SparkListeners:

The Environment tab displays the values for the different environment and configuration variables, including JVM, Spark, and system properties.

See more: https://spark.apache.org/docs/latest/monitoring.html
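A short sketch of pulling the same information from the monitoring REST API rather than scraping the Environment tab; it assumes the driver UI is reachable on localhost:4040 and that the requests package is available:

```python
import requests

# The application UI (port 4040 by default) also serves the monitoring REST API.
BASE = "http://localhost:4040/api/v1"

# List the running applications and take the first one's id.
apps = requests.get(f"{BASE}/applications").json()
app_id = apps[0]["id"]

# The environment endpoint returns, among other things, the Spark properties
# shown on the Environment tab as a list of [name, value] pairs.
env = requests.get(f"{BASE}/applications/{app_id}/environment").json()
spark_props = dict(env["sparkProperties"])
print(spark_props.get("spark.master"), spark_props.get("spark.submit.deployMode"))
```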

[Screenshot: the Environment tab in the Spark Web UI]

Anna Taylor