The PySpark shell starts a Java gateway using Py4J, then talks to it and passes the Python SparkContext to the Java gateway.

However, how can I find out which port the SparkContext opens? How does PySpark decide which port to use when creating the Java gateway for the SparkContext?

Additional question:

  1. Who starts the Py4J Java process?
cdhit

2 Answers

PySpark may be using the default Py4J ports; see the Py4J docs for details: https://www.py4j.org/faq.html#what-ports-are-used-by-py4j

aristide-n

The port is chosen randomly from the ports available on the driver. PySpark launches the Spark Java process with the name of a temporary file as a parameter, and the Java process writes the port and auth_token to that file. Python then reads the temporary file and creates a Py4J gateway. You can access the Py4J gateway as sc._gateway and read the port from sc._gateway.gateway_parameters.port.
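
To make the handshake described above concrete, here is a rough, standard-library-only sketch of the same pattern. It is not PySpark's actual code: a small Python child process stands in for the JVM, and the names launch_and_connect, CHILD_SOURCE and conn_info are made up for illustration. The child binds to port 0 so the operating system picks a free port, writes that port and a random token to the file it was given, and the parent reads the file and connects.

```python
import os
import shutil
import socket
import subprocess
import sys
import tempfile
import time

# Hypothetical stand-in for the JVM side of the handshake (not PySpark code).
# It receives the path of the connection-info file as its first argument.
CHILD_SOURCE = r"""
import os, secrets, socket, sys
server = socket.socket()
server.bind(("127.0.0.1", 0))       # port 0: the OS chooses a free port
server.listen(1)
port = server.getsockname()[1]
token = secrets.token_hex(16)
tmp_path = sys.argv[1] + ".tmp"
with open(tmp_path, "w") as f:
    f.write(f"{port}\n{token}\n")
os.replace(tmp_path, sys.argv[1])   # rename so the reader never sees a partial file
conn, _ = server.accept()           # wait for the parent to connect
conn.sendall(token.encode())        # echo the token so the parent can check it
conn.close()
server.close()
"""


def launch_and_connect(timeout=10.0):
    """Start the child, read the port and token it wrote, and connect to it."""
    conn_dir = tempfile.mkdtemp()
    conn_file = os.path.join(conn_dir, "conn_info")
    child = subprocess.Popen([sys.executable, "-c", CHILD_SOURCE, conn_file])
    try:
        # Poll until the child has written the connection-info file.
        deadline = time.monotonic() + timeout
        while not os.path.exists(conn_file):
            if child.poll() is not None:
                raise RuntimeError("child exited before writing connection info")
            if time.monotonic() > deadline:
                raise TimeoutError("timed out waiting for connection info")
            time.sleep(0.05)

        with open(conn_file) as f:
            port_text, token = f.read().split()
        port = int(port_text)

        # Connect to the port the child chose and read until it closes the socket.
        chunks = []
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as s:
            while True:
                data = s.recv(1024)
                if not data:
                    break
                chunks.append(data)
        echoed = b"".join(chunks).decode()
        child.wait(timeout=timeout)
        return port, token, echoed
    finally:
        if child.poll() is None:
            child.kill()
        shutil.rmtree(conn_dir, ignore_errors=True)


if __name__ == "__main__":
    port, token, echoed = launch_and_connect()
    print("child chose port", port, "and the token matches:", echoed == token)
```

The port differs from run to run because the operating system assigns it, which is why it has to be read back from the running process rather than looked up in a configuration file.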