
I ran pip install pyspark in a Python environment. Java is installed, but when I try to initialise a Spark session I get a Java error: Java gateway process exited before sending its port number

import findspark
from pyspark.sql import SparkSession

findspark.init()

spark = SparkSession \
    .builder \
    .appName("CustomerChurn") \
    .master("local") \
    .config() \
    .getOrCreate()





RuntimeError                              Traceback (most recent call last)
Input In [3], in <cell line: 3>()
      1 findspark.init()
      3 spark = SparkSession \
      4     .builder \
      5     .appName("CustomerChurn") \
      6     .master("local") \
      7     .config() \
----> 8     .getOrCreate()

File ~\anaconda3\envs\CustomerChurnProject\lib\site-packages\pyspark\sql\session.py:269, in SparkSession.Builder.getOrCreate(self)
    267     sparkConf.set(key, value)
    268 # This SparkContext may be an existing one.
--> 269 sc = SparkContext.getOrCreate(sparkConf)
    270 # Do not update `SparkConf` for existing `SparkContext`, as it's shared
    271 # by all sessions.
    272 session = SparkSession(sc, options=self._options)

File ~\anaconda3\envs\CustomerChurnProject\lib\site-packages\pyspark\context.py:483, in SparkContext.getOrCreate(cls, conf)
    481 with SparkContext._lock:
    482     if SparkContext._active_spark_context is None:
--> 483         SparkContext(conf=conf or SparkConf())
    484     assert SparkContext._active_spark_context is not None
    485     return SparkContext._active_spark_context

File ~\anaconda3\envs\CustomerChurnProject\lib\site-packages\pyspark\context.py:195, in SparkContext.__init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls, udf_profiler_cls)
    189 if gateway is not None and gateway.gateway_parameters.auth_token is None:
    190     raise ValueError(
    191         "You are trying to pass an insecure Py4j gateway to Spark. This"
    192         " is not allowed as it is a security risk."
    193     )
--> 195 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    196 try:
    197     self._do_init(
    198         master,
    199         appName,
   (...)
    208         udf_profiler_cls,
    209     )

File ~\anaconda3\envs\CustomerChurnProject\lib\site-packages\pyspark\context.py:417, in SparkContext._ensure_initialized(cls, instance, gateway, conf)
    415 with SparkContext._lock:
    416     if not SparkContext._gateway:
--> 417         SparkContext._gateway = gateway or launch_gateway(conf)
    418         SparkContext._jvm = SparkContext._gateway.jvm
    420     if instance:

File ~\anaconda3\envs\CustomerChurnProject\lib\site-packages\pyspark\java_gateway.py:106, in launch_gateway(conf, popen_kwargs)
    103     time.sleep(0.1)
    105 if not os.path.isfile(conn_info_file):
--> 106     raise RuntimeError("Java gateway process exited before sending its port number")
    108 with open(conn_info_file, "rb") as info:
    109     gateway_port = read_int(info)

RuntimeError: Java gateway process exited before sending its port number

The runtime error is posted above. I have not seen this type of error in other posts.

Paul Corcoran
  • You should post your error stack trace so we can see the problem more precisely – AMK Jun 26 '22 at 16:03
  • Take a look at https://stackoverflow.com/questions/31841509/pyspark-exception-java-gateway-process-exited-before-sending-the-driver-its-po?page=2&tab=modifieddesc#tab-top – AMK Jun 26 '22 at 16:04
  • Yes, I realised that without the traceback it was very vague. I have added it now – Paul Corcoran Jun 26 '22 at 16:06
  • Based on your error I think this link will help you: https://sparkbyexamples.com/pyspark/pyspark-exception-java-gateway-process-exited-before-sending-the-driver-its-port-number/ – AMK Jun 26 '22 at 16:07
  • Have you set $JAVA_HOME and PYSPARK_SUBMIT_ARGS? – AMK Jun 26 '22 at 16:08
  • I am on Windows. I previously got it set up on a virtual Linux machine, but this time I need to use my Windows machine. – Paul Corcoran Jun 26 '22 at 16:14
  • See this: https://confluence.atlassian.com/doc/setting-the-java_home-variable-in-windows-8895.html – AMK Jun 26 '22 at 16:20
  • If it solves your problem, let me know and I'll post it as an answer – AMK Jun 26 '22 at 16:20
  • It has fixed the initial problem, yes. Now I'm running into Py4JJavaError: An error occurred while calling o8.set. It looks like a Java compatibility issue – Paul Corcoran Jun 26 '22 at 16:31
  • It looks like an incompatibility issue between your PySpark version and your Python version. You should list both in your question too. See this: https://stackoverflow.com/questions/41840296/pyspark-in-ipython-notebook-raises-py4jjavaerror-when-using-count-and-first – AMK Jun 26 '22 at 16:36
  • Thank you, it probably needs a fresh question where I specify all my versions/environments. Appreciate your time! – Paul Corcoran Jun 26 '22 at 16:43
  • Yes, I think it needs to be another question. I'll write down my comments as an answer. – AMK Jun 26 '22 at 16:46
  • Successfully installed py4j-0.10.9. I had 0.10.9.5 installed, which was causing my new errors. Long day, but thankfully I found it. – Paul Corcoran Jun 26 '22 at 21:21

1 Answer


Based on your error logs, I think you need to set the JAVA_HOME environment variable on your system.

This link may help:

https://sparkbyexamples.com/pyspark/pyspark-exception-java-gateway-process-exited-before-sending-the-driver-its-port-number/

On Linux:

export JAVA_HOME=(path to the JDK, e.g. /usr/lib/jvm/java-11-openjdk-amd64)

After that, add the same line to your ~/.bashrc (if you use bash) so that it persists across sessions:

vi ~/.bashrc
export JAVA_HOME=(path to the JDK, e.g. /usr/lib/jvm/java-11-openjdk-amd64)

Then:

source ~/.bashrc

(The link above covers these steps in more detail.)
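
To confirm that the variable is actually visible to Python before starting Spark, a small check like the one below may help. This is only a sketch: java_home_problem is a hypothetical helper name, not part of PySpark, and it only looks for a java launcher under JAVA_HOME/bin rather than testing the Java version.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def java_home_problem(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a description of what looks wrong with JAVA_HOME, or None if it looks usable.

    Hypothetical helper for illustration; not part of PySpark.
    """
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    home = Path(java_home)
    if not home.is_dir():
        return f"JAVA_HOME points to a missing directory: {java_home}"
    # The launcher is bin/java on Linux/macOS and bin/java.exe on Windows.
    if not any((home / "bin" / name).is_file() for name in ("java", "java.exe")):
        return f"no java executable found under {home / 'bin'}"
    return None


if __name__ == "__main__":
    print(java_home_problem() or "JAVA_HOME looks usable")
```

If this reports a problem from inside the notebook even though the shell shows the variable, the notebook was probably started before the variable was set and needs a restart.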

On Windows:

Open the system environment variables editor from the properties of My Computer and add JAVA_HOME there.


See this: https://confluence.atlassian.com/doc/setting-the-java_home-variable-in-windows-8895.html
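
If changing the system-wide settings is not convenient, the variable can also be set for the current Python process only, before the Spark session is created. The sketch below assumes a JDK folder at C:\Program Files\Java\jdk-11, which is a made-up path, and env_with_java_home is a hypothetical helper name.

```python
import os
from typing import Dict, Mapping


def env_with_java_home(env: Mapping[str, str], java_home: str) -> Dict[str, str]:
    """Return a copy of env with JAVA_HOME set and its bin directory first on PATH.

    Hypothetical helper for illustration; not part of PySpark.
    """
    updated = dict(env)
    updated["JAVA_HOME"] = java_home
    java_bin = os.path.join(java_home, "bin")
    old_path = updated.get("PATH", "")
    updated["PATH"] = java_bin + os.pathsep + old_path if old_path else java_bin
    return updated


if __name__ == "__main__":
    # Assumed install location; replace it with the real JDK folder.
    jdk_dir = r"C:\Program Files\Java\jdk-11"
    os.environ.update(env_with_java_home(os.environ, jdk_dir))
    # Build the SparkSession only after this point, so that the Java process
    # PySpark launches inherits the updated variables.
```

This only affects the running process, so it has to be executed in every new notebook or script; the system setting described above is the permanent fix.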

AMK