
I have installed PySpark on Windows and had no problems until yesterday. I am using Windows 10, PySpark version 2.3.3 (pre-built version), and Java version "1.8.0_201". Yesterday, when I tried creating a Spark session, I ran into the error below.

Exception                                 Traceback (most recent call last)
<ipython-input-2-a9ef4ac1a07d> in <module>
----> 1 spark = SparkSession.builder.appName("Hello").master("local").getOrCreate()

C:\spark-2.3.3-bin-hadoop2.7\python\pyspark\sql\session.py in getOrCreate(self)
    171                     for key, value in self._options.items():
    172                         sparkConf.set(key, value)
--> 173                     sc = SparkContext.getOrCreate(sparkConf)
    174                     # This SparkContext may be an existing one.
    175                     for key, value in self._options.items():

C:\spark-2.3.3-bin-hadoop2.7\python\pyspark\context.py in getOrCreate(cls, conf)
    361         with SparkContext._lock:
    362             if SparkContext._active_spark_context is None:
--> 363                 SparkContext(conf=conf or SparkConf())
    364             return SparkContext._active_spark_context
    365 

C:\spark-2.3.3-bin-hadoop2.7\python\pyspark\context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    127                     " note this option will be removed in Spark 3.0")
    128 
--> 129         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    130         try:
    131             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

C:\spark-2.3.3-bin-hadoop2.7\python\pyspark\context.py in _ensure_initialized(cls, instance, gateway, conf)
    310         with SparkContext._lock:
    311             if not SparkContext._gateway:
--> 312                 SparkContext._gateway = gateway or launch_gateway(conf)
    313                 SparkContext._jvm = SparkContext._gateway.jvm
    314 

C:\spark-2.3.3-bin-hadoop2.7\python\pyspark\java_gateway.py in launch_gateway(conf)
     44     :return: a JVM gateway
     45     """
---> 46     return _launch_gateway(conf)
     47 
     48 

C:\spark-2.3.3-bin-hadoop2.7\python\pyspark\java_gateway.py in _launch_gateway(conf, insecure)
    106 
    107             if not os.path.isfile(conn_info_file):
--> 108                 raise Exception("Java gateway process exited before sending its port number")
    109 
    110             with open(conn_info_file, "rb") as info:

Exception: Java gateway process exited before sending its port number

I did check out the PySpark issues on GitHub as well as the Stack Overflow answers related to this, but the issue is not resolved.

I tried the following methods:

1.) Tried uninstalling, reinstalling, and changing the Java installation directory. Currently, my Java installation directory is C:/Java/. (See: Pyspark: Exception: Java gateway process exited before sending the driver its port number.)

2.) Tried setting PYSPARK_SUBMIT_ARGS, but it was of no help (a sketch of the kind of value I tried is below).
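
For reference, this is roughly the kind of value I set before creating the session (the exact options here are only an illustration):

import os

# PYSPARK_SUBMIT_ARGS must end with "pyspark-shell" when launching Spark
# from a plain Python process; the options before it are illustrative.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[2] pyspark-shell"

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Hello").getOrCreate()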

Please suggest possible resolutions.

  • Did you add winutils.exe? https://wiki.apache.org/hadoop/WindowsProblems – maogautam Mar 30 '19 at 07:39
  • Yes, I have `winutils.exe` inside folder `C:\Hadoop\bin` and my `HADOOP_HOME = C:\Hadoop` – Sanchit Kumar Mar 30 '19 at 15:48
  • Could you please check this: https://stackoverflow.com/questions/49641137/installing-pyspark-on-windows – maogautam Apr 03 '19 at 19:46
  • Yes, I did check the link. The problem I am facing is not with the installation; I am able to install and use PySpark. The problem is that after using it for a few days I suddenly ran into the above error, and I cannot figure out how to solve it. I am not able to create a new SparkSession or SparkContext now. – Sanchit Kumar Apr 04 '19 at 20:18
  • is `JAVA_HOME` set ? – Omar Apr 05 '19 at 17:19
  • Yes, I have `JAVA_HOME` set correctly. I have other apps which use `JAVA_HOME`, they are working fine. – Sanchit Kumar Apr 05 '19 at 20:04

3 Answers


I think you need to uninstall both Java and PySpark and then reinstall them:

pip install pyspark

Then go to System > Advanced system settings > Environment Variables, and edit JAVA_HOME under the user variables and Path under both the user and system variables.
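
As a quick way to verify (or temporarily override) those variables for the current Python session, a sketch along these lines may help; the Java path shown is a placeholder, not a known-good value:

import os

# Placeholder path: substitute your actual Java installation directory.
os.environ["JAVA_HOME"] = r"C:\Java\jdk1.8.0_201"
# Prepend JAVA_HOME\bin to PATH so the JVM launcher is found first.
os.environ["PATH"] = os.path.join(os.environ["JAVA_HOME"], "bin") + os.pathsep + os.environ["PATH"]

print(os.environ["JAVA_HOME"])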

  • I can uninstall and reinstall both PySpark and Java, and then it should work fine. But that is not a solution to the problem, because it might occur again in the future. This is the second time I am facing this issue; I did uninstall and reinstall the first time. – Sanchit Kumar Apr 08 '19 at 20:22

Please ensure that the JAVA_HOME environment variable does not contain any spaces; otherwise it may throw this error. I removed the spaces and it worked like a charm for me. Here is a short snippet to check your JAVA_HOME in Python:

import os
print(os.environ['JAVA_HOME'])
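
If you want the check to flag the problem directly, a slightly longer sketch of the same idea should work:

import os

java_home = os.environ.get("JAVA_HOME", "")
print(java_home)

# Paths like C:\Program Files\Java\... contain a space and can trigger this error.
if " " in java_home:
    print("JAVA_HOME contains spaces; consider an installation path without them.")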


After going through the code that raises the error, I see that these might be the issues:

  1. Check whether an environment variable named TEMP is defined on your system. If not, define one.

  2. If TEMP is defined, make sure the folder it points to really exists and that you have full access to it.

Basically, the code that raises the exception is looking for a folder in which to create temporary files on your system. You must make sure it is present.
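
A quick sketch to check both conditions from Python (the write test at the end is only an illustration of "full access"):

import os
import tempfile

temp_dir = os.environ.get("TEMP")
print("TEMP =", temp_dir)

if temp_dir and os.path.isdir(temp_dir):
    # Try creating a temporary file there to confirm write access.
    with tempfile.NamedTemporaryFile(dir=temp_dir) as f:
        print("TEMP exists and is writable:", f.name)
else:
    print("TEMP is missing or does not point to an existing folder.")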
