6

I am trying to create a SparkContext in a Jupyter notebook, but I am getting the following error:

Py4JError: org.apache.spark.api.python.PythonUtils.getPythonAuthSocketTimeout does not exist in the JVM

Here is my code, followed by the full traceback:

from pyspark import SparkContext, SparkConf
conf = SparkConf().setMaster("local").setAppName("Groceries")
sc = SparkContext(conf = conf)


Py4JError                                 Traceback (most recent call last)
<ipython-input-20-5058f350f58a> in <module>
      1 conf = SparkConf().setMaster("local").setAppName("My App")
----> 2 sc = SparkContext(conf = conf)

~/Documents/python38env/lib/python3.8/site-packages/pyspark/context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    144         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    145         try:
--> 146             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
    147                           conf, jsc, profiler_cls)
    148         except:

~/Documents/python38env/lib/python3.8/site-packages/pyspark/context.py in _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, jsc, profiler_cls)
    224         self._encryption_enabled = self._jvm.PythonUtils.isEncryptionEnabled(self._jsc)
    225         os.environ["SPARK_AUTH_SOCKET_TIMEOUT"] = \
--> 226             str(self._jvm.PythonUtils.getPythonAuthSocketTimeout(self._jsc))
    227         os.environ["SPARK_BUFFER_SIZE"] = \
    228             str(self._jvm.PythonUtils.getSparkBufferSize(self._jsc))

~/Documents/python38env/lib/python3.8/site-packages/py4j/java_gateway.py in __getattr__(self, name)
   1528                     answer, self._gateway_client, self._fqn, name)
   1529         else:
-> 1530             raise Py4JError(
   1531                 "{0}.{1} does not exist in the JVM".format(self._fqn, name))
   1532 

Py4JError: org.apache.spark.api.python.PythonUtils.getPythonAuthSocketTimeout does not exist in the JVM
Gubberex

3 Answers

10

This error is reported when the installed PySpark version is inconsistent with the Spark cluster version. Uninstall the current PySpark, then install the version that matches the Spark cluster. My Spark version is 3.0.2, so I ran the following commands:

pip3 uninstall pyspark
pip3 install pyspark==3.0.2
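To confirm the fix, you can check that the installed PySpark now reports the same version as the cluster (3.0.2 in my case):

import pyspark
print(pyspark.__version__)  # should print 3.0.2, i.e. the Spark cluster version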
JustDoIt
    Had this issue in PyCharm, and after downgrading my 'pyspark' package to version 3.0.0 to match my version of Spark 3.0.0-preview2, the exception went away. – hipokito Jun 25 '21 at 14:16
2

We need to uninstall the default/existing/latest version of PySpark from PyCharm/Jupyter Notebook or whichever tool we use.

Then check the version of Spark that we have installed, using the command spark-submit --version (in CMD/Terminal).

Then install the PySpark version that matches the version of Spark that you have. For example, I have Spark 3.0.3, so I installed PySpark 3.0.3.

In CMD/PyCharm Terminal,

pip install pyspark==3.0.3

Or check this if you are a PyCharm user.
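If you are not sure which versions are in play, a minimal check along these lines (assuming spark-submit is on your PATH) lets you compare the two side by side:

# Minimal sketch, assuming spark-submit is on the PATH: compare the
# pip-installed PySpark version with the Spark distribution's version.
import subprocess
import pyspark

print("PySpark:", pyspark.__version__)
banner = subprocess.run(["spark-submit", "--version"],
                        capture_output=True, text=True)
# spark-submit usually prints its version banner to stderr
print(banner.stderr or banner.stdout)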

0

I had the same error today and resolved it with the code below.

Execute this in a separate cell before you build your Spark session:

from pyspark import SparkContext, SQLContext, SparkConf, StorageLevel
from pyspark.sql import SparkSession
from pyspark.conf import SparkConf
SparkSession.builder.config(conf=SparkConf())
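With that config in place, the session (and its SparkContext) can then be created in a later cell, for example (the master and app name below are just placeholder values):

# Sketch of the follow-up cell; master/appName are example values
spark = SparkSession.builder.master("local").appName("Groceries").getOrCreate()
sc = spark.sparkContext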