28

I am using Spark on EMR and writing a PySpark script. I am getting an error when trying to run

from pyspark import SparkContext
sc = SparkContext()

this is the error

File "pyex.py", line 5, in <module>
    sc = SparkContext()   File "/usr/local/lib/python3.4/site-packages/pyspark/context.py", line 118, in __init__
    conf, jsc, profiler_cls)   File "/usr/local/lib/python3.4/site-packages/pyspark/context.py", line 195, in _do_init
    self._encryption_enabled = self._jvm.PythonUtils.getEncryptionEnabled(self._jsc)   File "/usr/local/lib/python3.4/site-packages/py4j/java_gateway.py", line 1487, in __getattr__
    "{0}.{1} does not exist in the JVM".format(self._fqn, name)) py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM

I found this answer stating that I need to import SparkContext, but that is not working either.

Kara
thebeancounter

10 Answers

33

PySpark recently released 2.4.0, but there is no stable Spark release that coincides with this new version. Try downgrading to pyspark 2.3.2; this fixed it for me.

Edit: to be clearer, your PySpark version needs to be the same as the Apache Spark version that is downloaded, or you may run into compatibility issues.

Check the version of pyspark by using

pip freeze
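If you would rather check from inside Python, here is a minimal sketch (assuming pyspark is importable in the environment your script runs with):

import pyspark

# installed PySpark version; it should match the Spark version on the machine, e.g. 2.3.2
print(pyspark.__version__)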

Rob
svw
  • Which version have they even released PySpark 2.4.0 for, then? – shubhamgoel27 Nov 16 '18 at 07:22
  • When I made this post, https://spark.apache.org/downloads.html did not have 2.4.0 available for download, only 2.3.2. As long as the pyspark version == Apache Spark's, you should be good. I will update the post. – svw Nov 17 '18 at 14:37
  • This confuses me. When I ```pip install pyspark==2.4.0``` or any version for that matter, it installs a version of Spark alongside it in my site-libs. My use case is trying to use the KafkaUtils in the streaming package, WITHOUT installing a local Spark. Doing that still causes py4j gateway errors trying to load the class. How is it the Spark version that ships with the python pyspark installation breaks trying to use it without anything else involved? – Penumbra Mar 01 '19 at 20:41
10

You need to set the following environment variables to set the Spark path and the Py4j path, for example in ~/.bashrc:

export SPARK_HOME=/home/hadoop/spark-2.1.0-bin-hadoop2.7
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
export PATH=$SPARK_HOME/bin:$SPARK_HOME/python:$PATH

And use findspark at the top of your file:

import findspark
findspark.init()
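As a quick sanity check that these paths are picked up, a minimal sketch (assuming the SPARK_HOME above points at a real Spark install):

import findspark
findspark.init()

from pyspark import SparkContext

sc = SparkContext()
print(sc.version)  # should print the Spark version SPARK_HOME points at, e.g. 2.1.0
sc.stop()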
9

I just had a fresh pyspark installation on my Windows device and was having the exact same issue. What seems to have helped is the following:

Go to your System Environment Variables and add PYTHONPATH to it with the following value: %SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-<version>-src.zip;%PYTHONPATH%. Just check which py4j version you have in your spark/python/lib folder.

The reason I think this works is that when I installed pyspark using conda, it also downloaded a py4j version that may not be compatible with that specific version of Spark; Spark seems to package its own py4j version.
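If you are unsure which py4j version sits in that folder, a small sketch to list it (assuming the SPARK_HOME environment variable is already set):

import glob
import os

# lists the py4j archive bundled with Spark, e.g. ...\python\lib\py4j-0.10.7-src.zip
spark_home = os.environ["SPARK_HOME"]
print(glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")))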

mugurkt
  • That's right @mugurkt. Guided by your answer, I had to remove the pyspark that came with an incompatible py4j version (initially installed via Anaconda Navigator) and then reinstalled pyspark with 'pip install pyspark' from the command prompt, and it now works well for me. – Laenka-Oss Feb 02 '20 at 15:37
  • Of course, for those using *nix, this translates to: export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j--src.zip:$PYTHONPATH – Metropolis Aug 06 '20 at 23:06
3

Instead of editing the Environment Variables, you might just ensure that the Python environment (the one with pyspark) also has the same py4j version as the zip file present in the \python\lib\ directory within your Spark folder, e.g. d:\Programs\Spark\python\lib\py4j-0.10.7-src.zip on my system, for Spark 2.3.2. That is the py4j version shipped as part of the Spark archive file.
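To see which py4j version the Python environment itself carries, and compare it with that zip file name, a sketch using pkg_resources (which ships with setuptools):

import pkg_resources

# version of the py4j package installed alongside pyspark in this environment
print(pkg_resources.get_distribution("py4j").version)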

Pawel Kranzberg
3

Try adding this at the top of the file:

import findspark
findspark.init()

See https://github.com/minrk/findspark

fstang
  • import findspark does not work in Python 3.7, can you recheck? I am trying the import using Anaconda Navigator. – Abhishek Nov 25 '19 at 13:59
3

Just to make it simple: it's all about Python and Java not being able to talk, because the medium they have to speak through (py4j) is different on each side, that's it. I had the same issue, and all the answers above are valid and will work if you use them correctly. Either you define a system variable to tell both sides which py4j they should use, or you do some uninstalling and reinstalling so that everyone is on the same page.
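As an illustration of "telling both sides which py4j to use", a sketch that puts Spark's own bundled py4j and pyspark on sys.path before anything is imported (this is roughly what findspark does; it assumes SPARK_HOME is set as in the other answers):

import glob
import os
import sys

spark_home = os.environ["SPARK_HOME"]

# prefer Spark's bundled pyspark/py4j over any copy installed via pip or conda
sys.path.insert(0, glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))[0])
sys.path.insert(0, os.path.join(spark_home, "python"))

from pyspark import SparkContext
sc = SparkContext()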

viv_tony
1

Use SparkContext().stop() at the end of the program to avoid this situation.
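A minimal sketch of what that can look like in a script, stopping the context even if the job fails:

from pyspark import SparkContext

sc = SparkContext()
try:
    print(sc.parallelize(range(10)).sum())  # placeholder job
finally:
    sc.stop()  # always shut down the context and its JVM gateway at the end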

abhishek kumar
1

The following steps solved my issue:

- Downgrading pyspark to 2.3.2
- Adding PYTHONPATH as a System Environment Variable with the value %SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-<version>-src.zip;%PYTHONPATH%

Note: use the proper py4j version in the value given above, don't copy it exactly.

Babu Reddy
1

When I installed the new version with pip from the Anaconda command prompt, I got the same issue.

Using the following at the top of the code file solved my problem:

import findspark
findspark.init(r"c:\spark")

1

Try installing Spark version 2.4.5 and set the Spark home path to this version. I too faced this issue; after changing to this version, it got resolved for me.
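If you prefer to point at the 2.4.5 install from inside the script rather than in the shell, a sketch (the path below is only an assumption; use wherever you unpacked Spark 2.4.5):

import os

import findspark

# hypothetical location of the Spark 2.4.5 distribution
os.environ["SPARK_HOME"] = "/opt/spark-2.4.5-bin-hadoop2.7"
findspark.init()

from pyspark import SparkContext
sc = SparkContext()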

BhavyaPrabha