56

I am currently on JRE 1.8.0_181, Python 3.6.4, and Spark 2.3.2.

I am trying to execute the following code in Python:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('Basics').getOrCreate()

This fails with the following error:

spark = SparkSession.builder.appName('Basics').getOrCreate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Tools\Anaconda3\lib\site-packages\pyspark\sql\session.py", line 173, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "C:\Tools\Anaconda3\lib\site-packages\pyspark\context.py", line 349, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "C:\Tools\Anaconda3\lib\site-packages\pyspark\context.py", line 118, in __init__
    conf, jsc, profiler_cls)
  File "C:\Tools\Anaconda3\lib\site-packages\pyspark\context.py", line 195, in _do_init
    self._encryption_enabled = self._jvm.PythonUtils.getEncryptionEnabled(self._jsc)
  File "C:\Tools\Anaconda3\lib\site-packages\py4j\java_gateway.py", line 1487, in __getattr__
    "{0}.{1} does not exist in the JVM".format(self._fqn, name))
py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM

Does anyone have an idea what the potential issue here might be?

Appreciate any help or feedback here. Thank you!

bvkclear
  • https://stackoverflow.com/questions/53161939/pyspark-error-does-not-exist-in-the-jvm-error-when-initializing-sparkcontext – pvy4917 Nov 12 '18 at 16:04
    You can try this: https://stackoverflow.com/a/54881624/1316649 It worked for me. – fstang Feb 26 '19 at 08:58

13 Answers

37

Using findspark should solve the problem:

Install findspark

$ pip install findspark

In your code, use:

import findspark
findspark.init() 

Optionally, you can pass "/path/to/spark" to the init method above: findspark.init("/path/to/spark")
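For completeness, here is a minimal sketch of how this fits together with the SparkSession from the question (assuming Spark and findspark are installed and findspark can locate SPARK_HOME):

import findspark
findspark.init()  # locates SPARK_HOME and puts pyspark/py4j on sys.path

from pyspark.sql import SparkSession  # import pyspark only after findspark.init()

spark = SparkSession.builder.appName('Basics').getOrCreate()
print(spark.version)  # should now start without the Py4JError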

sm7
30

As outlined in "pyspark error does not exist in the jvm error when initializing SparkContext", adding the PYTHONPATH environment variable with the value

%SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-<version>-src.zip;%PYTHONPATH%

(just check which py4j version you have in your spark/python/lib folder) helped resolve this issue.
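If you would rather not edit environment variables, a rough equivalent (my own sketch, not part of the linked answer, and assuming SPARK_HOME is already set) is to extend sys.path at the top of the script before importing pyspark; the py4j zip is located with a glob because its version differs between Spark releases:

import glob
import os
import sys

spark_home = os.environ['SPARK_HOME']  # assumes SPARK_HOME points at your Spark install
sys.path.insert(0, os.path.join(spark_home, 'python'))
# add whichever py4j-<version>-src.zip ships with this Spark distribution
sys.path.insert(0, glob.glob(os.path.join(spark_home, 'python', 'lib', 'py4j-*-src.zip'))[0])

from pyspark.sql import SparkSession  # safe to import only after the paths are in place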

SherylHohman
bvkclear
    Are we for certain supposed to include a semicolon after `%SPARK_HOME%\python`? – frlzjosh May 28 '20 at 11:16
    I have followed the same step above, it worked for me. Just make sure that your spark version downloaded is the same as the one installed using pip command. Once this path was set, just restart your system. – nvsk. avinash Jul 17 '20 at 05:09
    I had to put the slashes in the other direction for it to work, but that did the trick. eg. PYTHONPATH=/opt/spark/python;/opt/spark/python/lib/py4j-0.10.9-src.zip:%$ – Julien Massardier Aug 20 '20 at 09:26
  • I can confirm that this solved the issue for me on WSL2 Ubuntu. – Raghav Mar 27 '21 at 02:53
  • I first followed the same step above, and I still got the same error. The root cause for my case is that my local py4j version is different than the one in spark/python/lib folder. I try to pip install the same version as my local one, and check the step above, it worked for me. Thanks. – chengji18 Dec 23 '21 at 00:10
20

Solution #1. Check your environment variables

You are getting “py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM” because the environment variables are not set correctly.

Check whether your environment variables are set correctly in your .bashrc file. For Unix and Mac, the variables should look something like the ones below. You can find the .bashrc file in your home directory.

Note: Do not copy and paste the lines below as-is, since your Spark version might differ from the one shown.

export SPARK_HOME=/opt/spark-3.0.0-bin-hadoop2.7
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH
export PATH=$SPARK_HOME/bin:$SPARK_HOME/python:$PATH

If you are running on Windows, open the Environment Variables dialog and add/update the entries below.

SPARK_HOME  =>  /opt/spark-3.0.0-bin-hadoop2.7
PYTHONPATH  =>  %SPARK_HOME%/python;%SPARK_HOME%/python/lib/py4j-0.10.9-src.zip;%PYTHONPATH%
PATH  => %SPARK_HOME%/bin;%SPARK_HOME%/python;%PATH%

After setting the environment variables, restart your tool or command prompt.
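As a quick sanity check (my own addition, not part of the original solution), you can print the relevant variables from Python before importing pyspark and confirm they point at the Spark installation you expect:

import os

# both should reference the same Spark installation configured above
print('SPARK_HOME =', os.environ.get('SPARK_HOME'))
print('PYTHONPATH =', os.environ.get('PYTHONPATH'))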

Solution #2. Using findspark

Install the findspark package by running $ pip install findspark, and add the following lines to your pyspark program:

import findspark
findspark.init() 
# you can also pass spark home path to init() method like below
# findspark.init("/path/to/spark")

Solution #3. Copying the pyspark and py4j modules to Anaconda lib

Sometimes, after changing/upgrading the Spark version, you may get this error due to a version incompatibility between the installed pyspark and the pyspark available in the Anaconda lib. To correct it:

Note: copy the specified folders from inside the zip files, and make sure the environment variables are set correctly as mentioned at the beginning.

Copy the py4j folder from:

C:\apps\opt\spark-3.0.0-bin-hadoop2.7\python\lib\py4j-0.10.9-src.zip\

to

C:\Programdata\anaconda3\Lib\site-packages\.

And copy the pyspark folder from:

C:\apps\opt\spark-3.0.0-bin-hadoop2.7\python\lib\pyspark.zip\

to

C:\Programdata\anaconda3\Lib\site-packages\

Sometimes, you may need to restart your system for the environment variables to take effect.
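Before copying anything, a small check (my own sketch, assuming SPARK_HOME is set) can help confirm the mismatch by showing which pyspark Python actually imports and which py4j zip the Spark distribution bundles:

import glob
import os
import pyspark

# the pyspark that Python resolves, and where it lives
print('pyspark', pyspark.__version__, 'from', os.path.dirname(pyspark.__file__))

spark_home = os.environ.get('SPARK_HOME', '')
# the py4j zip bundled with the Spark distribution, for comparison
print('bundled py4j:', glob.glob(os.path.join(spark_home, 'python', 'lib', 'py4j-*-src.zip')))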

Credits to: https://sparkbyexamples.com/pyspark/pyspark-py4j-protocol-py4jerror-org-apache-spark-api-python-pythonutils-jvm/

mounirboulwafa
9

You just need to install an older version of pyspark. This version works: pip install pyspark==2.4.7
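To confirm which pyspark version actually ended up installed (a quick check, assuming the pip install succeeded), you can run:

import pyspark
print(pyspark.__version__)  # should print 2.4.7 and match your local Spark version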

4

Had the same problem on Windows, and I found that my Python had different versions of py4j and pyspark than Spark expected. Solved it by copying the python modules inside the zips py4j-0.10.8.1-src.zip and pyspark.zip (found in spark-3.0.0-preview2-bin-hadoop2.7\python\lib) into C:\Anaconda3\Lib\site-packages.

Tavy
4

I had the same problem. In my case, with Spark 2.4.6, installing pyspark 2.4.6 or 2.4.x (the same version as Spark) fixed the problem, since pyspark 3.0.1 (which a plain pip install pyspark installs as the latest version) was what raised the error.

Spaceship222
3

I recently faced this issue.
My mistake was that I was opening a normal Jupyter notebook.
Always open the Anaconda Prompt and type 'pyspark'; it will automatically open a Jupyter notebook for you.
After that, you will not get this error.

lil-wolf
2

If you use PyCharm:
- Download Spark 2.4.4
- Settings / Project Structure / Add Content Root: add py4j-0.10.8.1-src.zip and pyspark.zip from spark-2.4.4/python/lib

Zendem
2

This may happen if you have pip-installed pyspark 3.1 while your local Spark is 2.4 (that is, a version incompatibility). In my case, to overcome this, I uninstalled pyspark 3.1 and switched to pip install pyspark 2.4.

My advice is to check for version incompatibility issues as well, along with the other answers here.

santhosh
2

If it is not already clear from the previous answers: your pyspark package version has to be the same as the installed Apache Spark version.

For example, I use Ubuntu and PySpark 3.2. In the environment variables (.bashrc):

export SPARK_HOME="/home/ali/spark-3.2.0-bin-hadoop3.2"
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
ali shibli
  • Thanks. I was using spark 3.2.1 so ```pip install pyspark==3.2.1``` solved similar issue. – yjsa Aug 02 '22 at 06:10
1

If you updated pyspark or Spark

If, like me, the problem occurred after you updated one of the two and you didn't know that the PySpark and Spark versions need to match, note what the PySpark PyPI page says:

NOTE: If you are using this with a Spark standalone cluster you must ensure that the version (including minor version) matches or you may experience odd errors.

Therefore, upgrading/downgrading PySpark/Spark so that their versions match solves the issue.

To upgrade Spark, follow: https://sparkbyexamples.com/pyspark/pyspark-py4j-protocol-py4jerror-org-apache-spark-api-python-pythonutils-jvm/
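A quick way to compare the two versions side by side (my own sketch, assuming spark-submit is on your PATH) is:

import subprocess
import pyspark

print('pyspark (pip) version:', pyspark.__version__)

# spark-submit prints its version banner to stderr, so capture both streams
result = subprocess.run(['spark-submit', '--version'],
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
print(result.stdout.decode())  # the Spark version here should match the pyspark version above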

asiera
0

If using Spark with the AWS Glue libs locally (https://github.com/awslabs/aws-glue-libs), ensure that Spark, PySpark and the version of AWS Glue libs all align correctly. As of now, the current valid combinations are:

aws-glue-libs branch    Glue Version    Spark Version
glue-0.9                0.9             2.2.1
glue-1.0                1.0             2.4.3
glue-2.0                2.0             2.4.3
master                  3.0             3.1.1
Norman
0

Regarding the previously mentioned findspark solution, remember that it must be at the top of your script:

import sys
import findspark
findspark.init()
from...
import...
marcin2x4