
I created a Docker image with Spark 3.0.0 that is meant to be used for executing PySpark from a Jupyter notebook. The issue is that when I run the image locally and test the following script:

import os
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

print("*** START ***")

sparkConf = SparkConf()

sc = SparkContext(conf=sparkConf)

rdd = sc.parallelize(range(100000000))
print(rdd.sum())

print("*** DONE ***")

I get the following error:

Traceback (most recent call last):
  File "test.py", line 9, in <module>
    sc = SparkContext(conf=sparkConf)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/context.py", line 136, in __init__
    conf, jsc, profiler_cls)
  File "/usr/local/lib/python3.7/dist-packages/pyspark/context.py", line 213, in _do_init
    self._encryption_enabled = self._jvm.PythonUtils.getEncryptionEnabled(self._jsc)
  File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 1487, in __getattr__
    "{0}.{1} does not exist in the JVM".format(self._fqn, name))
py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM

I've tried using findspark and pip-installing py4j fresh on the image, but nothing is working, and I can't seem to find any answers other than using findspark. Has anyone else been able to solve this issue with Spark 3.0.0?
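
For reference, the findspark approach I tried looks roughly like this (the SPARK_HOME path is specific to my image and may differ in yours):

import findspark
# Point findspark at the Spark 3.0.0 installation inside the image
# (the /opt/spark path is an assumption; adjust to wherever Spark is unpacked).
findspark.init("/opt/spark")

from pyspark import SparkConf, SparkContext

sc = SparkContext(conf=SparkConf())
print(sc.parallelize(range(100)).sum())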

JMV12

1 Answer


You are probably mixing different versions of PySpark and Spark.

See my complete answer here: https://stackoverflow.com/a/66927923/14954327
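
If you want to confirm the mismatch inside the container, a quick check like the one below can help (a minimal sketch; compare the printed version with whatever spark-submit --version reports for the Spark install in the image):

import pyspark
import py4j

# The pip-installed PySpark wheel must match the Spark distribution
# that SPARK_HOME points at; check the latter with: spark-submit --version
print("pyspark:", pyspark.__version__)
print("py4j:", py4j.__version__)

If the versions differ, reinstall the matching wheel in the Dockerfile, for example pip install pyspark==3.0.0 (the 3.0.0 pin is an assumption to match the Spark 3.0.0 install described in the question).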

asiera