47

I installed Spark, ran the sbt assembly, and can open bin/pyspark with no problem. However, I am running into problems importing the pyspark module in IPython. I'm getting the following error:

In [1]: import pyspark
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-c15ae3402d12> in <module>()
----> 1 import pyspark

/usr/local/spark/python/pyspark/__init__.py in <module>()
     61
     62 from pyspark.conf import SparkConf
---> 63 from pyspark.context import SparkContext
     64 from pyspark.sql import SQLContext
     65 from pyspark.rdd import RDD

/usr/local/spark/python/pyspark/context.py in <module>()
     28 from pyspark.conf import SparkConf
     29 from pyspark.files import SparkFiles
---> 30 from pyspark.java_gateway import launch_gateway
     31 from pyspark.serializers import PickleSerializer, BatchedSerializer, UTF8Deserializer, \
     32     PairDeserializer, CompressedSerializer

/usr/local/spark/python/pyspark/java_gateway.py in <module>()
     24 from subprocess import Popen, PIPE
     25 from threading import Thread
---> 26 from py4j.java_gateway import java_import, JavaGateway, GatewayClient
     27
     28

ImportError: No module named py4j.java_gateway
user592419
  • I don't know if this is a real answer, but `sudo pip install py4j` fixed this problem for me. I assume this error comes after you already added SPARK_HOME to the PYTHONPATH? – emmagras Nov 05 '14 at 19:34
  • I provided an answer to this same (or similar) problem here; it may be helpful to you: http://stackoverflow.com/questions/24249847/running-pyspark-on-and-ide-like-spyder/28380155#28380155 – NYCeyes Mar 03 '15 at 23:38
  • I also set my `PYTHONPATH` to point to all needed python dependencies but got the same error. To resolve the problem, I also had to 1) install another copy of py4j at the `site-packages` folder where usual python packages are installed 2) change the permission of everything in the py4j folder so YARN executor nodes can read / execute the relevant files. – XValidated Aug 08 '15 at 17:22

6 Answers

74

In my environment (using Docker and the sequenceiq/spark:1.1.0-ubuntu image), I ran into this. If you look at the pyspark shell script, you'll see that you need a few things added to your PYTHONPATH:

export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH

That worked for me in IPython.

Update: as noted in the comments, the name of the py4j zip file changes with each Spark release, so look around for the right name.
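
Since the zip name varies, one way to avoid hard-coding it is to glob for it at runtime before importing pyspark (a minimal sketch, assuming SPARK_HOME is already exported):

import glob
import os
import sys

# Add Spark's Python sources plus whichever py4j-*-src.zip this release ships,
# without hard-coding the py4j version in the path.
spark_home = os.environ["SPARK_HOME"]  # assumes SPARK_HOME is set
sys.path.append(os.path.join(spark_home, "python"))
for zip_path in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")):
    sys.path.append(zip_path)

import pyspark  # py4j.java_gateway should now resolve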

nealmcb
  • That's `export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH` in Spark 1.6.0 – Kyle Heuton Jan 09 '16 at 00:33
  • The name of the py4j zip file changes with every Spark version, so make sure the zip file you are pointing to in `$PYTHONPATH` actually exists. – Christian Long Oct 24 '16 at 20:12
30

I solved this problem by adding these paths to .bashrc:

export SPARK_HOME=/home/a141890/apps/spark
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH

After this, it never raises ImportError: No module named py4j.java_gateway.

Anderson
  • I am also facing the same problem. Where should I write these export statements? I tried in cmd prompt and ipython notebook. It did not work for me in either of them – SRS Jun 28 '15 at 23:16
  • I set the Spark path, Python 2.7 path, and py4j zip file path as environment system variables, but I couldn't solve the issue. When I run **from pyspark import SparkContext** I am getting the error. – SRS Jun 29 '15 at 00:00
  • What does adding the ':$PYTHONPATH' do? – spacedustpi Dec 09 '19 at 17:41
9

Install the 'py4j' module with pip:

pip install py4j

I got this problem with Spark 2.1.1 and Python 2.7.x. I'm not sure whether Spark stopped bundling this package in the latest distributions, but installing the py4j module solved the issue for me.
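
If you go this route, it is safer to install the same py4j version that your Spark distribution bundles (see the comment below). A small sketch, assuming a standard Spark layout, to find out which version that is:

import glob
import os
import re

# Read the bundled py4j version from the zip name under SPARK_HOME, then
# install exactly that version, e.g. pip install py4j==<version>.
spark_home = os.environ.get("SPARK_HOME", "/usr/local/spark")  # adjust to your install
for zip_path in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")):
    match = re.search(r"py4j-(.+)-src\.zip", os.path.basename(zip_path))
    if match:
        print("Bundled py4j version:", match.group(1))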

kn_pavan
  • You have to use the version of py4j that's shipped with Spark. Even upgrades like from Spark 2.2 to 2.3 use incompatible versions of py4j. – Tagar Jul 30 '18 at 20:13
4

In PyCharm, before running the above script, ensure that you have unzipped the py4j*.zip file and added a reference to it in the script: sys.path.append("path to spark*/python/lib")

It worked for me.

3
#/home/shubham/spark-1.6.2
import os
import sys

# Set the path for the Spark installation
# (this is the path where you built Spark using sbt/sbt assembly)
os.environ['SPARK_HOME'] = "/home/shubham/spark-1.6.2"
# os.environ['SPARK_HOME'] = "/home/jie/d2/spark-0.9.1"

# Append to PYTHONPATH so that pyspark can be found
sys.path.append("/home/shubham/spark-1.6.2/python")
sys.path.append("/home/shubham/spark-1.6.2/python/lib")
# sys.path.append("/home/jie/d2/spark-0.9.1/python")

# Now we are ready to import Spark modules
try:
    from pyspark import SparkContext
    from pyspark import SparkConf
    print("Hey nice")
except ImportError as e:
    print("Error importing Spark modules", e)
    sys.exit(1)
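
Once the imports succeed, a quick way to confirm that the py4j gateway actually launches (a small sketch, not tied to any particular Spark install) is to spin up a local SparkContext:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local[*]").setAppName("py4j-check")
sc = SparkContext(conf=conf)            # starts the JVM gateway through py4j
print(sc.parallelize(range(10)).sum())  # prints 45 if everything is wired up
sc.stop()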
1

To set up PySpark with Python 3.8, add the paths below to your bash profile (on macOS):

export SPARK_HOME=/Users/<username>/spark-3.0.1-bin-hadoop2.7
export PATH=$PATH:/Users/<username>/spark-3.0.1-bin-hadoop2.7/bin
export PYSPARK_PYTHON=python3
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH

NOTE: Use the py4j zip path present in your downloaded Spark package.

Save the updated bash profile (Ctrl + X if you are editing it in nano).

Reload the updated profile: source ~/.bash_profile
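
After reloading the profile, a quick sanity check (a minimal sketch) is to confirm that pyspark imports cleanly and that py4j resolves from the bundled zip:

# If PYTHONPATH is set correctly, both imports succeed.
import pyspark
from py4j.java_gateway import JavaGateway  # the import that originally failed

print(pyspark.__version__)  # e.g. 3.0.1 for the layout above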