I'm working on CentOS. I've set up $SPARK_HOME and also added its bin directory to $PATH.

I can run pyspark from anywhere.

But when I create a Python file that uses this statement:

from pyspark import SparkConf, SparkContext

it throws the following error:

python pysparktask.py
Traceback (most recent call last):
  File "pysparktask.py", line 1, in <module>
    from pyspark import SparkConf, SparkContext
ModuleNotFoundError: No module named 'pyspark'

I tried to install it again using pip.

pip install pyspark

and that gives an error too:

Could not find a version that satisfies the requirement pyspark (from versions: )
No matching distribution found for pyspark

EDIT

Based on the answer, I updated the code. The error is now:

Traceback (most recent call last):
  File "pysparktask.py", line 6, in <module>
    from pyspark import SparkConf, SparkContext
  File "/opt/mapr/spark/spark-2.0.1/python/pyspark/__init__.py", line 44, in <module>
    from pyspark.context import SparkContext
  File "/opt/mapr/spark/spark-2.0.1/python/pyspark/context.py", line 33, in <module>
    from pyspark.java_gateway import launch_gateway
  File "/opt/mapr/spark/spark-2.0.1/python/pyspark/java_gateway.py", line 31, in <module>
    from py4j.java_gateway import java_import, JavaGateway, GatewayClient
ModuleNotFoundError: No module named 'py4j'
Mubin
  • For the error you are getting after updating your code, I think you can check http://stackoverflow.com/questions/26533169/why-cant-pyspark-find-py4j-java-gateway – Afaq Mar 30 '17 at 19:29

2 Answers

Set the SPARK_HOME environment variable and append the path of Spark's Python library to sys.path before importing pyspark:

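import os
import sys

# Point SPARK_HOME at the Spark installation (adjust the path to your setup).
os.environ['SPARK_HOME'] = "/usr/lib/spark/"
# Make the pyspark package that ships with Spark importable.
sys.path.append("/usr/lib/spark/python/")

# And then try to import SparkContext.
from pyspark import SparkConf, SparkContext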
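
The second traceback in the question ends with `No module named 'py4j'`, which suggests that adding only the `python` directory may not be enough. Spark distributions normally ship py4j as a zip archive under `python/lib`, so a minimal sketch that adds both could look like the following. The helper name `add_pyspark_to_path` is hypothetical, and the `/usr/lib/spark` fallback is simply the path used above, not necessarily yours.

```python
import glob
import os
import sys


def add_pyspark_to_path(spark_home):
    """Put Spark's Python sources and its bundled py4j archive on sys.path.

    Hypothetical helper for illustration; returns the entries it added.
    """
    python_dir = os.path.join(spark_home, "python")
    # The py4j version in the file name differs between Spark releases,
    # so match it with a glob instead of hard-coding it.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    added = []
    for entry in [python_dir] + py4j_zips:
        if entry not in sys.path:
            sys.path.insert(0, entry)
            added.append(entry)
    return added


# Assumption: SPARK_HOME is exported; otherwise fall back to the path used above.
spark_home = os.environ.get("SPARK_HOME", "/usr/lib/spark")
add_pyspark_to_path(spark_home)
```

With both entries on `sys.path`, `from pyspark import SparkConf, SparkContext` should be able to resolve `py4j` as well, provided the archive really exists under `python/lib` in your installation.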
Afaq
pip install -e /spark-directory/python/.

This installation should solve your problem. You must also edit your bash_profile to set SPARK_HOME:

export SPARK_HOME="/spark-directory"
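
To confirm that the setup took effect, a small check along these lines may help. It is only a sketch: the function name `check_pyspark_setup` is hypothetical, and it reports what the current interpreter can see without changing anything.

```python
import importlib.util
import os


def check_pyspark_setup():
    """Report whether SPARK_HOME is set and whether pyspark and py4j are importable."""
    return {
        "SPARK_HOME": os.environ.get("SPARK_HOME"),
        # find_spec returns None when a top-level module cannot be located.
        "pyspark": importlib.util.find_spec("pyspark") is not None,
        "py4j": importlib.util.find_spec("py4j") is not None,
    }


print(check_pyspark_setup())
```

Run it with the same `python` that runs your script, since `pip` and `python` may point at different interpreters.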
Gökhan Ayhan
  • The above answer worked for me. As I'm using MapR, I don't need to install it explicitly. – Mubin Apr 13 '17 at 15:10