I'm working on CentOS. I've set up $SPARK_HOME and added its bin directory to $PATH, so I can run pyspark from anywhere.
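To show what I mean, here is a quick sanity check from a Python shell (a sketch; the SPARK_HOME value in the comment is assumed from the /opt/mapr/spark/spark-2.0.1 path that appears in the traceback below):

import os
import shutil

# Sanity check: SPARK_HOME is set and the pyspark launcher resolves on PATH.
print(os.environ.get("SPARK_HOME"))   # something like /opt/mapr/spark/spark-2.0.1
print(shutil.which("pyspark"))        # resolves, since pyspark runs from anywhere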
But when I create a Python file that uses this statement:
from pyspark import SparkConf, SparkContext
it throws the following error:
python pysparktask.py
Traceback (most recent call last):
File "pysparktask.py", line 1, in <module>
from pyspark import SparkConf, SparkContext
ModuleNotFoundError: No module named 'pyspark'
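For reference, pysparktask.py is essentially just the sketch below; the SparkConf/SparkContext usage is only illustrative (the app name is a placeholder), since the import on the first line is already what fails:

from pyspark import SparkConf, SparkContext

# Minimal skeleton of pysparktask.py: the import above is what fails.
conf = SparkConf().setAppName("pysparktask")
sc = SparkContext(conf=conf)
print(sc.parallelize(range(10)).sum())
sc.stop()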
I also tried to install it with pip:
pip install pyspark
but it gives this error too:
Could not find a version that satisfies the requirement pyspark (from versions: ) No matching distribution found for pyspark
EDIT

Based on the answer, I updated the code. Now the error is:
Traceback (most recent call last):
File "pysparktask.py", line 6, in <module>
from pyspark import SparkConf, SparkContext
File "/opt/mapr/spark/spark-2.0.1/python/pyspark/__init__.py", line 44, in <module>
from pyspark.context import SparkContext
File "/opt/mapr/spark/spark-2.0.1/python/pyspark/context.py", line 33, in <module>
from pyspark.java_gateway import launch_gateway
File "/opt/mapr/spark/spark-2.0.1/python/pyspark/java_gateway.py", line 31, in <module>
from py4j.java_gateway import java_import, JavaGateway, GatewayClient
ModuleNotFoundError: No module named 'py4j'
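For context, the update was roughly the following at the top of pysparktask.py (a sketch of what the answer suggested, not the exact code; the fallback path is taken from the traceback above):

import os
import sys

# Make the Spark distribution's python/ directory importable so `pyspark` resolves.
spark_home = os.environ.get("SPARK_HOME", "/opt/mapr/spark/spark-2.0.1")
sys.path.insert(0, os.path.join(spark_home, "python"))

from pyspark import SparkConf, SparkContext

With that, pyspark itself is found under /opt/mapr/spark/spark-2.0.1/python, but py4j apparently is not: as far as I can tell it ships separately as a zip under $SPARK_HOME/python/lib (a py4j-*-src.zip, exact version depending on the Spark build), so presumably that zip also needs to be on sys.path or PYTHONPATH.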