I recently installed pyspark on Linux and I get the following error when importing pyspark:
ModuleNotFoundError: No module named 'pyspark'
Pyspark shows up in my 'pip list'.
I added the following lines to my .bashrc:
export SPARK_HOME=~/Spark/spark-3.0.1-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH
export PYSPARK_PYTHON=python3
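To confirm that a Python script actually sees these variables, I can dump them (a minimal check; it assumes the script is started from a shell that has sourced .bashrc):

import os
import sys

# Print the Spark-related variables as the script sees them
for var in ("SPARK_HOME", "PYTHONPATH", "PYSPARK_PYTHON"):
    print(var, "=", os.environ.get(var))

# pyspark is only importable if one of these paths contains it
print("\n".join(sys.path))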
If I type pyspark in the terminal, it works properly:
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.0.1
      /_/
Using Python version 3.7.3 (default, Jul 25 2020 13:03:44)
SparkSession available as 'spark'.
In the terminal I can do all my coding; I just can't import pyspark from a Python script. It looks like my environment variables are okay.
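One thing I'm not sure about (just a guess at the cause) is whether the script runs under the same interpreter that pip installed pyspark into, so I could compare:

import sys

# Interpreter that runs the script; compare with the output of
# 'pip show pyspark' and 'which python3' in the terminal
print(sys.executable)
print(sys.version)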
I then typed:
import findspark
print(findspark.init())
And it says: ValueError: Couldn't find Spark, make sure SPARK_HOME env is set or Spark is in an expected location (e.g. from homebrew installation)
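Would pointing findspark at the install directory explicitly be the right direction? A rough sketch, using the same path as my SPARK_HOME:

import os
import findspark

# Pass the Spark directory directly instead of relying on SPARK_HOME
findspark.init(os.path.expanduser("~/Spark/spark-3.0.1-bin-hadoop2.7"))

import pyspark
print(pyspark.__version__)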