
I recently installed pyspark on Linux and get this error when importing pyspark:

ModuleNotFoundError: No module named 'pyspark'

Pyspark is in my 'pip list'

I added the following lines to my .bashrc:

export SPARK_HOME=~/Spark/spark-3.0.1-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH
export PYSPARK_PYTHON=python3
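
These PYTHONPATH entries are what let `import pyspark` resolve; the rough equivalent inside a script, assuming the same locations as above, would be a sketch like:

import os
import sys

# same locations as the exports above
spark_home = os.path.expanduser("~/Spark/spark-3.0.1-bin-hadoop2.7")
sys.path.insert(0, os.path.join(spark_home, "python"))
sys.path.insert(0, os.path.join(spark_home, "python", "lib", "py4j-0.10.9-src.zip"))
os.environ.setdefault("SPARK_HOME", spark_home)

import pyspark  # should now resolve if the paths exist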

If I type pyspark in the terminal, it works properly:

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.0.1
      /_/
Using Python version 3.7.3 (default, Jul 25 2020 13:03:44)
SparkSession available as 'spark'.

In the terminal I can do all my coding; it's just that `import pyspark` fails from a Python script. It looks like my environment variables are okay.

I then typed:

import findspark
print(findspark.init())

And it says: ValueError: Couldn't find Spark, make sure SPARK_HOME env is set or Spark is in an expected location (e.g. from homebrew installation)
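
To see whether the variables from .bashrc actually reach the interpreter that runs the script, a quick check like this at the top of the script may help (a diagnostic sketch):

import os
import sys

print(sys.executable)                   # which Python interpreter runs the script
print(os.environ.get("SPARK_HOME"))     # None here means .bashrc was not read
print(os.environ.get("PYTHONPATH"))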

Jeroen
  • how do you run your script? Try using a specific Python version: `python3.7 script.py` – Brown Bear Oct 02 '20 at 10:24
  • your solution indeed works. Good to know how I can run it successfully, but I still want to know how I can do it in my interpreter (I use Thonny) – Jeroen Oct 02 '20 at 10:33
  • try to do this https://www.techcoil.com/blog/how-to-associate-a-python-3-virtual-environment-with-thonny/ – jacob galam Oct 02 '20 at 10:46
  • What is the output when you type `echo $SPARK_HOME` in your terminal? – Henrique Branco Oct 02 '20 at 11:12
  • Does this question help: [How to Setup SPARK_HOME variable?](https://stackoverflow.com/questions/46613651/how-to-setup-spark-home-variable) – Henrique Branco Oct 02 '20 at 11:13
  • Also don't forget to set `JAVA_HOME` too. – Henrique Branco Oct 02 '20 at 11:15
  • echo gives me: /home/pi/Spark/spark-3.0.1-bin-hadoop2.7; the virtual environment was indeed already configured in Thonny – Jeroen Oct 02 '20 at 11:23
  • What is the name of the Python script you are trying to run? It should not be pyspark – Sandeep Kothari Oct 02 '20 at 11:56
  • The name is test.py. I don't want to develop anything in Java. I have installed and run pyspark successfully on Windows on another computer and did not use JAVA_HOME there. I doubt that Java has anything to do with it, because I can run the script from the terminal. – Jeroen Oct 02 '20 at 16:10

2 Answers


Check whether your environment variables are set properly by running:

source ~/.bashrc
cd $SPARK_HOME/bin 

Or provide the complete path in the script:

import findspark
print(findspark.init('~/Spark/spark-3.0.1-bin-hadoop2.7/'))
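
A note on the path above: findspark may not expand the tilde, so expanding it explicitly first is a safer sketch:

import os
import findspark

# expand ~ explicitly in case findspark does not do it
findspark.init(os.path.expanduser("~/Spark/spark-3.0.1-bin-hadoop2.7"))

import pyspark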

  • This did not work. Pyspark is configured correctly, since it is running from the shell. It just doesn't run from a Python script. – Jeroen Oct 02 '20 at 11:39

I had a similar problem when running pyspark code on a Mac.

It worked when I added the following line to my .bashrc:

export PYSPARK_SUBMIT_ARGS="--name job_name --master local --conf spark.dynamicAllocation.enabled=true pyspark-shell"

Or when I added this to my Python code:

import os
os.environ['PYSPARK_SUBMIT_ARGS'] = """--name job_name --master local --conf spark.dynamicAllocation.enabled=true pyspark-shell"""
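
Either way, once PYSPARK_SUBMIT_ARGS is set, a session can be created as usual (a minimal sketch):

from pyspark.sql import SparkSession

# assumes PYSPARK_SUBMIT_ARGS has been set as shown above
spark = SparkSession.builder.getOrCreate()
print(spark.version)
spark.stop()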
Leo Arruda