How do I set up PySpark in IntelliJ? Even after setting the SPARK_HOME and PYTHONPATH environment variables, `import pyspark` fails with: ImportError: No module named pyspark
-
Have a look at `findspark`. The problem is that PySpark isn't on sys.path by default. You can address this by either symlinking pyspark into your site-packages, or adding pyspark to sys.path at runtime. findspark does the latter; see https://github.com/minrk/findspark – Alex May 05 '17 at 08:47
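A minimal sketch of the findspark approach described in this comment; it assumes the findspark package is installed (pip install findspark) and that SPARK_HOME is set, otherwise pass the Spark install path explicitly:

```python
# Sketch of the findspark approach from the comment above; assumes
# `pip install findspark` and a valid SPARK_HOME (or pass the path).
import findspark
findspark.init()                  # or findspark.init("/path/to/spark")

import pyspark                    # resolves now that Spark's python dir is on sys.path
print(pyspark.__version__)
```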
-
You can also try this: https://medium.com/@gauravmshah/pyspark-on-intellij-with-packages-auto-complete-5e3208504707 – Gaurav Shah Dec 13 '18 at 18:17
4 Answers
- Go to File -> Settings
- Look for Project Structure
- Click on Add Content Root and add $SPARK_HOME/python

After this, your editor will look in Spark's python directory for source files.
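As a rough sanity check (not part of the original answer), the import should resolve once the content root is added and the same directory is also on the interpreter's path at runtime:

```python
# Hypothetical check: pyspark should now resolve from Spark's python directory.
import pyspark
print(pyspark.__file__)   # expected to point into $SPARK_HOME/python/pyspark
```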

Click on Edit Configurations
Click on Environment variables
Add these variables:
PYTHONPATH = %SPARK_HOME%\python;%SPARK_HOME%\python\build;%PYTHONPATH%
PYSPARK_SUBMIT_ARGS = --master local[2] pyspark-shell
SPARK_HOME = <spark home path>
SPARK_CONF_DIR = %SPARK_HOME%\conf
SPARK_LOCAL_IP = 127.0.0.1
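A minimal smoke test for this run configuration; the master value here simply mirrors the local[2] from PYSPARK_SUBMIT_ARGS, and the app name is made up for illustration:

```python
# Minimal smoke test, assuming the run configuration above is in place.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[2]")                 # mirrors PYSPARK_SUBMIT_ARGS
         .appName("intellij-pyspark-check")  # illustrative name
         .getOrCreate())

print(spark.range(5).count())   # expect 5
spark.stop()
```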


I followed the steps in https://www.youtube.com/watch?v=j8AcYWQuv-M and it helped me connect successfully, with the modifications below.
Ensuring the Python plugin is installed (I used Python 3.9)
Downloading Spark 3.1.1 from https://spark.apache.org/downloads.html, and entering the details of the python and py4j paths from there
Setting JAVA_HOME correctly to the lower JDK 1.8 (JDK home path /Library/Java/JavaVirtualMachines/jdk1.8.0_281.jdk/Contents/Home)
The extra step I did was adding the same JAVA_HOME as an environment variable under the Run/Debug Configurations option in IntelliJ; a sketch of an in-script alternative follows.
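As a variation on that last step (an assumption, not from the answer), JAVA_HOME can also be set from the script itself before Spark launches its JVM; the JDK path used here is the one quoted above:

```python
# Variation on the JAVA_HOME step (not from the original answer): set it in
# the script before Spark starts its JVM. Adjust the JDK path for your machine.
import os
os.environ["JAVA_HOME"] = "/Library/Java/JavaVirtualMachines/jdk1.8.0_281.jdk/Contents/Home"

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()
print(spark.version)            # e.g. 3.1.1
spark.stop()
```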
