-1

I am new to PySpark. I installed PySpark on my Windows machine.

I downloaded Apache Spark from the Spark download URL.

I set HADOOP_HOME and SPARK_HOME as environment variables and added them to the Path variable:

SPARK_HOME=C:\spark\spark-2.4.4-bin-hadoop2.7

HADOOP_HOME=C:\spark\spark-2.4.4-bin-hadoop2.7

But when I enter pyspark at the command prompt, I get:

The system cannot find the path specified.

Even if I go to the bin directory and execute pyspark, it throws the same error.

Not sure what I missed here. Please help.

Nandy
    Does this answer your question? [The system cannot find the path specified error while running pyspark](https://stackoverflow.com/questions/49340941/the-system-cannot-find-the-path-specified-error-while-running-pyspark) – David Taub Oct 25 '20 at 14:41

5 Answers

2

Set the paths as given below:

Java

JAVA_HOME = C:\Program Files\Java\jdk1.8.0_73

PATH = C:\Program Files\Java\jdk1.8.0_73\bin

Hadoop

Create the folder C:\Hadoop\bin and place winutils.exe inside it.

HADOOP_HOME = C:\Hadoop

PATH = C:\Hadoop\bin

Spark

Download whichever Spark version you need (e.g. spark-2.4.4-bin-hadoop2.7) and point the variables at the folder you extracted, for example:

SPARK_HOME = C:\software\spark-2.3.1-bin-hadoop2.7

PATH = C:\software\spark-2.3.1-bin-hadoop2.7\bin
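
If pyspark still prints "The system cannot find the path specified" after setting these, a quick way to confirm the variables are visible to a new process is a check like the following (my own sketch, not part of the original answer; run it in a fresh command prompt with python):

import os

# check that each variable is set and points to an existing folder
for var in ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME"):
    value = os.environ.get(var)
    print(var, "=", value, "->", "ok" if value and os.path.isdir(value) else "missing")

# Spark on Windows also needs winutils.exe under %HADOOP_HOME%\bin
winutils = os.path.join(os.environ.get("HADOOP_HOME", ""), "bin", "winutils.exe")
print("winutils.exe found:", os.path.isfile(winutils))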

Ghost
0

The easiest way to point Python at an existing Spark installation is with the findspark library:

pip install findspark

import findspark

findspark.init(r'\path\to\extracted\binaries\folder')  # raw string so the backslashes are not treated as escape sequences

import pyspark
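
Once findspark.init() has run, a short smoke test (my own sketch; the folder argument below is the same placeholder as above) shows whether PySpark really starts:

import findspark
findspark.init(r'\path\to\extracted\binaries\folder')  # same placeholder as above

import pyspark
print(pyspark.__version__)

# run a trivial local job to make sure the JVM side comes up too
sc = pyspark.SparkContext.getOrCreate()
print(sc.parallelize(range(5)).sum())  # expected: 10
sc.stop()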
noddables
Ravi
0

I had the same problem. After a lot of digging I found that I had JDK 1.8.0_261 but JRE 1.8.0_271.

As a solution, I uninstalled both the JDK and the JRE and then installed jdk1.8.0_261, which installed both with the same version (jdk1.8.0_261 and jre1.8.0_261).

That resolved the issue.
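
Before reinstalling, you can check whether the JDK and JRE on your PATH agree; java -version reports the JRE and javac -version the JDK. A small sketch of my own (it assumes both java and javac are on the PATH):

import subprocess

# `java -version` writes to stderr; `javac -version` uses stdout or stderr depending on the version
java = subprocess.run(["java", "-version"], capture_output=True, text=True)
javac = subprocess.run(["javac", "-version"], capture_output=True, text=True)
print((java.stderr or java.stdout).strip())
print((javac.stdout or javac.stderr).strip())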

zMan
0

Locate your pyspark path, export it, and then install the findspark package; it will do the rest of the work. For example, let's say my pyspark path is "/usr/spark-2.4.4/python/pyspark/", then what I have to do is:

!pip install findspark

import os
import findspark

os.environ["SPARK_HOME"] = "/usr/spark-2.4.4/python/pyspark/"  # set it from Python so the running kernel sees it (a shell `!export` would not persist)
findspark.init()
from pyspark.sql import SparkSession
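
From there you can build a session and run a trivial query to confirm everything resolves (a small usage sketch of my own, assuming the findspark.init() call above succeeded):

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("path-check").getOrCreate()
spark.range(5).show()  # should print a 5-row DataFrame
spark.stop()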

itIsNaz
-1

Try adding this code segment:

import os
import sys

# point HADOOP_HOME at your Hadoop installation before importing pyspark
os.environ['HADOOP_HOME'] = "Your_Hadoop_Home_Path"
# e.g. os.environ['HADOOP_HOME'] = "~file_path~\Hadoop\hadoop-3.x.x"

# what this does is change the HADOOP_HOME environment variable for the current Python process

DM-techie