-1

I am new to PySpark. I installed PySpark on my Windows machine.

I downloaded Apache Spark from the Spark download URL.

I set HADOOP_HOME and SPARK_HOME as environment variables and added them to the Path variable:

SPARK_HOME=C:\spark\spark-2.4.4-bin-hadoop2.7

HADOOP_HOME=C:\spark\spark-2.4.4-bin-hadoop2.7

But when I enter pyspark at the command prompt, I get:

The system cannot find the path specified.

Even if I go to the bin directory and execute pyspark, it throws the same error.

Not sure what I missed here. Please help.

Nandy
    Does this answer your question? [The system cannot find the path specified error while running pyspark](https://stackoverflow.com/questions/49340941/the-system-cannot-find-the-path-specified-error-while-running-pyspark) – David Taub Oct 25 '20 at 14:41

5 Answers

2

Set the paths as given below:

Java

JAVA_HOME = C:\Program Files\Java\jdk1.8.0_73

PATH = C:\Program Files\Java\jdk1.8.0_73\bin

Hadoop

Create the folder C:\Hadoop\bin and place winutils.exe inside it.

HADOOP_HOME = C:\Hadoop

PATH = C:\Hadoop\bin

Spark

Download whichever Spark version you need (e.g. spark-2.4.4-bin-hadoop2.7) and point the variables at the folder you extracted, for example:

SPARK_HOME = C:\software\spark-2.3.1-bin-hadoop2.7

PATH = C:\software\spark-2.3.1-bin-hadoop2.7\bin
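
If pyspark still prints "The system cannot find the path specified" after setting these, a quick way to confirm the variables are visible to a new process is a check like the following (my own sketch, not part of the original answer; run it in a fresh command prompt with python):

import os

# check that each variable is set and points to an existing folder
for var in ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME"):
    value = os.environ.get(var)
    print(var, "=", value, "->", "ok" if value and os.path.isdir(value) else "missing")

# Spark on Windows also needs winutils.exe under %HADOOP_HOME%\bin
winutils = os.path.join(os.environ.get("HADOOP_HOME", ""), "bin", "winutils.exe")
print("winutils.exe found:", os.path.isfile(winutils))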

Ghost
0

The easiest way to point Python at an existing Spark installation is with the findspark library:

pip install findspark

import findspark

findspark.init(r'\path\to\extracted\binaries\folder')  # raw string so the backslashes are not treated as escape sequences

import pyspark
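
Once findspark.init() has run, a short smoke test (my own sketch; the folder argument below is the same placeholder as above) shows whether PySpark really starts:

import findspark
findspark.init(r'\path\to\extracted\binaries\folder')  # same placeholder as above

import pyspark
print(pyspark.__version__)

# run a trivial local job to make sure the JVM side comes up too
sc = pyspark.SparkContext.getOrCreate()
print(sc.parallelize(range(5)).sum())  # expected: 10
sc.stop()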
noddables
Ravi
0

I had the same problem. After a lot of digging I found that I had JDK 1.8.0_261 but JRE 1.8.0_271.

As a solution, I uninstalled both the JDK and the JRE and then installed jdk1.8.0_261, which installed both with the same version (jdk1.8.0_261 and jre1.8.0_261).

That resolved the issue.
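
Before reinstalling, you can check whether the JDK and JRE on your PATH agree; java -version reports the JRE and javac -version the JDK. A small sketch of my own (it assumes both java and javac are on the PATH):

import subprocess

# `java -version` writes to stderr; `javac -version` uses stdout or stderr depending on the version
java = subprocess.run(["java", "-version"], capture_output=True, text=True)
javac = subprocess.run(["javac", "-version"], capture_output=True, text=True)
print((java.stderr or java.stdout).strip())
print((javac.stdout or javac.stderr).strip())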

zMan
0

Locate your pyspark path, export it, and then install the findspark package; it will do the rest of the work. For example, let's say my pyspark path is "/usr/spark-2.4.4/python/pyspark/", then what I have to do is:

!pip install findspark

import os
import findspark

os.environ["SPARK_HOME"] = "/usr/spark-2.4.4/python/pyspark/"  # set it from Python so the running kernel sees it (a shell `!export` would not persist)
findspark.init()
from pyspark.sql import SparkSession
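
From there you can build a session and run a trivial query to confirm everything resolves (a small usage sketch of my own, assuming the findspark.init() call above succeeded):

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("path-check").getOrCreate()
spark.range(5).show()  # should print a 5-row DataFrame
spark.stop()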

itIsNaz
-1

Try adding this code segment:

import os
import sys

# point HADOOP_HOME at your Hadoop installation before importing pyspark
os.environ['HADOOP_HOME'] = "Your_Hadoop_Home_Path"
# e.g. os.environ['HADOOP_HOME'] = "~file_path~\Hadoop\hadoop-3.x.x"

# what this does is change the HADOOP_HOME environment variable for the current Python process

DM-techie