PySpark on Windows: Input path does not exist

Question

I am new to PySpark. I researched my issue, but none of the solutions I found worked for me.

I want to read a text file. I first put it in the same folder as my .py file in Jupyter Notebook, then ran the following command:

rdd = sc.textFile("Parcours client.txt")
print(rdd.collect())

I get this error:

Input path does not exist: file:/C:/Spark/spark-2.3.0-bin-hadoop2.7/Data Analysis/Parcours client.txt

However, this is exactly where I put the .txt file, and I launch PySpark from:

C:/Spark/spark-2.3.0-bin-hadoop2.7

I also tried giving the full local path to my txt file:

rdd = sc.textFile("C:\\Users\\Jiji\\Desktop\\Data Analysis\\L'Output\\Parcours client.txt")
print(rdd.collect())

I get the same error:

Input path does not exist: file:/Users/Jiji/Desktop/Data Analysis/L'Output/Parcours client.txt
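Before looking at Spark itself, it can help to rule out a mismatch between the name typed in the code and the name stored on disk, for example an extra space before `.txt`. The following is a minimal sketch using only the Python standard library. The helper name `find_close_matches` is hypothetical and not part of PySpark, and the sketch assumes the notebook's working directory is the folder the relative path was resolved against.

```python
import os


def find_close_matches(folder, wanted):
    """Return names in `folder` equal to `wanted` once whitespace and case are ignored."""

    def normalise(name):
        # Drop every whitespace character and lower-case the rest.
        return "".join(name.split()).lower()

    target = normalise(wanted)
    return sorted(name for name in os.listdir(folder) if normalise(name) == target)


# Assumption: the current working directory is where the file was placed.
for name in find_close_matches(os.getcwd(), "Parcours client.txt"):
    # repr() makes stray spaces visible, e.g. 'Parcours client .txt'.
    print(repr(name))
```

If the loop prints a name that differs from the one passed to `sc.textFile`, the difference in spacing or case is worth correcting first.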

What happens when you try a simple path with no spaces or special chars, such as `"C:/parcours_client.txt"`? — ernest_k, Apr 04 '18 at 12:03
Thank you for replying. I still get the same error: _Input path does not exist: file:C:/parcours_client.txt_ — Iriel, Apr 04 '18 at 12:10

score 0 · Answer 1 · answered Apr 05 '18 at 04:20

Try `rdd = sc.textFile("Parcours\ client.txt")` or `rdd = sc.textFile(r"Parcours client.txt")`.
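For reference, the two literals can be compared in plain Python, without Spark. This is a small sketch and the variable names are illustrative only. It shows that the raw-string form is the same string as the original literal, while the backslash form contains one extra character.

```python
# The literal used in the question.
plain = "Parcours client.txt"

# The r prefix only changes how backslashes are read. With no backslash
# in the literal, the resulting string is identical to `plain`.
raw = r"Parcours client.txt"

# Python has no "\ " escape, so "Parcours\ client.txt" keeps the backslash.
# It is written with an explicit "\\" here to avoid an invalid-escape warning.
escaped = "Parcours\\ client.txt"

print(raw == plain)                # True
print("\\" in escaped)             # True
print(len(escaped) - len(plain))   # 1
```

So the second suggestion passes Spark exactly the same path as before, and the first passes a file name that contains a literal backslash.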

See also: whitespaces in the path of windows filepath

versatile parsley

Thank you for replying. I ran the two commands but still got the same error. I then put my txt file in a folder on the Desktop and ran the following command: `rdd = sc.textFile('C:\\Users\\Jiji\\Desktop\\Output\\Parcours clients .txt')`. I think the error was caused by the spaces in the path. – Iriel Apr 05 '18 at 08:00

score 0 · Answer 2 · answered Apr 05 '18 at 08:05

Thank you everybody for your help.

I moved my txt file to a folder on the Desktop whose name does not contain any spaces, and that solved my issue. I then ran the following command:

rdd = sc.textFile('C:\\Users\\Jiji\\Desktop\\Output\\Parcours client.txt')

I think the issue was caused by the spaces in the folder path.
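As a complement, here is a hedged sketch of checking that the file exists locally and building a forward-slash `file:///` path before handing it to Spark. The helper `local_file_uri` is a hypothetical name, not a PySpark function. Whether the `file:///` form behaves differently from a bare Windows path on a particular Spark and Hadoop build is not established by this thread, so treat it as something to try rather than a confirmed fix.

```python
import tempfile
from pathlib import Path


def local_file_uri(path_str):
    """Check that a local file exists and return it as a 'file:///' path string."""
    p = Path(path_str).resolve()
    if not p.is_file():
        raise FileNotFoundError(f"No such local file: {p}")
    # as_posix() uses forward slashes on every platform, e.g. 'C:/Users/...'.
    return "file:///" + p.as_posix().lstrip("/")


# Demonstration with a temporary folder whose name contains a space.
with tempfile.TemporaryDirectory(prefix="Data Analysis ") as folder:
    sample = Path(folder) / "Parcours client.txt"
    sample.write_text("hello\n")
    uri = local_file_uri(str(sample))
    print(uri)
    # With a live SparkContext `sc` (as in the question) one could then try:
    # rdd = sc.textFile(uri)
```

Raising `FileNotFoundError` on the Python side separates "the file is not where I think it is" from any path handling inside Spark.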
