I am new to pyspark. I researched my issue, but none of the solutions I found worked for me.

I want to read a text file. I first put it in the same folder as my .py file in Jupyter Notebook, then ran the following command:

rdd = sc.textFile("Parcours client.txt")
print(rdd.collect())

I get this error:

Input path does not exist: file:/C:/Spark/spark-2.3.0-bin-hadoop2.7/Data Analysis/Parcours client.txt

This is exactly where I put the .txt file, though, and I launch pyspark from

C:/Spark/spark-2.3.0-bin-hadoop2.7

I also tried giving the full local path to my txt file:

rdd = sc.textFile("C:\\Users\\Jiji\\Desktop\\Data Analysis\\L'Output\\Parcours client.txt")
print(rdd.collect())

I get the same error:

Input path does not exist: file:/Users/Jiji/Desktop/Data Analysis/L'Output/Parcours client.txt
OneCricketeer
Iriel

2 Answers

Try rdd = sc.textFile("Parcours\ client.txt") or rdd = sc.textFile(r"Parcours client.txt")

See also: whitespaces in the path of windows filepath

versatile parsley
  • Thank you for replying. I ran the two commands but still got the same error. I then tried putting my txt file on the Desktop and running the following command: `rdd = sc.textFile('C:\\Users\\Jiji\\Desktop\\Output\\Parcours clients .txt')`. I think the error was caused by the spaces in the path. – Iriel Apr 05 '18 at 08:00

Thank you everybody for your help.

I put my txt file in a folder on the desktop whose name doesn't contain any spaces, and that solved my issue. I ran the following command:

rdd = sc.textFile('C:\\Users\\Jiji\\Desktop\\Output\\Parcours client.txt')

I think the issue was caused by the spaces in the path.
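
For anyone hitting the same "Input path does not exist" error, a quick check in plain Python before calling `sc.textFile` may help narrow it down. This is only a rough sketch, assuming Spark runs in local mode on the same machine as the notebook; `describe_path` is a hypothetical helper name, not part of pyspark. It shows the notebook's working directory and whether the file is really found at the resolved absolute path.

```python
import os


def describe_path(path):
    """Return the absolute form of `path` and whether a regular file exists there.

    Hypothetical helper for debugging; it does not touch Spark at all.
    """
    absolute = os.path.abspath(path)
    return absolute, os.path.isfile(absolute)


# Relative paths are resolved against the current working directory,
# which is not necessarily the folder that holds the notebook.
absolute, found = describe_path("Parcours client.txt")
print("Working directory:", os.getcwd())
print("Resolved path:", absolute)
print("File exists:", found)

# If the file is found, the absolute path could then be passed to Spark,
# e.g. rdd = sc.textFile(absolute), where sc is the existing SparkContext.
```

If "File exists" prints False, the file is not where Python expects it, and the path itself needs fixing before looking at Spark.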

Iriel