20

I am working with PySpark in a Jupyter notebook (Python 2.7) on Windows 7. I have an RDD of type pyspark.rdd.PipelinedRDD called idSums. When attempting to execute idSums.saveAsTextFile("Output"), I receive the following error:

Py4JJavaError: An error occurred while calling o834.saveAsTextFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 33.0 failed 1 times, most recent failure: Lost task 1.0 in stage 33.0 (TID 131, localhost): java.io.IOException: (null) entry in command string: null chmod 0644 C:\Users\seride\Desktop\Experiments\PySpark\Output\_temporary\0\_temporary\attempt_201611231307_0033_m_000001_131\part-00001

There shouldn't be any problem with the RDD object itself, in my opinion, because I'm able to execute other actions without error; e.g., idSums.collect() produces the correct output.

Furthermore, the Output directory is created (with all subdirectories) and the file part-00001 is created, but it is 0 bytes.
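
For reference, a minimal sketch of the kind of pipeline described (the RDD contents here are illustrative, not the actual data):

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "SaveExample")

    # An illustrative pipelined RDD: parallelize followed by reduceByKey
    idSums = sc.parallelize([("a", 1), ("b", 2), ("a", 3)]) \
               .reduceByKey(lambda x, y: x + y)

    print(idSums.collect())           # works, e.g. [('a', 4), ('b', 2)]
    idSums.saveAsTextFile("Output")   # fails on Windows without winutils.exe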

– Jr Swec

5 Answers

46

You are missing winutils.exe, a Hadoop binary. Depending on whether your system is 64-bit or 32-bit, download the appropriate winutils.exe file and set your Hadoop home to point to it.

1st way:

  1. Download the file.
  2. Create a hadoop folder on your system, e.g. C:\hadoop
  3. Create a bin folder inside the hadoop directory, e.g. C:\hadoop\bin
  4. Paste winutils.exe into the bin folder, e.g. C:\hadoop\bin\winutils.exe
  5. In User Variables under System Properties -> Advanced System Settings, create a new variable (you can verify it with the check below):

Variable Name: HADOOP_HOME
Value: C:\hadoop
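
To sanity-check the setup, a small snippet that mirrors the example paths above (adjust them to your install):

    import os

    # These values mirror the example paths above; adjust to your install
    hadoop_home = os.environ.get("HADOOP_HOME")   # expected: C:\hadoop
    winutils = os.path.join(hadoop_home or "", "bin", "winutils.exe")

    print("HADOOP_HOME = %s" % hadoop_home)
    print("winutils.exe present: %s" % os.path.isfile(winutils))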

2nd way:

Alternatively, you can set the Hadoop home directly in your Java program:

System.setProperty("hadoop.home.dir", "C:\\hadoop");
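
Since the question uses PySpark rather than Java, the rough equivalent there is to set the environment variable before the SparkContext (and its JVM) is created. A minimal sketch, assuming the C:\hadoop layout from above:

    import os

    # Must be set before the SparkContext (and its JVM) starts
    os.environ["HADOOP_HOME"] = "C:\\hadoop"

    from pyspark import SparkContext
    sc = SparkContext("local[*]", "SaveExample")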
– Harpreet Varma
  • This solution works. Thank you! Note that while Spark can run without Hadoop (as you know @HapreetVarma), it runs with reduced functionality. – Jr Swec Dec 05 '16 at 12:02
  • The 2nd way works fine and is able to locate winutils.exe, but I'm now getting a new error: "ExitCodeException exitCode=1: ChangeFileModeByMask error (5): Access is denied". Any suggestions would be helpful. – Venkat Jul 18 '18 at 06:52
  • But why does Spark need 'winutils.exe'? Any leads, guys? – Dinesh Kumar P Nov 18 '18 at 04:38
  • @DineshKumarP if you set it up on Windows then you need it. – Ani Menon May 23 '20 at 07:02
2

I had a similar permission-related exception when loading a model that was built on another machine and copied onto my Windows system, even though HADOOP_HOME was set.

I just ran the following command over my model folder:

winutils.exe chmod -R 777 model-path
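
If you'd rather script that step, a small sketch that shells out to winutils from Python (model_path is a placeholder for your actual folder, and HADOOP_HOME is assumed to be set):

    import os
    import subprocess

    # Resolve winutils.exe from HADOOP_HOME, which is assumed to be set
    winutils = os.path.join(os.environ["HADOOP_HOME"], "bin", "winutils.exe")

    model_path = r"C:\path\to\model"  # placeholder: your actual model folder
    subprocess.check_call([winutils, "chmod", "-R", "777", model_path])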
– jasoos
0

Same procedure as described above by @Harpreet Varma.

If you are working in Eclipse Oxygen, you must restart Eclipse after setting the variable in the system properties, otherwise it will not take effect. If the path is set wrongly, Eclipse will show the currently configured location of winutils at the start of the log, something like this:

2018-05-05 18:27:47 ERROR Shell:397 - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
    at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2464)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2464)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2464)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:292)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
    at com.dataflair.spark.Wordcount$.main(Wordcount.scala:13)
    at com.dataflair.spark.Wordcount.main(Wordcount.scala)
0

I found additional information that may help others fix this issue. Sometimes when you set up Spark on a Windows machine you fail to get the hadoop.dll file. Simply take the hadoop.dll file from the winutils Git repo (https://github.com/4ttty/winutils) and place it in your Windows System32 folder. After I did this, I was able to write to disk.
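
If you want to script that copy, a hedged sketch (the source path is illustrative, and writing into System32 requires an elevated shell):

    import os
    import shutil

    src = r"C:\hadoop\bin\hadoop.dll"        # illustrative: wherever you saved it
    dst = r"C:\Windows\System32\hadoop.dll"

    if not os.path.isfile(dst):
        # Copying into System32 requires an elevated (administrator) shell
        shutil.copyfile(src, dst)
        print("copied hadoop.dll to System32")
    else:
        print("hadoop.dll already present")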

Original credit for this answer - https://programmersought.com/article/53121203250/

0

Even after setting the bin path, if it is still not working, you need to close the Eclipse Scala IDE application, open it again, and try to run it.