116

I'm not able to run a simple Spark job in Scala IDE (a Maven Spark project) installed on Windows 7.

The Spark core dependency has been added.

val conf = new SparkConf().setAppName("DemoDF").setMaster("local")
val sc = new SparkContext(conf)
val logData = sc.textFile("File.txt")
logData.count()

Error:

16/02/26 18:29:33 INFO SparkContext: Created broadcast 0 from textFile at FrameDemo.scala:13
16/02/26 18:29:34 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1143)
    at com.org.SparkDF.FrameDemo$.main(FrameDemo.scala:14)
    at com.org.SparkDF.FrameDemo.main(FrameDemo.scala)
– Elvish_Blade

13 Answers

161

Here is a good explanation of your problem, along with the solution.

  1. Download the version of winutils.exe matching the Hadoop version your Spark build was compiled against from https://github.com/steveloughran/winutils.

  2. Set up your HADOOP_HOME environment variable on the OS level or programmatically:

    System.setProperty("hadoop.home.dir", "full path to the folder with winutils");

  3. Enjoy
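For reference, a minimal Scala sketch combining this fix with the question's own code ("C:\\hadoop" is an example path; point it at whichever folder contains bin\winutils.exe):

    import org.apache.spark.{SparkConf, SparkContext}

    object WinutilsDemo {
      def main(args: Array[String]): Unit = {
        // Must point at the folder that CONTAINS bin\winutils.exe,
        // not at the bin folder itself. "C:\\hadoop" is an example path.
        System.setProperty("hadoop.home.dir", "C:\\hadoop")

        val conf = new SparkConf().setAppName("DemoDF").setMaster("local")
        val sc = new SparkContext(conf)
        val logData = sc.textFile("File.txt")
        println(logData.count())
        sc.stop()
      }
    }

Note that the property must be set before the SparkContext is created: the stack trace shows Hadoop's Shell class resolving winutils.exe in its static initializer (<clinit>), i.e. when the class is first loaded.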

– Taky
  • I had to set HADOOP_HOME to the Hadoop folder instead of the bin folder. – Stanley Aug 29 '16 at 07:44
  • Also, be sure to download the correct winutils.exe for the version of Hadoop that Spark is compiled against (so, not necessarily the link above). Otherwise, pain awaits :) – NP3 Jun 30 '17 at 12:14
  • System.setProperty("hadoop.home.dir", "C:\\hadoop-2.7.1\\") – Shyam Gupta Oct 14 '17 at 19:00
  • Yes, exactly as @Stanley says: it worked after setting HADOOP_HOME to the Hadoop folder instead of the bin folder. – Jazz Apr 09 '19 at 13:09
  • @NP3 and how do you know that version? I am using the latest pyspark. Thanks. – JDPeckham Nov 10 '19 at 19:12
  • For the correct version of winutils.exe, check out this GitHub repo: https://github.com/steveloughran/winutils. Choose the same version as the package type you chose for the Spark .tgz when you downloaded it from the official website. – Bendemann May 02 '20 at 06:05
77
  1. Download winutils.exe.
  2. Create a folder, say C:\winutils\bin.
  3. Copy winutils.exe into C:\winutils\bin.
  4. Set the environment variable HADOOP_HOME to C:\winutils (a quick sanity check follows below).
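Since an IDE launched before the variable was set will not see it, a quick hedged Scala check (the object name is illustrative) can confirm what the JVM actually sees:

    // Prints the values Spark/Hadoop will actually resolve. If HADOOP_HOME
    // shows None, restart the IDE or shell so it picks up the new variable.
    object EnvCheck extends App {
      println(s"HADOOP_HOME     = ${sys.env.get("HADOOP_HOME")}")
      println(s"hadoop.home.dir = ${Option(System.getProperty("hadoop.home.dir"))}")
    }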
– Deokant Gupta
30

Follow this:

  1. Create a bin folder in any directory (this directory is used in step 3).

  2. Download winutils.exe and place it in the bin directory.

  3. Now add System.setProperty("hadoop.home.dir", "PATH/TO/THE/DIR"); in your code, where the path is the directory from step 1 (the parent of bin, not bin itself).

– Ani Menon
  • Thanks a lot, just what I was looking for. – user373201 Feb 27 '17 at 02:59
  • Note that the path should not include the bin directory. For example, if winutils.exe is at "D://Hadoop//bin//winutils.exe", then hadoop.home.dir should be "D://Hadoop". – Keshav Pradeep Ramanath May 31 '18 at 10:30
  • Hi, I followed the steps above, but I got `WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties` (PS: I'm using Pyspark 2.4.4 in PyCharm) – wawawa Feb 05 '21 at 13:15
7
1) Download winutils.exe from https://github.com/steveloughran/winutils.
2) Create the directory C:\winutils\bin on Windows.
3) Copy winutils.exe into the bin folder above.
4) Set the property in code (note this is a plain filesystem path, not a file:// URI):
  System.setProperty("hadoop.home.dir", "C:\\winutils");
5) Create a folder C:\temp and give it full (777) permissions.
6) Add the config property to the Spark session: .config("spark.sql.warehouse.dir", "file:///C:/temp") (see the sketch below).
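A minimal sketch of how steps 4 to 6 fit together in a Spark 2.x SparkSession (paths and names are examples):

    import org.apache.spark.sql.SparkSession

    object WarehouseDemo {
      def main(args: Array[String]): Unit = {
        // Example path; must be the parent of bin\winutils.exe.
        System.setProperty("hadoop.home.dir", "C:\\winutils")

        val spark = SparkSession.builder()
          .appName("WarehouseDemo")
          .master("local[*]")
          .config("spark.sql.warehouse.dir", "file:///C:/temp")
          .getOrCreate()

        spark.range(5).show() // any small action to confirm the session works
        spark.stop()
      }
    }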
– Sampat Kumar
5

You can alternatively download winutils.exe from GitHub:

https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin

Replace hadoop-2.7.1 with the version you need and place the file in D:\hadoop\bin.

If you do not have access rights to the environment variable settings on your machine, simply add the line below to your code:

System.setProperty("hadoop.home.dir", "D:\\hadoop");
– Saurabh
5

On Windows 10 you should add two different entries:

(1) Add a new variable HADOOP_HOME with the value of your Hadoop folder (e.g. C:\Hadoop) under System Variables.

(2) Add/append a new entry "C:\Hadoop\bin" to the "Path" variable.

The above worked for me.

– user1023627
4

If you see the following issue:

ERROR Shell: Failed to locate the winutils binary in the hadoop binary path

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

then do the following:

  1. Download winutils.exe from http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe.
  2. Keep it under the bin folder of a folder you create for it, e.g. C:\Hadoop\bin.
  3. In the program, add the following line before creating the SparkContext or SparkConf: System.setProperty("hadoop.home.dir", "C:\\Hadoop");
– Prem S
2

I got the same problem while running unit tests. The following workaround gets rid of this message:

    // Point hadoop.home.dir at the current working directory...
    File workaround = new File(".");
    System.getProperties().put("hadoop.home.dir", workaround.getAbsolutePath());
    // ...and create an empty bin/winutils.exe placeholder so the lookup succeeds.
    new File("./bin").mkdirs();
    new File("./bin/winutils.exe").createNewFile();

from: https://issues.cloudera.org/browse/DISTRO-544
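Since the question uses Scala, here is a hedged Scala port of the same trick, e.g. for a test suite's setup block (it only creates an empty placeholder file, which silences the lookup but will not help if Hadoop actually needs to execute winutils):

    import java.io.File

    // Point hadoop.home.dir at the working directory and fake an empty
    // bin/winutils.exe so Hadoop's Shell class finds "a" binary.
    val workaround = new File(".")
    System.setProperty("hadoop.home.dir", workaround.getAbsolutePath)
    new File("./bin").mkdirs()
    new File("./bin/winutils.exe").createNewFile()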

– Joabe Lucena
1

Setting the HADOOP_HOME environment variable in system properties didn't work for me. But this did:

  • Set HADOOP_HOME in the Environment tab of the Eclipse Run Configurations.
  • Follow the 'Windows Environment Setup' from here.
– Ramya
1
  • Download winutils.exe and hadoop.dll on your Windows machine.
  • Create the folder C:\hadoop\bin.
  • Copy winutils.exe and hadoop.dll into the newly created C:\hadoop\bin folder.
  • Set the environment variable HADOOP_HOME=C:\hadoop (a load check follows below).
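hadoop.dll is resolved via the JVM's native library path, so a quick hedged Scala check (the object name is illustrative) can tell you whether it is actually loadable, which usually requires C:\hadoop\bin to be on PATH:

    // Tries to load the native Hadoop library the same way Hadoop's
    // NativeCodeLoader does. A failure here usually means hadoop.dll
    // is not on java.library.path (add C:\hadoop\bin to PATH).
    object NativeLibCheck extends App {
      try {
        System.loadLibrary("hadoop")
        println("hadoop.dll loaded successfully")
      } catch {
        case e: UnsatisfiedLinkError => println(s"Could not load hadoop.dll: ${e.getMessage}")
      }
    }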
– Swapnil
0

On top of setting your HADOOP_HOME environment variable in Windows to C:\winutils, you also need to make sure you are the administrator of the machine. If not, and adding environment variables prompts you for admin credentials (even under USER variables), then those variables take effect only once you start your command prompt as administrator.

– Abhishek Sakhuja
0

I also faced a similar problem, with Java 1.8.0_121, Spark spark-1.6.1-bin-hadoop2.6, Windows 10, and Eclipse Oxygen. When I ran my WordCount.java in Eclipse using HADOOP_HOME as a system variable, as mentioned in the previous post, it did not work. What worked for me is:

System.setProperty("hadoop.home.dir", "PATH/TO/THE/DIR");

where PATH/TO/THE/DIR/bin contains winutils.exe. This applies whether you run within Eclipse as a Java application or via spark-submit from cmd using:

spark-submit --class groupid.artifactid.classname --master local[2] <path to the jar file created using maven> <path to a demo test file> <path to output directory>

Example: Go to the bin location of your Spark installation and execute spark-submit as mentioned:

D:\BigData\spark-2.3.0-bin-hadoop2.7\bin>spark-submit --class com.bigdata.abdus.sparkdemo.WordCount --master local[1] D:\BigData\spark-quickstart\target\spark-quickstart-0.0.1-SNAPSHOT.jar D:\BigData\spark-quickstart\wordcount.txt

-1

That's a tricky one... Your drive letter must be capital. For example, "C:\...".

– Achilles