35

I am new to Hadoop and have run into problems trying to run it on my Windows 7 machine. In particular, I am interested in running Hadoop 2.1.0, as its release notes mention that running on Windows is supported. I know that I can try to run 1.x versions on Windows with Cygwin, or even use a prepared VM from, for example, Cloudera, but these options are for various reasons less convenient for me.

Having examined a tarball from http://apache-mirror.rbc.ru/pub/apache/hadoop/common/hadoop-2.1.0-beta/ I found that it really does contain some *.cmd scripts that can be run without Cygwin. Everything worked fine when I formatted the HDFS partition, but when I tried to run the hdfs namenode daemon I faced two errors. The first, non-fatal, was that winutils.exe could not be found (it really wasn't present in the downloaded tarball). I found the sources of this component in the Apache Hadoop source tree and compiled it with the Microsoft SDK and MSBuild. Thanks to the detailed error message it was clear where to put the executable to satisfy Hadoop. But the second error, which is fatal, doesn't contain enough information for me to solve:

13/09/05 10:20:09 FATAL namenode.NameNode: Exception in namenode join
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:423)
    at org.apache.hadoop.fs.FileUtil.canWrite(FileUtil.java:952)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:451)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:282)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:200)
...
13/09/05 10:20:09 INFO util.ExitUtil: Exiting with status 1

Looks like something else needs to be compiled. I'm going to try to build Hadoop from source with Maven, but isn't there a simpler way? Isn't there some option I don't know of that can disable native code and make that tarball usable on Windows?
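This failure mode can at least be made visible before the daemons are started: the error boils down to native binaries missing from Hadoop's bin folder. A small preflight sketch (the helper names are mine, not Hadoop's):

```python
import os

# Native components the Windows build of Hadoop expects in %HADOOP_HOME%\bin
REQUIRED = ("winutils.exe", "hadoop.dll")

def missing_native_files(bin_dir_contents):
    """Return the required native files absent from a bin directory listing."""
    present = {name.lower() for name in bin_dir_contents}
    return [f for f in REQUIRED if f.lower() not in present]

def check_hadoop_home(hadoop_home):
    """Scan HADOOP_HOME/bin on disk (illustrative wrapper)."""
    bin_dir = os.path.join(hadoop_home, "bin")
    names = os.listdir(bin_dir) if os.path.isdir(bin_dir) else []
    return missing_native_files(names)
```

For example, `missing_native_files(["hdfs.cmd", "hadoop.dll"])` reports that winutils.exe is still missing.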

Thank you.

UPDATED. Yes, indeed. The "homebrew" package contained some extra files, most importantly winutils.exe and hadoop.dll. With these files the namenode and datanode started successfully. I think the question can be closed. I didn't delete it in case someone faces the same difficulty.

UPDATED 2. To build the "homebrew" package I did the following:

  1. Got sources, and unpacked them.
  2. Read carefully BUILDING.txt.
  3. Installed dependencies:
    3a) Windows SDK 7.1
    3b) Maven (I used 3.0.5)
    3c) JDK (I used 1.7.25)
    3d) ProtocolBuffer (I used 2.5.0 - http://protobuf.googlecode.com/files/protoc-2.5.0-win32.zip). It is enough just to put compiler (protoc.exe) into some of the PATH folders.
    3e) A set of UNIX command line tools (I installed Cygwin)
  4. Started the Windows SDK command line: Start | All Programs | Microsoft Windows SDK v7.1 | ... Command Prompt (I modified this shortcut, adding the option /release to the command line to build release versions of the native code). All the next steps are made from inside the SDK command line window.
  5. Set up the environment:

    set JAVA_HOME={path_to_JDK_root}

It seems that JAVA_HOME MUST NOT contain a space!

set PATH={path_to_maven_bin};%PATH%  
set Platform=x64  
set PATH={path_to_cygwin_bin};%PATH%  
set PATH={path_to_protoc.exe};%PATH%  
  6. Changed dir to the sources root folder (BUILDING.txt warns that there are some limitations on the path length, so the sources root should have a short name - I used D:\hds)
  7. Ran the building process:

    mvn package -Pdist -DskipTests

You can try without 'skipTests', but on my machine some tests failed and the build was terminated. It may be connected to the symbolic link issues mentioned in BUILDING.txt.

  8. Picked up the result in hadoop-dist\target\hadoop-2.1.0-beta (Windows executables and dlls are in the 'bin' folder)
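The environment rules above (no space in JAVA_HOME, Platform spelled exactly x64 or Win32) are easy to get wrong, so here is a small sketch that checks them mechanically (a hypothetical helper, not part of the Hadoop build):

```python
def validate_build_env(env):
    """Return a list of problems with the build environment (env is a
    dict of environment variables, e.g. dict(os.environ))."""
    problems = []
    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif " " in java_home:
        # a space in JAVA_HOME breaks the native-code build
        problems.append("JAVA_HOME contains a space")
    # 'Platform' is case sensitive and must be exactly x64 or Win32
    if env.get("Platform") not in ("x64", "Win32"):
        problems.append("Platform must be exactly 'x64' or 'Win32'")
    return problems
```

An empty list means the two pitfalls described above are avoided.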

Hatter
  • @mamdouh alramadan Thank you for the advice. It may happen that I switch to some flavour of Linux with Hadoop. But for now all my environment is Windows-based and having HDFS partition with large data files inside virtual machine inside host OS doesn't seem... _graceful_ to me. – Hatter Sep 05 '13 at 10:39

13 Answers

18

I followed these steps to install Hadoop 2.2.0

Steps to build Hadoop bin distribution for Windows

  1. Download and install Microsoft Windows SDK v7.1.

  2. Download and install Unix command-line tool Cygwin.

  3. Download and install Maven 3.1.1.

  4. Download Protocol Buffers 2.5.0 and extract to a folder (say c:\protobuf).

  5. Add the Environment Variables JAVA_HOME, M2_HOME and Platform if not added already. Note: the variable name Platform is case sensitive, and its value must be either x64 or Win32 for building on a 64-bit or 32-bit system. Edit the Path variable to add the bin directory of Cygwin (say C:\cygwin64\bin), the bin directory of Maven (say C:\maven\bin) and the installation path of Protocol Buffers (say c:\protobuf).

  6. Download hadoop-2.2.0-src.tar.gz and extract to a folder having short path (say c:\hdfs) to avoid runtime problem due to maximum path length limitation in Windows.

  7. Select Start --> All Programs --> Microsoft Windows SDK v7.1 and open Windows SDK 7.1 Command Prompt. Change directory to Hadoop source code folder (c:\hdfs). Execute mvn package with options -Pdist,native-win -DskipTests -Dtar to create Windows binary tar distribution.

  8. If everything goes well in the previous step, then native distribution hadoop-2.2.0.tar.gz will be created inside C:\hdfs\hadoop-dist\target\hadoop-2.2.0 directory.

Install Hadoop

  1. Extract hadoop-2.2.0.tar.gz to a folder (say c:\hadoop).

  2. Add Environment Variable HADOOP_HOME and edit Path Variable to add bin directory of HADOOP_HOME (say C:\hadoop\bin).

Configure Hadoop

C:\hadoop\etc\hadoop\core-site.xml

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:9000</value>
        </property>
</configuration>

C:\hadoop\etc\hadoop\hdfs-site.xml

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/hadoop/data/dfs/namenode</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/hadoop/data/dfs/datanode</value>
        </property>
</configuration>

C:\hadoop\etc\hadoop\mapred-site.xml

<configuration>
        <property>
           <name>mapreduce.framework.name</name>
           <value>yarn</value>
        </property>
</configuration>

C:\hadoop\etc\hadoop\yarn-site.xml

<configuration>
        <property>
           <name>yarn.nodemanager.aux-services</name>
           <value>mapreduce_shuffle</value>
        </property>
        <property>
           <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
           <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
</configuration>
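The four files above can be sanity-checked by parsing them back and confirming the values with nothing but the standard library; a sketch (read_hadoop_conf is an illustrative helper, not a Hadoop API):

```python
import xml.etree.ElementTree as ET

def read_hadoop_conf(xml_text):
    """Parse a Hadoop *-site.xml fragment into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}

core_site = """
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:9000</value>
        </property>
</configuration>
"""
conf = read_hadoop_conf(core_site)
```

The same helper works for any of the *-site.xml files shown above.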

Format namenode

For the first time only, namenode needs to be formatted.

C:\Users\abhijitg>cd c:\hadoop\bin 
c:\hadoop\bin>hdfs namenode -format

Start HDFS (Namenode and Datanode)

C:\Users\abhijitg>cd c:\hadoop\sbin
c:\hadoop\sbin>start-dfs

Start MapReduce aka YARN (Resource Manager and Node Manager)

C:\Users\abhijitg>cd c:\hadoop\sbin
c:\hadoop\sbin>start-yarn
starting yarn daemons

In total, four separate Command Prompt windows will be opened automatically to run the Namenode, Datanode, Resource Manager and Node Manager.

Reference : Build, Install, Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS

Abhijit
    "Links to external resources are encouraged, but please add context around the link so your fellow users will have some idea what it is and why it’s there. Always quote the most relevant part of an important link, in case the target site is unreachable or goes permanently offline." http://stackoverflow.com/help/how-to-answer – Wouter J Nov 03 '13 at 21:22
  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – bensiu Nov 03 '13 at 21:30
  • Thank you for the detailed tutorial. :) – dharam Jul 02 '14 at 21:26
  • What is the use of Cygwin and Protocol Buffers 2.5.0 here? – Bhuvan Apr 13 '15 at 14:46
16

Han has prepared the Hadoop 2.2 Windows x64 binaries (see his blog) and uploaded them to Github.

After putting the two binaries winutils.exe and hadoop.dll into the %hadoop_prefix%\bin folder, I got the same UnsatisfiedLinkError.

The problem was that some dependency of hadoop.dll was missing. I used Dependency Walker to check the dependencies of the binaries and the Microsoft Visual C++ 2010 Redistributables were missing.

So besides building all the components yourself, the answer to the problem is

  • make sure to use the same architecture for Java and the native code. java -version tells you whether you use 32-bit or x64.
  • then use Dependency Walker to make sure all native binaries are pure and of the same architecture. Sometimes an x64 dependency is missing and Windows falls back to x86, which does not work. See the answer to another question.
  • also check if all dependencies of the native binaries are satisfied.
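The architecture check in the first two bullets can even be scripted: the machine type lives in a Windows binary's COFF header, so Dependency Walker isn't strictly needed for that part. A sketch using the standard PE offsets (pe_architecture is my own helper):

```python
import struct

# COFF machine-type constants from the PE format
MACHINES = {0x014C: "x86", 0x8664: "x64"}

def pe_architecture(data):
    """Return 'x86' or 'x64' for the raw bytes of a Windows PE binary
    (e.g. hadoop.dll or winutils.exe)."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file")
    # e_lfanew at offset 0x3C points to the 'PE\0\0' signature
    pe_offset = struct.unpack_from("<I", data, 0x3C)[0]
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    machine = struct.unpack_from("<H", data, pe_offset + 4)[0]
    return MACHINES.get(machine, hex(machine))
```

Run it over `open(r"C:\hadoop\bin\hadoop.dll", "rb").read()` and compare the result with your JVM's architecture.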
Peter Kofler
14

I had the same problem, but with the recent Hadoop v2.2.0. Here are my steps for solving the problem:

  1. I've built winutils.exe from sources. Project directory:

    hadoop-2.2.0-src\hadoop-common-project\hadoop-common\src\main\winutils

    My OS: Windows 7. Tool for building: MS Visual Studio Express 2013 for Windows Desktop (it's free and can be downloaded from http://www.microsoft.com/visualstudio/). Open the Studio, File -> Open -> winutils.sln. Right click on the solution on the right side -> Build. There were a couple of errors in my case (you might need to fix the project properties and specify the output folder). Voila! You get winutils.exe - put it into hadoop's bin.

  2. Next we need to build hadoop.dll. Some voodoo magic goes here: open

    hadoop-2.2.0-src\hadoop-common-project\hadoop-common\src\main\native\native.sln

    in MS VS; right click on the solution -> Build. I got a bunch of errors. I manually created several missing header files (don't ask me why they are missing from the source tarball!):

    https://github.com/jerishsd/hadoop-experiments/tree/master/sources

    (and don't ask me what this project on git is for! I don't know - Google pointed it out by searching for the header file names) I've copied

    hadoop-2.2.0-src\hadoop-common-project\hadoop-common\target\winutils\Debug\libwinutils.lib

    (result of step # 1) into

    hadoop-2.2.0-src\hadoop-common-project\hadoop-common\target\bin

    And finally build operation produces hadoop.dll! Put it again into hadoop's bin and happily run namenode!

Hope my steps will help somebody.

Aleksei Egorov
  • Well, in my case there was less voodoo magic but more steps. I just followed BUILDING.txt from the source distribution. I had to install some dependencies: the free Windows SDK, Maven, the ProtocolBuffer library and a set of Unix utils. After that Maven did all the job. Of course, it built not only native code but Java packages also. The only issue I remember is that some failing tests canceled the building process, so I had to disable them through Maven command line options. – Hatter Oct 24 '13 at 09:34
    @Aleksei- I was trying to build native.sln. I got the errors for the missing header files. Can you tell me the path where I should be putting the missing files? – Aviral Kumar Aug 31 '14 at 18:27
8

In addition to other solutions, here is a pre-built copy of winutils.exe. Download it and add it to $HADOOP_HOME/bin. It works for me.

(Source: Click here)

Prasad D
  • @Prasad D I have used your utilities and I could successfully start HDFS. But it is failing while starting YARN. Can you please check and reply http://stackoverflow.com/questions/30964216 – Kaushik Lele Jun 21 '15 at 12:12
    Thanks. Is there is a link for hadoop-2.6? – Tagar Sep 14 '15 at 16:30
6

Please add hadoop.dll (version sensitive) to the system32 directory under the Windows directory.

You can get the hadoop.dll at winutils

futuredaemon
  • Can I get the purpose behind adding hadoop.dll to system32 directory - adding dll to system32 works for me but I want to know why ? "https://stackoverflow.com/questions/75916651/upgrading-hadoop-from-2-10-2-to-3-3-4-got-java-lang-unsatisfiedlinkerror" – Amita Patil Apr 03 '23 at 07:20
4

Instead of using the official branch, I would suggest the Windows-optimized one:

http://svn.apache.org/repos/asf/hadoop/common/branches/branch-trunk-win/

You need to compile it: build winutils.exe under Windows and place it in the hadoop/bin directory.

  • Could you please provide any good link with instructions on building this stuff under windows? – Aleksei Egorov Oct 23 '13 at 12:50
  • I actually built Hadoop on a linux machine, because there where problems, with ant/maven etc.. for this part I just followed the instructions provided on the website if I remember correctly. Then for building winutils you can use visual studio, also this part was straightforward, just load the project and build it – Marco Seravalli Oct 23 '13 at 14:04
4

You might need to copy the hadoop.dll and winutils.exe files from hadoop-common-bin to %HADOOP_HOME%\bin, and add %HADOOP_HOME%\bin to your %PATH% variable.

You can download hadoop-common from https://github.com/amihalik/hadoop-common-2.6.0-bin

Vikash Pareek
3

I ran into the same problem with Hadoop 2.4.1 on Windows 8.1; there were a few differences in the resulting solution, caused mostly by the newer OS.

I first installed Hadoop 2.4.1 binary, unpacking it into %HADOOP_HOME%.

The previous answers describe how to set up Java, protobuf, cygwin, and maven, and the needed environment variables. I had to change my Platform environment variable from HP's odd 'BCD' value.

I downloaded the source from an Apache mirror, and unpacked it in a short directory (HADOOP_SRC = C:\hsrc). Maven ran fine from a standard Windows command prompt in that directory: mvn package -DskipTests.

Instead of using the Windows 7 SDK (which I could not get to load) or the Windows 8.1 SDK (which doesn't have the command line build tools), I used the free Microsoft Visual Studio Express 2013 for Windows Desktop. Hadoop's build needed the MSBuild location (C:\Program Files (x86)\MSBuild\12.0) in the PATH, and required that the various Hadoop native source projects be upgraded to the newer (MS VS 2013) format. The maven build failures were nice enough to point out the absolute path of each project as it failed, making it easy to load the project into Visual Studio (which automatically converts, after asking).

Once built, I copied the native executables and libraries into the Hadoop bin directory. They were built in %HADOOP_SRC%\hadoop-common-project\hadoop-common\target\bin, and needed to be copied into %HADOOP_HOME%\bin.

leifbennett
2

Adding hadoop.dll and hdfs.dll to the %HADOOP_HOME%\bin folder did the trick for me.

Kunal Kanojia
1

After multiple trials and errors, I got it working with the solution below.

Windows Changes:

  1. Download the zip of winutils from https://github.com/steveloughran/winutils
  2. Extract the zip to C:\winutils.
  3. Open Windows Environment Variables screen and add the following System Variable.

HADOOP_HOME = C:\winutils\hadoop-3.0.0

  4. Under the Path System Variable, add

%HADOOP_HOME%\bin

  5. Restart your system.

Maven Changes:

  <properties>

    <java.version>1.8</java.version>
    <maven.compiler.source>${java.version}</maven.compiler.source>
    <maven.compiler.target>${java.version}</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>

    <scala.version>2.12</scala.version>
    <spark.version>3.0.1</spark.version>
    <hadoop.version>3.0.0</hadoop.version>  <!-- Note: HADOOP Version used is the one available for winutils -->

  </properties>

  <dependencies>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${scala.version}</artifactId>
      <version>${spark.version}</version>
<!--  <scope>provided</scope> -->
    </dependency>

    <!-- Hadoop-->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
    </dependency>

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>

    <!-- For S3 Read (optional) -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-aws</artifactId>
      <version>${hadoop.version}</version>
    </dependency>

  </dependencies>
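As the comment in the properties block notes, the Hadoop version in the pom must be one for which winutils binaries exist. A small sketch that cross-checks the HADOOP_HOME folder name against the hadoop.version property (winutils_matches is a hypothetical helper):

```python
def winutils_matches(hadoop_home, hadoop_version):
    """True if the HADOOP_HOME folder name carries the pom's hadoop.version,
    e.g. C:\\winutils\\hadoop-3.0.0 matches version '3.0.0'."""
    # normalize separators and take the last path component
    folder = hadoop_home.replace("\\", "/").rstrip("/").rsplit("/", 1)[-1]
    return folder == "hadoop-" + hadoop_version
```

A mismatch here is a common source of the same UnsatisfiedLinkError at Spark/Hadoop startup.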
Dharman
Arjun Sunil Kumar
0

Just installed Hadoop 2.2.0 in my environment, Win7 x64.

Following BUILDING.txt got me there. Note that the dirs in hdfs-site.xml and mapred-site.xml start with /, like below:

E.G

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop-2.2.0_1/dfs/name</value>
    <description></description>
    <final>true</final>
  </property>

May it help you!

Derry
0

Download & Install Java in c:/java/

Make sure the path is this way; if Java is installed in 'Program Files', then hadoop-env.cmd will not recognize the Java path.

Download Hadoop binary distribution.

I am using the binary distribution Hadoop-2.8.1. Also, I would recommend keeping the extraction path as short as possible.

Set Environment Variables:

JAVA_HOME = "C:\Java"
HADOOP_HOME = "<your hadoop home>"
Path = "%JAVA_HOME%\bin"
Path = "%HADOOP_HOME%\bin"

Hadoop will work on Windows if Hadoop-src is built using Maven on your Windows machine. Building the Hadoop-src (distribution) will create a Hadoop binary distribution which will work as a Windows native version.

But if you don't want to do that, then download the pre-built winutils of the Hadoop distribution. Here is a GitHub link which has winutils for some versions of Hadoop.

If the version you are using is not in the list, then follow the conventional method for setting up Hadoop on Windows - link

If you found your version, then copy-paste all the content of the folder into the path: /bin/

Set all the .xml configuration files - Link & set JAVA_HOME path in hadoop-env.cmd file

From cmd go to:

<HADOOP_HOME>/bin/> hdfs namenode -format
<HADOOP_HOME>/sbin> start-all.cmd

Hope this helps.

Raxit Solanki
0
  1. Get Hadoop binaries (which include winutils.exe and hadoop.dll)
  2. Make sure hadoop\bin is available via PATH (System PATH if you run it as a Service)

    Note that setting java.library.path overrides PATH. If you set java.library.path, make sure it is correct and points to the hadoop library.
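The override in that note can be pictured as a simple search-order rule; a simplified model of it (not the JVM's actual resolver):

```python
def native_search_path(path_var, java_library_path=None):
    """Model of where the JVM looks for hadoop.dll on Windows: when
    java.library.path is set, it replaces the PATH-based search entirely."""
    if java_library_path is not None:
        return java_library_path.split(";")
    return path_var.split(";")
```

So a correct PATH is silently ignored the moment java.library.path points somewhere else.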

rustyx