20

I am trying to install a single node setup of Hadoop on Ubuntu. I started following the instructions on the Hadoop 2.3 docs.

But I seem to be missing something very simple.

First, it says:

To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.

Then,

Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.

However, I can't seem to find the conf directory.

I downloaded a 2.3 release from one of the mirrors and unpacked the tarball. Running ls inside it returns:

$ ls
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share

I was able to find the file they were referencing, just not in a conf directory:

$ find . -name hadoop-env.sh
./etc/hadoop/hadoop-env.sh

Am I missing something, or am I grabbing the wrong package? Or are the docs just outdated?

If so, does anyone know where to find more up-to-date docs?

Sanketh Katta

6 Answers

13

I am trying to install Hadoop in pseudo-distributed mode and ran into the same issue.

The book Hadoop: The Definitive Guide (Third Edition) says on page 618:

In Hadoop 2.0 and later, MapReduce runs on YARN and there is an additional configuration file called yarn-site.xml. All the configuration files should go in the etc/hadoop subdirectory

Hope this confirms that etc/hadoop is the correct place.
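With the location confirmed, the step from the docs (defining JAVA_HOME in hadoop-env.sh) could be done roughly like this. This is only a sketch: set_java_home is a hypothetical helper, not part of Hadoop, and it assumes that appending an export line is acceptable because hadoop-env.sh is a shell script in which a later assignment overrides an earlier one.

```shell
# Hypothetical helper (not part of Hadoop): point hadoop-env.sh at a Java install.
# Usage: set_java_home <path-to-hadoop-env.sh> <java-home>
set_java_home() {
    env_file="$1"
    java_home="$2"
    # Refuse to create a new file; we only want to edit an existing hadoop-env.sh.
    [ -f "$env_file" ] || { echo "not found: $env_file" >&2; return 1; }
    # Append the export; a later assignment in the same script wins over
    # any earlier JAVA_HOME line.
    printf '\nexport JAVA_HOME="%s"\n' "$java_home" >> "$env_file"
}
```

For example, `set_java_home etc/hadoop/hadoop-env.sh /usr/lib/jvm/java-7-openjdk-amd64`, run from inside the unpacked distribution. The Java path here is an assumption; substitute the root of your own Java installation.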

6

I think the docs need to be updated. Although the directory structure has changed, the names of important files like hadoop-env.sh, core-site.xml and hdfs-site.xml have not. You may find the following link useful for getting started.

http://codesfusion.blogspot.com/2013/10/setup-hadoop-2x-220-on-ubuntu.html

aasoj
    Thanks, that was a great blog post, it got me much further, but I am still hitting some issues. It is a bit absurd that the official docs are outdated for even the most basic setup. This seems to be the case for all the 2x versions. Even the current ["stable" release's docs](http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleNodeSetup.html). – Sanketh Katta Mar 19 '14 at 18:49
5

In Hadoop 1:

$HADOOP_HOME/conf/

In Hadoop 2:

$HADOOP_HOME/etc/hadoop
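
If a script has to work with either layout, one possible approach is to probe for hadoop-env.sh in both places. A minimal sketch, assuming an unpacked binary distribution; hadoop_conf_dir is a hypothetical helper name, not a Hadoop command:

```shell
# Hypothetical helper: print the configuration directory of an unpacked
# Hadoop distribution, checking the Hadoop 2 layout first, then Hadoop 1.
# Usage: hadoop_conf_dir <hadoop-home>
hadoop_conf_dir() {
    home="$1"
    if [ -f "$home/etc/hadoop/hadoop-env.sh" ]; then
        echo "$home/etc/hadoop"      # Hadoop 2 layout
    elif [ -f "$home/conf/hadoop-env.sh" ]; then
        echo "$home/conf"            # Hadoop 1 layout
    else
        echo "no hadoop-env.sh under $home" >&2
        return 1
    fi
}
```

Usage would be something like `hadoop_conf_dir "$HADOOP_HOME"`, which prints the directory or returns a non-zero status if neither layout is found.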
Ani Menon
3

In the Hadoop 2.7.3 source distribution, the file is in hadoop-common/src/main/conf/:

$ sudo find . -name hadoop-env.sh
./hadoop-2.7.3-src/hadoop-common-project/hadoop-common/src/main/conf/hadoop-env.sh
javaProgrammer
2

Just adding a note on the blog post http://codesfusion.blogspot.com/2013/10/setup-hadoop-2x-220-on-ubuntu.html. The blog post is fantastic and very useful; that's how I got started. One aspect that took me a little time to figure out is that the blog seems to use a simplified way of providing configuration in the Hadoop conf files, such as "conf/core-site.xml", hdfs-site.xml, etc., as follows:

<!--fs.default.name is the name node URI -->
<configuration>
    fs.default.name
    hdfs://localhost:9000
</configuration>

As per the official docs, there is a more rigorous way, which is useful when you have more than one property: add each one as follows (please note that the description is optional :-) )

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
        <description>the name node URI</description>
    </property>
    <!-- Add more configuration properties here -->
</configuration>
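
If you prefer to generate the file from the command line, something along these lines may work. It is only a sketch: write_core_site is a hypothetical helper, it overwrites any existing core-site.xml in the given directory, and it keeps the fs.default.name key used above (Hadoop 2 lists that key as deprecated in favor of fs.defaultFS).

```shell
# Hypothetical helper: write a one-property core-site.xml.
# WARNING: overwrites any existing core-site.xml in <conf-dir>.
# Usage: write_core_site <conf-dir> <namenode-uri>
write_core_site() {
    conf_dir="$1"
    namenode_uri="$2"
    # Unquoted heredoc delimiter so that $namenode_uri is expanded.
    cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>$namenode_uri</value>
    </property>
</configuration>
EOF
}
```

For instance, `write_core_site etc/hadoop hdfs://localhost:9000` from inside the unpacked distribution.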
Yogesh Devi
0

In the Hadoop 3.3.1 source tree (as of 2022), the conf directory is located under hadoop-common's src/main directory:

$HOME/hadoop/hadoop3.3/hadoop-common-project/hadoop-common/src/main/

Daniel Ado