
I have set up Hadoop (YARN) on Ubuntu. The ResourceManager appears to be running. When I run the hadoop fs -ls command, I receive the following error:

14/09/22 15:52:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: Call From ubuntu-8.abcd/xxx.xxx.xxx.xxxx to ubuntu-8.testMachine:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

I checked the suggested URL in the error message but could not figure out how to resolve the issue. I have tried setting the external IP address (as opposed to localhost) in my core-site.xml file (in etc/hadoop), but that has not resolved the issue. IPv6 has been disabled on the box. I am running the process as hduser (which has read/write access to the directory). Any thoughts on fixing this? I am running this on a single node.

bashrc

#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_INSTALL=/usr/local/hadoop/hadoop-2.5.1
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export HADOOP_YARN_HOME=$HADOOP_INSTALL  ##added because I was not sure about the line below
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_CONF_DIR=$HADOOP_INSTALL/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END 

2 Answers


Your issue is not related to YARN; it is limited to HDFS usage. There is a similar question where the asker had port 9000 listening on the external IP interface while the configuration pointed to localhost. I'd advise first checking whether anything is listening on port 9000 at all, and on which interface. It looks like you have the service listening on an IP interface that differs from the one where you are looking for it. According to your logs, your client is trying ubuntu-8.testMachine:9000. To which IP is that name resolved? If it is assigned to 127.0.0.1 in /etc/hosts, you could have the same situation as in the question I mentioned: the client tries to access 127.0.0.1 but the service is waiting on the external IP. Or you could have the reverse. There is also a good default port mapping table for Hadoop services that is worth consulting.
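
For example (assuming standard Linux tools are available; the hostname is taken from your error message), a quick way to check both points:

sudo netstat -tlnp | grep 9000
getent hosts ubuntu-8.testMachine

The first command shows whether anything is listening on port 9000 and on which address (0.0.0.0, the external IP, or 127.0.0.1); the second shows which IP your client actually resolves the NameNode hostname to.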

Indeed, many similar cases have the same root cause: wrongly configured host interfaces. People often configure a workstation hostname and assign that hostname to localhost in /etc/hosts. Worse, they write the short name first and the FQDN only after it. This means the IP is resolved to the short hostname, while the FQDN is resolved to the IP (non-symmetric).

This in turn provokes a number of situations where services are started on the local 127.0.0.1 interface and people have serious connectivity issues (are you surprised? :-) ).
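
For illustration (the hostnames here are made up), a problematic /etc/hosts often looks like this:

127.0.0.1 localhost
127.0.1.1 myhost myhost.com

With a record like that, anything that resolves the hostname locally gets a loopback address, so services bind to a 127.0.x.x interface even though clients expect the external IP.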

The right approach (at least, the one I encourage based on experience):

  1. Assign at least one external interface that is visible to your cluster clients. If you use DHCP and don't want a static IP, bind the IP to the MAC address so that the address effectively stays constant.
  2. Write the local hostname into /etc/hosts so that it maps to the external interface: FQDN first, then the short name.
  3. If you can, make your DNS resolver resolve your FQDN to your IP. Don't worry about the short name.

For example, if you have the external IP 1.2.3.4 on an interface and the FQDN (fully qualified domain name) set to myhost.com, then your /etc/hosts record MUST look like:

1.2.3.4 myhost.com myhost

And yes, it's better if your DNS resolver knows about your name. Check both forward and reverse resolution with:

host myhost.com
host 1.2.3.4
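
If everything is consistent, the output should look roughly like this (using the example values above; the exact wording depends on your host utility version):

myhost.com has address 1.2.3.4
4.3.2.1.in-addr.arpa domain name pointer myhost.com.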

Yes, clustering is not so easy in terms of network administration ;-). It never has been and never will be.

  • thank you for sharing - appreciated. If this were the case, why was I able to run the services by name (unless their scripts had different logic pertaining to host names)? –  Sep 23 '14 at 20:23
  • 1
    Some services use binding interface resolution based on hostname but some not. And clients act the same way. Yet it may be not your case - you need to check but I have already seen number of cases similar to this. – Roman Nikitchenko Sep 23 '14 at 22:47

Be sure that you have started everything necessary: type start-all.sh; this command will start all the services needed for the connection to Hadoop.

After that, you can type jps; with this command you can see all the services running under Hadoop. Finally, check the ports opened by these services with netstat -plnet | grep java.
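
On a healthy single-node setup, jps should list roughly the following processes (the PIDs below are placeholders and will differ on your machine):

2817 NameNode
2955 DataNode
3121 SecondaryNameNode
3298 ResourceManager
3437 NodeManager
3675 Jps

If NameNode is missing from that list, the connection to port 9000 will be refused no matter how the client is configured.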

Hope this solves your issue.

  • thanks for sharing. I normally try "start-dfs.sh" followed by "start-yarn.sh" - but this only launches the resource manager - the namenode does not come up. I observe the same via the start-all.sh script. I have set up $HADOOP_INSTALL/bin & $HADOOP_INSTALL/sbin in my path ($HADOOP_INSTALL points to the hadoop directory) - wondering if that is causing an issue (because every time I run start-dfs.sh, it complains about not locating $HADOOP_INSTALL/bin/hdfs even though the file is there with read/write access). –  Sep 22 '14 at 23:35
  • The reasons I usually got this exception were: 1. The NodeManager or the ResourceManager is not running. 2. The core-site.xml file is not well configured: the property fs.defaultFS must contain something like hdfs://<ipAddress or localhost>:8020, and mapred.job.tracker in the mapred-site.xml file must be configured accordingly. Also, the error you are getting when running start-dfs.sh is not normal, so check the environment variables. – Kaiser Sep 22 '14 at 23:52
  • yes, the issue is that the namenode does not come up. I copied my bashrc into the original post above - please comment if you observe anything wrong. –  Sep 23 '14 at 00:03
  • thanks - either the start-dfs script is not completely correct or, most likely, I have misconfigured something in my bashrc. I can run the namenode, datanode, nodemanager and jobhistoryserver directly as per the other answer. –  Sep 23 '14 at 00:24
  • try removing the datanode and namenode directories - by default they get created under /tmp/hadoop-<username>* - then format the namenode and try again (see the sketch just below for the commands). – user3484461 Sep 23 '14 at 06:44
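
A minimal sketch of that last suggestion, assuming the default hadoop.tmp.dir under /tmp and the hduser account from the question (adjust the path if you have overridden it, and note that reformatting wipes any existing HDFS data):

stop-dfs.sh
rm -rf /tmp/hadoop-hduser*
hdfs namenode -format
start-dfs.sh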