
Issue while setting up a Hadoop multi-node cluster: as soon as I start my HDFS daemons on the master (bin/start-dfs.sh),

I got the logs below on the master:

starting namenode, logging to /home/hduser/hadoop/libexec/../logs/hadoop-hduser-namenode-localhost.localdomain.out
slave: Warning: $HADOOP_HOME is deprecated.
slave:
slave: starting datanode, logging to /home/hduser/hadoop/libexec/../logs/hadoop-hduser-datanode-localhost.localdomain.out
master: Warning: $HADOOP_HOME is deprecated.
master:
master: starting datanode, logging to /home/hduser/hadoop/libexec/../logs/hadoop-hduser-datanode-localhost.localdomain.out
master: Warning: $HADOOP_HOME is deprecated.
master:
master: starting secondarynamenode, logging to /home/hduser/hadoop/libexec/../logs/hadoop-hduser-secondarynamenode-localhost.localdomain.out

I got the logs below on the slave, in the

hadoop-hduser-datanode-localhost.localdomain.log file.

Can someone advise me what is wrong with the set-up?

2013-07-24 12:10:59,373 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.0.1:54310. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-24 12:11:00,374 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.0.1:54310. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-24 12:11:00,377 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to master/192.168.0.1:54310 failed on local exception: java.net.NoRouteToHostException: No route to host
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
        at org.apache.hadoop.ipc.Client.call(Client.java:1112)
Surya

1 Answer


Make sure your NameNode is running fine. If it is already running, see if there is any problem with the connection: your DataNode is not able to talk to the NameNode. Make sure you have added the IP address and hostname of the master machine to the /etc/hosts file of your slave. Try telnet to 192.168.0.1 on port 54310 and see whether you are able to connect or not.
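
For a quick check, something like the following (the master address 192.168.0.1 is taken from your logs; the slave address 192.168.0.2 is an assumption, so substitute your own):

    # /etc/hosts on both master and slave (slave IP assumed)
    192.168.0.1    master
    192.168.0.2    slave

    # From the slave: test name resolution and the NameNode RPC port.
    # Note: telnet takes the port as a separate argument, not host:port.
    ping -c 3 master
    telnet master 54310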

Showing us the NN logs would be helpful.

Edit:

See what the Hadoop wiki has to say about this problem: you get a TCP No Route To Host error, often wrapped in a Java IOException, when one machine on the network does not know how to send TCP packets to the machine specified.

Some possible causes (not an exclusive list):

  • The hostname of the remote machine is wrong in the configuration files.
  • The client's host table /etc/hosts has an invalid IP address for the target host.
  • The DNS server's host table has an invalid IP address for the target host.
  • The client's routing tables (in Linux, iptables) are wrong.
  • The DHCP server is publishing bad routing information.
  • Client and server are on different subnets and are not set up to talk to each other. This may be an accident, or it may be deliberate, to lock down the Hadoop cluster.
  • The machines are trying to communicate using IPv6. Hadoop does not currently support IPv6 (see the sketch below for one way to force IPv4).
  • The host's IP address has changed, but a long-lived JVM is caching the old value. This is a known problem with JVMs (search for "java negative DNS caching" for the details and solutions).

The quick solution: restart the JVMs.
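
On the IPv6 point, a common workaround (a sketch, not confirmed as the cause here) is to make the JVM prefer the IPv4 stack in conf/hadoop-env.sh, then restart the daemons:

    # conf/hadoop-env.sh: keep the Hadoop daemons off the IPv6 stack
    export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"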

These are all network configuration/router issues. As it is your network, only you can find out and track down the problem.
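
Since this ultimately turned out to be a port issue on the master (see the comments below), a quick verification is to check that the NameNode is actually listening on 54310 and that the firewall is not rejecting the slave. This is only a sketch and assumes iptables is the active firewall:

    # On the master: is the NameNode listening on its RPC port?
    netstat -tlnp | grep 54310

    # "No route to host" from a reachable machine is often an iptables
    # REJECT rule. One illustrative way to open the port:
    iptables -I INPUT -p tcp --dport 54310 -j ACCEPT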

Tariq
  • I ran the jps command on the master and found: [root@localhost conf]# jps 3359 DataNode 3744 Jps 3242 NameNode 3500 SecondaryNameNode. They are up, but no service is running on the slave: [hduser@localhost logs]$ jps 4384 Jps – Surya Jul 24 '13 at 09:16
  • Cannot connect via telnet. I tried to telnet from the slave to the master: [hduser@localhost logs]$ telnet 192.168.0.1:54310 telnet: 192.168.0.1:54310: Name or service not known 192.168.0.1:54310: Unknown host – Surya Jul 24 '13 at 09:18
  • 1
    Looks like some network related issue. Are you able to ssh?Make sure machines are connected properly. Also, make sure all the daemons are running on all the machines. – Tariq Jul 24 '13 at 09:21
  • I did add the slave and master info to the hosts files of both machines – Surya Jul 24 '13 at 09:24
  • 1
    Are you able to ssh from slave to master? – Tariq Jul 24 '13 at 09:26
  • SSH is working fine: [hduser@localhost ~]$ ssh 192.168.0.1 hduser@192.168.0.1's password: Last login: Tue Jul 23 17:34:58 2013 from slave; I can connect from the slave to the master. Regarding all the daemons running: do we need to start them manually on all the machines? The HDFS and MapReduce daemons on the slave are not running – Surya Jul 24 '13 at 09:27
  • 1
    No. Ideally, if you have ssh configured, start-dfs.sh will start all the processes on the all the machines. Please see the edited answer. – Tariq Jul 24 '13 at 09:38
  • Let us continue this discussion in chat: http://chat.stackoverflow.com/rooms/34056/discussion-between-user2499617-and-tariq – Surya Jul 24 '13 at 10:01
  • It's a port issue on the master; this is resolved. Thanks a ton, Tariq – Surya Jul 24 '13 at 11:58
  • 1
    Oh..try to dig..search over the net..if you still face the issue post a question :) – Tariq Jul 24 '13 at 15:02
  • Just now I raised the issue: http://stackoverflow.com/q/17837871/2499617. Your advice is really valuable, Tariq. Thanks, man – Surya Jul 24 '13 at 15:04
  • Kind of late, but I just had this issue and solved it by looking at /etc/hosts on all of my machines. It turns out that the IP address was wrong for datanode2. It was wrong after a hard reboot of all 4 DNs and the two NameNodes. I feel like there should be some tool to make sure this error doesn't happen. I could ping all the computers by name, so I feel like there should be an automated process to update /etc/hosts. Is there? –  Jan 21 '17 at 03:00