
I installed a three-node Hadoop cluster. The master and slave daemons each start on their own machines, but the datanodes are not shown in the namenode web UI. The datanode log file shows the following error:

2016-06-18 21:23:53,980 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/192.168.1.100:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-06-18 21:23:55,029 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/192.168.1.100:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-06-18 21:23:56,030 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/192.168.1.100:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-06-18 21:23:57,031 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/192.168.1.100:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-06-18 21:23:58,032 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/192.168.1.100:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
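
These retries mean the datanode never reaches 192.168.1.100:9000 at all. A quick way to narrow that down (a sketch, assuming nc and ss are installed) is to probe the port from a datanode and check what the namenode is actually listening on:

# on datanode1 / datanode2: is the namenode RPC port reachable?
nc -vz namenode 9000

# on the namenode: is anything listening on 9000, and on which address?
ss -tlnp | grep 9000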

The namenode machine's information:

cat /etc/hosts

#127.0.0.1   localhost localhost.localdomain localhost4            localhost4.localdomain4
#::1         localhost localhost.localdomain localhost6        localhost6.localdomain6
192.168.1.100 namenode
192.168.1.101 datanode1
192.168.1.102 datanode2

cat /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0
IPV6INIT=yes
BOOTPROTO=dhcp
UUID=61fe61d3-fcda-4fed-ba81-bfa767e0270a
ONBOOT=yes
TYPE=Ethernet
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME="System eth0"
BOOTPROTO="static" 
ONBOOT="yes" 
IPADDR=192.168.1.100 
GATEWAY=192.168.1.1 
NETMASK=255.255.255.0 
DNS1=192.168.1.1 

cat /etc/hostname

namenode
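
Since every address in the Hadoop configs below is given as a hostname, it is worth confirming on all three machines that the names resolve to the expected IPs and that each machine knows its own name (a quick check, assuming the same /etc/hosts is copied to every node):

getent hosts namenode datanode1 datanode2
hostname -f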

cat core-site.xml

<configuration>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
    <description>Abase for other temporary directories.</description>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>
</property>
<property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
</property>
</configuration>
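
To verify that each node actually picked up this core-site.xml (and not a stale copy), you can ask Hadoop for the effective value; a minimal check, assuming the hadoop binaries are on the PATH of every node:

hdfs getconf -confKey fs.defaultFS
hdfs getconf -namenodes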

cat hdfs-site.xml

<configuration>
<property>
    <name>dfs.nameservices</name>
    <value>hadoop-cluster1</value>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>namenode:50090</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/dfs/data</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
</configuration>
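
The name and data directories above must exist and be writable by the user that starts the daemons (root here, per the comments below); a quick sanity check:

# on the namenode
ls -ld /home/hadoop/dfs/name
# on each datanode
ls -ld /home/hadoop/dfs/data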

cat mapred-site.xml

<configuration>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>namenode:50030</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>namenode:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>namenode:19888</value>
</property>
</configuration>

cat yarn-site.xml

<configuration>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>namenode:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>namenode:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>namenode:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>namenode:8033</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>namenode:8088</value>
</property>
</configuration>

cat slaves

datanode1
datanode2
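
Independent of the web UI, you can check whether the datanodes actually registered after starting HDFS; a couple of quick checks, assuming the hadoop binaries are on the PATH:

# on each datanode: is a DataNode process running at all?
jps

# on the namenode: how many live datanodes does HDFS report?
hdfs dfsadmin -report
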
  • I wouldn't comment the localhost line in your hosts file – OneCricketeer Jun 18 '16 at 14:05
  • And what is the hosts file for both datanodes? You've only shown information about the namenode – OneCricketeer Jun 18 '16 at 14:07
  • The hosts file on both datanodes is the same as on the namenode – sucre Jun 18 '16 at 14:12
  • Have you distributed SSH keys and verified both directions of SSH access between all nodes (including to themselves)? – OneCricketeer Jun 18 '16 at 14:16
  • I've tried that; SSH from the namenode to the datanodes works, but SSH from a datanode to the namenode asks for a password – sucre Jun 18 '16 at 14:27
  • Seems like a problem, yeah? You'll need to set up passwordless SSH access between all nodes using some public-private key authorization – OneCricketeer Jun 18 '16 at 14:30
  • I did as you said, but it still doesn't work – sucre Jun 18 '16 at 14:51
  • You've added the correct keys to the `~/.ssh/authorized_keys` file? I think you have to do that for a common username between all the nodes (let's say `hduser`) as well as `root`... Long story short, setting this up correctly by hand is pain :) You're welcome to look into Ambari or Cloudera Manager – OneCricketeer Jun 18 '16 at 14:57
  • Now the one namenode and two datanodes all have the same authorized_keys file – sucre Jun 18 '16 at 15:03
  • I can't remember the username that is used to communicate in HDFS traffic, but I forgot to mention the keys should be for that user. And you did `root` as well? – OneCricketeer Jun 18 '16 at 15:06
  • Yes, all three nodes use the same user, root. Do you mean the password in authorized_keys should be the same as the root login password? – sucre Jun 18 '16 at 15:12
  • I mean the `root` linux user. It might not actually be necessary, though... Umm so, hosts are good, SSH might be good, then the only other networking thing I can think of is disabling firewalls (or opening ports if you are comfortable with that) – OneCricketeer Jun 18 '16 at 15:14
  • @sucre Did you format the namenode (hadoop namenode -format) before starting it? – charan tej Jun 21 '16 at 10:33
  • @charantej Yes, I did – sucre Jun 22 '16 at 06:04
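
For reference, passwordless SSH in both directions is normally set up by generating a key pair on each node and appending its public key to ~/.ssh/authorized_keys on every other node; a minimal sketch for the root user (assuming OpenSSH with root login permitted, as discussed in the comments above):

# run on each of the three nodes
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
ssh-copy-id root@namenode
ssh-copy-id root@datanode1
ssh-copy-id root@datanode2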

1 Answer


The solution is to stop the firewall service:

systemctl stop firewalld.service
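
Note that stopping the service only lasts until the next reboot. To keep it off permanently, or to keep firewalld running and just open the namenode RPC port instead, something like the following should work (a sketch, assuming firewalld on CentOS/RHEL and the port 9000 used in core-site.xml above):

# keep the firewall disabled across reboots
systemctl disable firewalld.service

# or leave firewalld running and open the namenode RPC port
firewall-cmd --permanent --add-port=9000/tcp
firewall-cmd --reload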