
I'm having trouble setting up Hadoop. My setup consists of a NameNode VM and two separate physical DataNodes, all connected to the same network.

IP configuration:

  • 192.168.118.212 namenode-1
  • 192.168.118.217 datanode-1
  • 192.168.118.216 datanode-2
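
For reference, the corresponding /etc/hosts entries would look something like the sketch below; this assumes the same mapping exists on all three machines (a common pitfall is a line mapping the machine's own hostname to 127.0.0.1 or 127.0.1.1, which makes the NameNode bind to the loopback interface).

# /etc/hosts (sketch; assumed identical on all three machines)
127.0.0.1       localhost
192.168.118.212 namenode-1
192.168.118.217 datanode-1
192.168.118.216 datanode-2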

I keep getting an error saying there are 0 DataNodes running, but when I run jps on the datanode-1 or datanode-2 machine, the DataNode process shows up as running. My NameNode log shows this:

File /user/hadoop/.bashrc_COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
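
For what it's worth, jps only shows that the DataNode JVM is up on each machine; a quick way to see how many DataNodes have actually registered with the NameNode is the dfsadmin report, sketched below (run on namenode-1 as the hadoop user).

# Shows live/dead DataNodes from the NameNode's point of view; with the
# error above this will report 0 live datanodes even though jps shows
# the DataNode process running on both machines.
hdfs dfsadmin -report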

The logs on my datanode-1 machine tell me that it has trouble connecting to the NameNode.

WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: namenode-1/192.168.118.212:9000

The weird part is that it can't connect, even though it starts up fine. I can also SSH between all of the machines with no problems.
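
SSH working only proves that port 22 is reachable, not the NameNode RPC port. A sketch of how the connection to port 9000 could be checked directly (nc and ss are assumed to be installed; any equivalent tool works):

# From datanode-1 and datanode-2: check TCP reachability of the
# NameNode RPC port configured in core-site.xml.
nc -zv namenode-1 9000

# On namenode-1: confirm the NameNode is listening on 9000 and is
# bound to the external address, not just 127.0.0.1.
ss -tlnp | grep 9000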

So my best guess is that I've configured one of the config files incorrectly, though I've checked other questions on here and they seem to be correct.

core-site.xml

<configuration>
<property>
    <name>fs.default.name</name>
    <value>hdfs://namenode-1:9000/</value>
</property>
</configuration>

hdfs-site.xml

<configuration>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoop_data/hdfs/datanode</value>
    <final>true</final>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop_data/hdfs/namenode</value>
    <final>true</final>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
</configuration>

mapred-site.xml

<configuration>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.job.tracker</name>
    <value>namenode-1:9001</value>
</property>
</configuration>
kekkodesu

2 Answers

  1. The problem could be fs.default.name. Try using the IP address as the value of fs.default.name, and check whether your /etc/hosts configuration points to the correct IP address. Most likely this is already correct, since your DataNode figured out the IP address.

  2. The problem could also be the port number. Try 8020 or 50070 instead of 9000 and see what happens.
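
A rough sketch of how both suggestions could be verified from the shell (hostnames and paths taken from the question; the restart scripts assume a standard Hadoop sbin layout):

# Point 1: confirm name resolution on each DataNode; this should print
# 192.168.118.212, not a loopback address.
getent hosts namenode-1

# Point 2: after changing the port in fs.default.name
# (e.g. hdfs://namenode-1:8020/), restart HDFS so the change takes effect.
stop-dfs.sh
start-dfs.sh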

Cloudkollektiv
  • I've already tried using only the IP; it doesn't make a difference. I changed the port, but whenever I try to do "hdfs dfs -get xxx" it gives me the 0 nodes running error again – kekkodesu Oct 17 '17 at 12:31
  • One possibility is the permissions: https://stackoverflow.com/questions/26545524/there-are-0-datanodes-running-and-no-nodes-are-excluded-in-this-operation You can also check the connection to rule this out; this is explained here: http://hadoopinrealworld.com/0-datanodes-running-and-no-nodes/ Also try a proper restart of your cluster; since it is already not working, this is no problem. – Cloudkollektiv Oct 18 '17 at 13:27
  • Seems like the permissions on both the NameNode and DataNode are fine. The NameNode hostname gets resolved fine as well. – kekkodesu Oct 20 '17 at 05:57
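
For completeness, a sketch of the permissions check discussed in these comments, using the directories from hdfs-site.xml above (the hadoop user is assumed to be the one running the daemons):

# On the DataNodes: the directory must exist and be owned/writable by
# the user running the DataNode daemon.
ls -ld /home/hadoop/hadoop_data/hdfs/datanode

# On the NameNode:
ls -ld /home/hadoop/hadoop_data/hdfs/namenode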

The problem was the firewall. You can stop it by running systemctl stop firewalld.service

I found the answer here: https://stackoverflow.com/a/37994066/8789361
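
As a less drastic alternative to disabling the firewall entirely, the relevant port can be opened while leaving firewalld running; a sketch on a firewalld-based distro (only port 9000 from this setup is shown, other Hadoop daemon ports may need the same treatment):

# Option A (the fix above): stop the firewall, and optionally keep it
# off across reboots.
sudo systemctl stop firewalld.service
sudo systemctl disable firewalld.service

# Option B: keep firewalld running and open the NameNode RPC port.
sudo firewall-cmd --permanent --add-port=9000/tcp
sudo firewall-cmd --reload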

kekkodesu