3

I'm deploying hadoop as a multi node cluster (distributed mode). But each data node is having different different cluster id.

On slave1,

java.io.IOException: Incompatible clusterIDs in /home/pushuser1/hadoop/tmp/dfs/data: namenode clusterID = CID-c72a7d30-ec64-4e4f-9a80-e6f9b6b1d78c; datanode clusterID = CID-2ecca585-6672-476e-9931-4cfef9946c3b

On slave2,

java.io.IOException: Incompatible clusterIDs in /home/pushuser1/hadoop/tmp/dfs/data: namenode clusterID = CID-c72a7d30-ec64-4e4f-9a80-e6f9b6b1d78c; datanode clusterID = CID-e24b0548-2d8d-4aa4-9b8c-a336193c006e

I followed this link as well Datanode not starts correctly but I dont know which cluster id I should pick. If I pick any then data node starts on that machine but not on another one. And also when I format namenode using basic command (hadoop namenode - format), datanodes on each slave nodes are started but then namenode on master machine doesn't get started.

Community
  • 1
  • 1
Abhishek Maheshwari
  • 275
  • 1
  • 3
  • 12

1 Answers1

9

ClusterIDs of datanodes and namenodes should match, then only datanodes can effectively communicate with namenode. If you do namenode format new ClusterID will be assigned for namenodes then ClusterIDs in datanodes won't match.

You can locate a VERSION files in your /home/pushuser1/hadoop/tmp/dfs/data/current/ (datanode directory ) as well as namenode directory(/home/pushuser1/hadoop/tmp/dfs/name/current/ based on the value your specified for dfs.namenode.name.dir) that contains the ClusterID.

If you are ready for format your hdfs namenode, Stop all HDFS services, Clear out all files inside the following directories

rm -rf /home/pushuser1/hadoop/tmp/dfs/data/*  (Need to execute on all data nodes)
rm -rf /home/pushuser1/hadoop/tmp/dfs/name/*

and format hdfs again (hadoop namenode -format )

SachinJose
  • 8,462
  • 4
  • 42
  • 63