What are the types of failure in HDFS? What happens when the NameNode, Secondary NameNode, or DataNode goes down?
2 Answers
The three main types of failure are NameNode failures, DataNode failures, and network partitions.
To simulate any of these failures, run sudo jps to get the process ID and process name of each daemon, then run sudo kill -9 {process-id} on the one you want to fail.
Then try to read or write data in HDFS, or from the Pig or Hive shell.
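The jps and kill steps above can be sketched as a small script. This is only an illustration: pid_of and kill_daemon are hypothetical helper names, and the file paths in the commented example are placeholders. It assumes jps prints one "PID Name" pair per line, which is its default output.

```shell
# Sketch only: pid_of and kill_daemon are hypothetical helper names.

# Print the PID of the first process whose name is exactly $1,
# reading jps-style "PID Name" lines from stdin.
pid_of() {
  awk -v name="$1" '$2 == name { print $1; exit }'
}

# Kill one Hadoop daemon by name to simulate its failure.
kill_daemon() {
  daemon_pid=$(sudo jps | pid_of "$1")
  if [ -z "$daemon_pid" ]; then
    echo "no running $1 found" >&2
    return 1
  fi
  sudo kill -9 "$daemon_pid"
}

# Example (commented out so nothing is killed by accident):
# kill_daemon DataNode
# hdfs dfs -put localfile.txt /tmp/failure-test.txt   # placeholder paths
```

The exact match on the process name matters: a substring match on NameNode would also hit SecondaryNameNode.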
- Links may break in the future. Please add a brief summary of the linked content. – Ravindra babu Feb 21 '16 at 07:05
Namenode failure:
The NameNode is no longer a single point of failure as of Hadoop 2.x, which introduced NameNode High Availability.
According to the documentation page HDFSHighAvailabilityWithQJM, the Quorum Journal Manager (QJM) is the preferred approach. The process is explained in detail in my answers to the questions below:
How does Hadoop Namenode failover process works?
Hadoop namenode : Single point of failure
Secondary NameNode failure:
In a Hadoop 2.x High Availability setup, the Secondary NameNode is replaced by the Standby NameNode.
Its failure does not bring the cluster down, since the primary NameNode is still available.
Datanode failure:
If your replication factor is greater than 1, a DataNode failure does not cause data loss, because the file's blocks are still available on other DataNodes.
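To see whether a given file is protected this way, you can check its replication factor and where its block replicas live. A minimal sketch, assuming the hdfs client is on your PATH; the function names and the example path are hypothetical.

```shell
# Sketch only: the function names are hypothetical wrappers
# around the standard hdfs client commands.

# Print the replication factor recorded for a file (%r in -stat).
show_replication() {
  hdfs dfs -stat %r "$1"
}

# List each block of a file and the DataNodes holding its replicas.
show_block_locations() {
  hdfs fsck "$1" -files -blocks -locations
}

# Example (placeholder path):
# show_replication /user/me/data.txt
# show_block_locations /user/me/data.txt
```

If the first command prints 1, killing the DataNode that holds a block makes that block unreadable until the node comes back.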
Have a look at my answer in this SE question:
From documentation page:
Each DataNode sends a Heartbeat message to the NameNode periodically. A network partition can cause a subset of DataNodes to lose connectivity with the NameNode. The NameNode detects this condition by the absence of a Heartbeat message. The NameNode marks DataNodes without recent Heartbeats as dead and does not forward any new IO requests to them. Any data that was registered to a dead DataNode is not available to HDFS any more.
DataNode death may cause the replication factor of some blocks to fall below their specified value. The NameNode constantly tracks which blocks need to be replicated and initiates replication whenever necessary. The necessity for re-replication may arise due to many reasons: a DataNode may become unavailable, a replica may become corrupted, a hard disk on a DataNode may fail, or the replication factor of a file may be increased.
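The quoted passage does not say how long the NameNode waits before declaring a DataNode dead. As a rough sketch, the timeout is commonly computed from two settings, dfs.heartbeat.interval and dfs.namenode.heartbeat.recheck-interval. The values below are the usual defaults and are an assumption here; check hdfs-default.xml for your Hadoop version.

```shell
# Assumed defaults; verify against your cluster's configuration.
heartbeat_interval_s=3        # dfs.heartbeat.interval, in seconds
recheck_interval_ms=300000    # dfs.namenode.heartbeat.recheck-interval, in ms

# Timeout = 2 * recheck interval + 10 * heartbeat interval.
dead_after_ms=$(( 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s ))
echo "DataNode marked dead after $(( dead_after_ms / 1000 )) s without a heartbeat"
```

With these values the result is 630 seconds, so after killing a DataNode you should expect roughly ten minutes to pass before re-replication of its blocks starts.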
