Difference between the Secondary NameNode and the Checkpoint Node

Question

The Checkpoint Node periodically fetches the fsimage and edits from the NameNode and merges them. The resulting state is called a checkpoint. After this, it uploads the result to the NameNode.
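To make the merge step concrete, here is a minimal sketch of what "merging fsimage and edits" means conceptually. It is a toy model only: the dictionary namespace, the tuple-shaped edits, and the function names `apply_edit` and `checkpoint` are hypothetical and do not reflect Hadoop's real on-disk formats or APIs.

```python
# Toy model of a checkpoint: replay an edit log on top of an fsimage snapshot.
# The dict/tuple formats here are assumptions for illustration, not Hadoop's formats.

def apply_edit(namespace, edit):
    """Apply a single logged operation to the in-memory namespace."""
    op, path, *args = edit
    if op == "create":
        namespace[path] = {"replication": args[0]}
    elif op == "delete":
        namespace.pop(path, None)
    elif op == "rename":
        namespace[args[0]] = namespace.pop(path)
    else:
        raise ValueError(f"unknown op: {op}")


def checkpoint(fsimage, edits):
    """Merge fsimage and edits into a new fsimage; the inputs are left untouched."""
    merged = {path: dict(meta) for path, meta in fsimage.items()}
    for edit in edits:
        apply_edit(merged, edit)
    return merged


fsimage = {"/a": {"replication": 3}}
edits = [("create", "/b", 2), ("rename", "/a", "/c"), ("delete", "/b")]
new_fsimage = checkpoint(fsimage, edits)
print(new_fsimage)
```

Once the merged image is uploaded, the NameNode can start from it and only needs to replay edits logged after the checkpoint, which is why checkpointing keeps restart time bounded.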

Is the Checkpoint Node still used in Hadoop 2.x? If so, is the Secondary NameNode still needed?

Also, how does the Checkpoint Node work when there are multiple NameNodes in Hadoop 2?

Could anyone clarify these confusing concepts?

score 1 · Answer 1 · edited May 23 '17 at 10:34

Have a look at this SE question for more details on the responsibilities of each node:

Hadoop 2.0 Name Node, Secondary Node and Checkpoint node for High Availability

In a Hadoop 2.0 high-availability setup you don't have to configure a Secondary NameNode or a Checkpoint Node, because the Standby NameNode also performs the checkpoints.

Instead, you need an Active NameNode and a Standby NameNode for high availability, as described on the documentation page:

In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, exactly one of the NameNodes is in an Active state, and the other is in a Standby state.

The Active NameNode is responsible for all client operations in the cluster, while the Standby is simply acting as a slave, maintaining enough state to provide a fast failover if necessary.

In order for the Standby node to keep its state synchronized with the Active node, both nodes communicate with a group of separate daemons called “JournalNodes” (JNs).

When any namespace modification is performed by the Active node, it durably logs a record of the modification to a majority of these JNs. The Standby node is capable of reading the edits from the JNs, and is constantly watching them for changes to the edit log.

As the Standby Node sees the edits, it applies them to its own namespace. In the event of a failover, the Standby will ensure that it has read all of the edits from the JournalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.

In order to provide a fast failover, it is also necessary that the Standby node have up-to-date information regarding the location of blocks in the cluster. In order to achieve this, the DataNodes are configured with the location of both NameNodes, and send block location information and heartbeats to both.
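The quoted mechanism (majority writes to the JournalNodes, the Standby tailing the edits, and catching up before failover) can be sketched as a toy simulation. This is a simplified illustration under stated assumptions, not the real JournalNode protocol: the classes, the `mkdir`/`rmdir` edits, and the rule that the Standby reads a transaction from any reachable JournalNode holding it are all hypothetical.

```python
# Toy model of quorum-based edit logging; not the real HDFS implementation.

class JournalNode:
    def __init__(self, up=True):
        self.up = up
        self.log = {}  # txid -> edit

    def write(self, txid, edit):
        """Store the edit and acknowledge, unless this node is down."""
        if not self.up:
            return False
        self.log[txid] = edit
        return True


class ActiveNameNode:
    def __init__(self, journals):
        self.journals = journals
        self.next_txid = 1

    def log_edit(self, edit):
        """Succeed only if a majority of JournalNodes acknowledge the write."""
        txid = self.next_txid
        acks = sum(jn.write(txid, edit) for jn in self.journals)
        if acks <= len(self.journals) // 2:
            raise RuntimeError("no quorum: edit not durably logged")
        self.next_txid += 1
        return txid


class StandbyNameNode:
    def __init__(self, journals):
        self.journals = journals
        self.namespace = set()
        self.last_applied = 0
        self.state = "standby"

    def tail_edits(self):
        """Apply, in txid order, every edit available after last_applied."""
        while True:
            txid = self.last_applied + 1
            edit = next((jn.log[txid] for jn in self.journals
                         if jn.up and txid in jn.log), None)
            if edit is None:
                return
            op, path = edit
            if op == "mkdir":
                self.namespace.add(path)
            elif op == "rmdir":
                self.namespace.discard(path)
            self.last_applied = txid

    def become_active(self):
        """Catch up on all outstanding edits before taking over."""
        self.tail_edits()
        self.state = "active"


journals = [JournalNode(), JournalNode(), JournalNode(up=False)]
active = ActiveNameNode(journals)
standby = StandbyNameNode(journals)

active.log_edit(("mkdir", "/data"))
active.log_edit(("mkdir", "/tmp"))
active.log_edit(("rmdir", "/tmp"))

standby.become_active()
print(sorted(standby.namespace), standby.last_applied, standby.state)
```

With three JournalNodes the sketch tolerates one of them being down, since two acknowledgements still form a majority. With two down, `log_edit` raises instead of silently accepting an edit that is not durably logged.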

Refer to related SE questions for more details:

How does Hadoop Namenode failover process works?

difference between the Secondary NameNode and the Checkpoint Node
