HDFS and MapReduce are the core components of Hadoop. In earlier Apache Hadoop releases, the NameNode and the JobTracker were each a single point of failure (SPOF), because only one instance of each could be configured. This problem is fixed from Hadoop 2.x onwards.
JobTracker HA
JobTracker HA can be achieved by configuring two JobTracker (JT) instances in Active-Standby mode on two nodes. If one JT goes down, the second JT is available to serve requests. Only one JT (the Active) serves requests at a time; the second JT (the Standby) runs in read-only mode. JobTracker HA requires a ZooKeeper instance. Failover (switching) can be configured as either manual or automatic; automatic failover requires another process called the Failover Controller (FC). In the current release, if the active JT fails, all running jobs are halted, but new jobs are automatically submitted to the new JT. Recovery of running jobs across a failover is not available in the current release.
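The failover behaviour described above can be illustrated with a toy model. This is only a sketch for intuition, not Hadoop code: the class `JobTrackerPair`, the ids `jt1`/`jt2`, and the job names are all hypothetical.

```python
class JobTrackerPair:
    """Toy model of two JobTrackers in Active-Standby mode (not Hadoop code)."""

    def __init__(self):
        # Hypothetical ids; exactly one JT is active at any time.
        self.state = {"jt1": "active", "jt2": "standby"}
        self.running_jobs = {"jt1": [], "jt2": []}

    def active(self):
        """Return the id of the JT currently serving requests."""
        return next(jt for jt, s in self.state.items() if s == "active")

    def submit(self, job):
        """Submit a job to whichever JT is active and return that JT's id."""
        jt = self.active()
        self.running_jobs[jt].append(job)
        return jt

    def fail_active(self):
        """Simulate failure of the active JT and promote the standby.

        Jobs running on the failed JT are halted (returned as a list),
        matching the behaviour described in the text.
        """
        failed = self.active()
        standby = next(jt for jt, s in self.state.items() if s == "standby")
        lost = self.running_jobs[failed]
        self.running_jobs[failed] = []
        self.state[failed] = "down"
        self.state[standby] = "active"
        return lost


pair = JobTrackerPair()
pair.submit("job-1")
lost = pair.fail_active()
print("jobs halted by failover:", lost)
print("new job goes to:", pair.submit("job-2"))
```

In this model the job that was running on the failed JT is lost, while a job submitted afterwards lands on the newly active JT.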
MR2 is the second generation of MapReduce, which runs on YARN. The ResourceManager (RM) is the master service in YARN, and it can also be configured in Active-Standby mode. RM failure will not impact running jobs/applications.
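As a rough illustration of an Active-Standby RM setup, the sketch below renders the yarn-site.xml properties commonly used for ResourceManager HA in Hadoop 2.x. The host names, the cluster id, and the RM ids (`rm1`, `rm2`) are placeholder assumptions, and the helper `render_site_xml` is hypothetical; a real cluster will need further settings.

```python
import xml.etree.ElementTree as ET


def render_site_xml(props):
    """Render a dict of Hadoop properties as a *-site.xml string."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder hosts and ids (assumptions for illustration only).
rm_ha_props = {
    "yarn.resourcemanager.ha.enabled": "true",
    "yarn.resourcemanager.cluster-id": "cluster1",
    "yarn.resourcemanager.ha.rm-ids": "rm1,rm2",
    "yarn.resourcemanager.hostname.rm1": "master1.example.com",
    "yarn.resourcemanager.hostname.rm2": "master2.example.com",
    "yarn.resourcemanager.zk-address":
        "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181",
}

yarn_site_xml = render_site_xml(rm_ha_props)
print(yarn_site_xml)
```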
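NameNode HA
NameNode HA removes the NameNode as a single point of failure for HDFS. It is also configured in Active-Standby mode (a maximum of two NameNode instances in Hadoop 2.x). Quorum-based journaling is the widely accepted method for sharing edits between the two NameNodes; it relies on a set of JournalNodes, while ZooKeeper is used for automatic failover. Only one NameNode is active at a time.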
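A minimal sketch of the hdfs-site.xml properties typically involved in quorum-based NameNode HA follows. The nameservice name `mycluster`, the NameNode ids, and all host names are placeholder assumptions, and the helper `namenode_ha_props` is hypothetical. The ZooKeeper quorum for automatic failover is set separately (`ha.zookeeper.quorum` in core-site.xml) and is not shown here.

```python
def namenode_ha_props(nameservice, namenodes, journalnodes):
    """Build hdfs-site.xml properties for quorum-journal NameNode HA.

    namenodes: dict of NameNode id -> "host:port" RPC address (exactly two).
    journalnodes: list of "host:port" JournalNode addresses.
    """
    if len(namenodes) != 2:
        raise ValueError("Hadoop 2.x HA expects exactly two NameNodes")
    props = {
        "dfs.nameservices": nameservice,
        "dfs.ha.namenodes." + nameservice: ",".join(namenodes),
        # Shared edit log location on the JournalNode quorum.
        "dfs.namenode.shared.edits.dir":
            "qjournal://" + ";".join(journalnodes) + "/" + nameservice,
        "dfs.client.failover.proxy.provider." + nameservice:
            "org.apache.hadoop.hdfs.server.namenode.ha."
            "ConfiguredFailoverProxyProvider",
        "dfs.ha.fencing.methods": "sshfence",
        "dfs.ha.automatic-failover.enabled": "true",
    }
    for nn_id, address in namenodes.items():
        key = "dfs.namenode.rpc-address.%s.%s" % (nameservice, nn_id)
        props[key] = address
    return props


# Placeholder names and hosts (assumptions for illustration only).
hdfs_props = namenode_ha_props(
    "mycluster",
    {"nn1": "nn1.example.com:8020", "nn2": "nn2.example.com:8020"},
    ["jn1.example.com:8485", "jn2.example.com:8485", "jn3.example.com:8485"],
)
for name, value in sorted(hdfs_props.items()):
    print(name, "=", value)
```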
The Secondary NameNode (SNN) is not a Standby NameNode (SN), and vice versa. The SNN has a different function in a non-HA configuration. A NameNode HA setup doesn't require an SNN, because the Standby NameNode performs checkpointing (the function of the SNN).
NameNode HA processes
- Active NameNode
- Standby NameNode
- Failover controller: handles fencing to avoid a split-brain scenario.
- JournalNodes (minimum 3 instances required): namespace modifications are logged to the JournalNodes, and the Standby NameNode reads them from there. Only one NameNode is allowed to write at a time, in order to avoid a split-brain issue.
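The reason for running at least three JournalNodes is majority arithmetic: an edit counts as written once a majority of JournalNodes acknowledge it. The sketch below works through that arithmetic; the function name `journal_quorum` is hypothetical and the code only illustrates the counting, not the actual journal protocol.

```python
def journal_quorum(num_journalnodes):
    """Return (majority needed for a write, JournalNode failures tolerated)."""
    if num_journalnodes < 1:
        raise ValueError("need at least one JournalNode")
    majority = num_journalnodes // 2 + 1
    tolerated = num_journalnodes - majority
    return majority, tolerated


for n in (3, 4, 5):
    majority, tolerated = journal_quorum(n)
    print("%d JournalNodes: majority=%d, tolerated failures=%d"
          % (n, majority, tolerated))
```

With three JournalNodes one failure can be tolerated. A fourth node does not raise that number, which is why an odd count is the usual choice.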