I am working on AWS EMR.
I want to find out that a task node has died as soon as possible. With the default Hadoop settings, however, a dead node is only detected after its heartbeat has been missing for 10 minutes.
This is the default key-value pair in mapred-default: mapreduce.jobtracker.expire.trackers.interval = 600000 (ms)
I tried to change the default value to 6000 ms using this link.
After that, whenever I terminate an EC2 machine in the EMR cluster, I still do not see the state change that quickly (within 6 seconds).
Resource manager REST API - http://MASTER_DNS_NAME:8088/ws/v1/cluster/nodes
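For illustration, this is roughly how the endpoint above can be polled from Java. It is a minimal sketch, not a definitive client: the class name NodeStatePoller and the regex-based parsing are my own choices for this post, and it assumes every node entry in the JSON response carries a "state" field such as RUNNING or LOST. A real client would probably use a proper JSON parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts nodes per state in the ResourceManager response.
class NodeStatePoller {
    // Matches fields of the form "state":"RUNNING" in the JSON body
    // (assumes no other field in the response is named "state").
    private static final Pattern STATE =
            Pattern.compile("\"state\"\\s*:\\s*\"([A-Z_]+)\"");

    /** Counts how many nodes are reported in each state. */
    static Map<String, Integer> countStates(String json) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = STATE.matcher(json);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    /** Fetches the raw JSON body from the given nodes URL. */
    static String fetch(String nodesUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(nodesUrl).openConnection();
        conn.setRequestProperty("Accept", "application/json");
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(3000);
        try (InputStream in = conn.getInputStream();
             Scanner s = new Scanner(in, StandardCharsets.UTF_8.name()).useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        // Pass e.g. http://MASTER_DNS_NAME:8088/ws/v1/cluster/nodes as the first argument.
        if (args.length == 0) {
            System.err.println("usage: NodeStatePoller <nodes-url>");
            return;
        }
        System.out.println(countStates(fetch(args[0])));
    }
}
```

Calling this in a loop every few seconds only shows what the ResourceManager currently believes; it cannot report a node as LOST any sooner than the ResourceManager itself decides it is.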
Questions-
- What is the command to check the value of mapreduce.jobtracker.expire.trackers.interval in a running EMR cluster (Hadoop cluster)?
- Is this the right key to use to get a faster state change? If it is not, please suggest another solution.
- What is the difference between the DECOMMISSIONING, DECOMMISSIONED and LOST states of nodes in the Resource Manager UI?
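To make the first question concrete: Hadoop configuration files use a <configuration><property><name>/<value> layout, so one way I can think of to verify a value is to parse that XML and look the key up. The sketch below does only that. The class name HadoopConfLookup is hypothetical, and where the XML comes from is an assumption on my part (for example a mapred-site.xml under the cluster's Hadoop configuration directory, or a daemon's configuration dump, if one is exposed).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: looks up one property in Hadoop-style configuration XML.
class HadoopConfLookup {
    /** Returns the value of the named property, or null if it is not present. */
    static String lookup(InputStream xml, String key) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            if (key.equals(childText(prop, "name"))) {
                return childText(prop, "value");
            }
        }
        return null;
    }

    /** Text of the first child element with the given tag, or null if missing. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Inline sample standing in for a real mapred-site.xml.
        String sample = "<configuration><property>"
                + "<name>mapreduce.jobtracker.expire.trackers.interval</name>"
                + "<value>7000</value></property></configuration>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(lookup(in, "mapreduce.jobtracker.expire.trackers.interval"));
    }
}
```

A key that is missing from the file returns null here, which would mean the default from mapred-default is in effect rather than my override.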
Update
I tried a number of times, but the behaviour is inconsistent. Sometimes the node moves to the DECOMMISSIONING/DECOMMISSIONED state, and sometimes it moves directly to the LOST state after 10 minutes.
I need a quick state change, so that I can trigger some event.
Here is my sample code -
// Assumes the AWS SDK for Java (v1) EMR model class.
import com.amazonaws.services.elasticmapreduce.model.Configuration;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

List<Configuration> configurations = new ArrayList<Configuration>();

// mapred-site: tracker expiry interval, in milliseconds
Configuration mapredSiteConfiguration = new Configuration();
mapredSiteConfiguration.setClassification("mapred-site");
Map<String, String> mapredSiteConfigurationMapper = new HashMap<String, String>();
mapredSiteConfigurationMapper.put("mapreduce.jobtracker.expire.trackers.interval", "7000");
mapredSiteConfiguration.setProperties(mapredSiteConfigurationMapper);

// hdfs-site: NameNode decommission check interval
Configuration hdfsSiteConfiguration = new Configuration();
hdfsSiteConfiguration.setClassification("hdfs-site");
Map<String, String> hdfsSiteConfigurationMapper = new HashMap<String, String>();
hdfsSiteConfigurationMapper.put("dfs.namenode.decommission.interval", "10");
hdfsSiteConfiguration.setProperties(hdfsSiteConfigurationMapper);

// yarn-site: NodeManager heartbeat interval, in milliseconds
Configuration yarnSiteConfiguration = new Configuration();
yarnSiteConfiguration.setClassification("yarn-site");
Map<String, String> yarnSiteConfigurationMapper = new HashMap<String, String>();
yarnSiteConfigurationMapper.put("yarn.resourcemanager.nodemanagers.heartbeat-interval-ms", "5000");
yarnSiteConfiguration.setProperties(yarnSiteConfigurationMapper);

configurations.add(mapredSiteConfiguration);
configurations.add(hdfsSiteConfiguration);
configurations.add(yarnSiteConfiguration);
These are the settings that I changed in AWS EMR (Hadoop internally) to reduce the time it takes a node to go from RUNNING to another state (DECOMMISSIONING/DECOMMISSIONED/LOST).
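For context, the event I want to trigger would be driven by something like the following: compare two snapshots of node id to state and report every node whose state changed. This is only a sketch of the idea. The class name NodeStateDiff and the node ids are made up, and the snapshots are assumed to have been built from the Resource Manager response already.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: detects node state transitions between two polls.
class NodeStateDiff {
    /** Lists "id: OLD -> NEW" for every node whose state differs between snapshots. */
    static List<String> transitions(Map<String, String> before, Map<String, String> after) {
        List<String> changes = new ArrayList<>();
        // Sorted iteration keeps the output order deterministic.
        for (Map.Entry<String, String> e : new TreeMap<>(after).entrySet()) {
            String old = before.get(e.getKey());
            if (old != null && !old.equals(e.getValue())) {
                changes.add(e.getKey() + ": " + old + " -> " + e.getValue());
            }
        }
        // Nodes that were present before but are no longer reported at all.
        for (String id : new TreeSet<>(before.keySet())) {
            if (!after.containsKey(id)) {
                changes.add(id + ": " + before.get(id) + " -> (missing)");
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, String> before = new HashMap<>();
        before.put("node-a:8041", "RUNNING"); // made-up node ids
        before.put("node-b:8041", "RUNNING");
        Map<String, String> after = new HashMap<>(before);
        after.put("node-b:8041", "LOST");
        // Each reported transition is where the event would be fired.
        for (String change : transitions(before, after)) {
            System.out.println(change);
        }
    }
}
```

The faster the state changes away from RUNNING, the sooner a comparison like this can fire, which is why the 10 minute delay is the problem.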