How to properly remove a NodeManager from a YARN cluster that has NodeManager restart recovery enabled?

Asked Jun 09 '23 at 21:11

Active Jun 10 '23 at 07:07

We have added these properties to the yarn-site.xml file of our Hadoop YARN cluster:

<property>
    <name>yarn.nodemanager.recovery.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.nodemanager.recovery.supervised</name>
    <value>true</value>
</property>

What is the proper way to decommission a NodeManager (NM) node that has the NM restart recovery feature enabled?

The NM restart recovery feature has been working well: applications do not fail even when we restart the NodeManager processes. However, when we try to decommission a node by adding its hostname to the yarn_exclude_hosts file and refreshing the nodes on the ResourceManager, the applications that had containers running on that node hang for a long time and then fail.
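The decommission steps described above can be sketched roughly as follows. This only reproduces the procedure from the question (append the host to the exclude file, then run `yarn rmadmin -refreshNodes`); it is not a fix for the stuck applications. The function names, the `run` parameter, and the location of the exclude file are assumptions made for illustration. The sketch also assumes the ResourceManager is configured to read this file through `yarn.resourcemanager.nodes.exclude-path`.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def add_host_to_exclude_file(exclude_file: Path, hostname: str) -> bool:
    """Append hostname to the exclude file unless it is already listed.

    Returns True if the file was modified, False otherwise.
    """
    text = exclude_file.read_text() if exclude_file.exists() else ""
    if hostname in (line.strip() for line in text.splitlines()):
        return False
    with exclude_file.open("a") as f:
        # Make sure the new entry starts on its own line.
        if text and not text.endswith("\n"):
            f.write("\n")
        f.write(hostname + "\n")
    return True


def refresh_nodes_command() -> List[str]:
    """Command that asks the ResourceManager to re-read its include/exclude files."""
    return ["yarn", "rmadmin", "-refreshNodes"]


def decommission(exclude_file: Path, hostname: str,
                 run: Callable = subprocess.run) -> None:
    """Hypothetical helper: exclude the host, then refresh the node lists.

    `run` is injectable so the flow can be exercised without a cluster.
    """
    add_host_to_exclude_file(exclude_file, hostname)
    run(refresh_nodes_command(), check=True)
```

On a host with the `yarn` client configured, calling `decommission(Path("/path/to/yarn_exclude_hosts"), "nm-host.example.com")` would perform both steps; the path and hostname here are placeholders.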

edited Jun 10 '23 at 07:07

OneCricketeer

asked Jun 09 '23 at 21:11

Mohammad Solaiman

0 Answers