YARN UNHEALTHY nodes

Question

In our YARN cluster, which is 80% full, some of the YARN NodeManagers are being marked UNHEALTHY. After digging into the logs, I found this is because disk space on the data directories is 90% full. The ResourceManager log shows the following:

2015-02-21 08:33:51,590 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Node hdp009.abc.com:8041 reported UNHEALTHY with details: 4/4 local-dirs are bad: /data3/yarn/nm,/data2/yarn/nm,/data4/yarn/nm,/data1/yarn/nm;
2015-02-21 08:33:51,590 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: hdp009.abc.com:8041 Node Transitioned from RUNNING to UNHEALTHY

I am trying to understand how YARN marks a node as UNHEALTHY, and whether there is any way to change the threshold.

Thanks

score 15 · Accepted Answer · edited Apr 24 '16 at 03:19

Try adding the property yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage to yarn-site.xml. It sets the maximum disk space utilization, as a percentage, above which a disk is marked as bad. Values can range from 0.0 to 100.0. The default is 90.0, which is consistent with your nodes turning UNHEALTHY once the data disks reached about 90%.

Reference for the default values: yarn-default.xml

For example, to force the nodes back into a healthy state:

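<?xml version="1.0"?>
<configuration>
  <!-- Minimum fraction of disks that must be healthy; 0.0 means none are required -->
  <property>
    <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
    <value>0.0</value>
  </property>
  <!-- Utilization percentage above which a disk is marked bad; 100.0 effectively disables the check -->
  <property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>100.0</value>
  </property>
</configuration>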
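
On a shared cluster, a less drastic variant may be preferable to switching the check off entirely. The fragment below is only a sketch: it raises the per-disk threshold and leaves min-healthy-disks at its default. The value 95.0 is an illustrative assumption, not a recommendation from the documentation, and the NodeManagers will most likely need a restart to pick up the change.

```xml
<configuration>
  <!-- Assumed example value: mark a disk bad only above 95% utilization -->
  <property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>95.0</value>
  </property>
</configuration>
```

Raising the threshold only postpones the problem if the disks keep filling up, so freeing space under the yarn/nm directories is still worth doing.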

edited Apr 24 '16 at 03:19

Alvaro Silvino

answered Mar 15 '15 at 13:56

Hamza Zafar

Just forced my YARN to **yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage** 100.0 and **yarn.nodemanager.disk-health-checker.min-healthy-disks** 0 ... did the trick, for local purposes of course – Alvaro Silvino Apr 24 '16 at 03:16
