39

I am trying to set up a single-node Hadoop 2.6.0 cluster on my PC.

On visiting http://localhost:8088/cluster, I find that my node is listed as an "unhealthy node".

The health report gives this error:

1/1 local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir; 
1/1 log-dirs are bad: /usr/local/hadoop/logs/userlogs

What's wrong?

Ra41P
  • This won't fix the root cause, but it will get you going for the time being: add the property yarn.nodemanager.disk-health-checker.min-healthy-disks in yarn-site.xml and set its value to 0. – Tushar Sudake Jun 02 '15 at 15:43

8 Answers

70

The most common cause of local-dirs are bad is disk utilization on the node exceeding YARN's max-disk-utilization-per-disk-percentage default value of 90.0%.
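To confirm this is the cause before changing anything, check how full the filesystem holding the local dir is; a minimal check, using the path from the health report above:

# Show utilization of the filesystem that contains the NodeManager local dir
df -h /tmp/hadoop-hduser/nm-local-dir

If Use% is at or above the threshold, YARN marks the directory as bad.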

Either clean up the disk that the unhealthy node is running on, or increase the threshold in yarn-site.xml:

<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>98.5</value>
</property>

Avoid disabling the disk check, because your jobs may fail when the disk eventually runs out of space or when there are permission issues. Refer to the yarn-site.xml Disk Checker section for more details.
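If you would rather reason in absolute free space than in percentages, the disk checker can also be tuned that way; a sketch, assuming the yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb property available in this Hadoop line:

<property>
  <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
  <!-- mark the disk bad once less than ~1 GB remains free -->
  <value>1024</value>
</property>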

FSCK

If you suspect a filesystem error on the directory, you can check by running:

hdfs fsck /tmp/hadoop-hduser/nm-local-dir
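Note that hdfs fsck only inspects paths inside HDFS; nm-local-dir normally lives on the local filesystem, so if you suspect local disk corruption you would check the underlying device instead. A sketch with a hypothetical device name:

# Identify the device backing the directory, then check it
# (unmount it first; running fsck on a mounted filesystem can cause damage)
df /tmp/hadoop-hduser/nm-local-dir
sudo fsck /dev/sdXN   # /dev/sdXN is a placeholder for the device df reported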
Hanxue
  • Is it OK to store the fs on /tmp? – Stepan Yakovenko Sep 07 '18 at 01:23
  • No, not too much free space @Dims. The way I read that was "exceeded max-utilization" so that means `too much` disk space is being used. (The amount being used is above the allowed amount--threshold.) – Zargold Nov 05 '18 at 15:42
9

Please try adding this config in yarn-site.xml:

<property>
   <name>yarn.nodemanager.disk-health-checker.enable</name>
   <value>false</value>
</property>

This worked on my site.

Then remove and recreate /usr/local/hadoop/logs, e.g.:

rm -rf /usr/local/hadoop/logs
mkdir -p /usr/local/hadoop/logs
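If YARN runs as a dedicated user (the paths in the question suggest hduser), the recreated directory may also need its ownership restored; hduser:hadoop below is an assumption about your user and group:

# Hand the fresh logs directory back to the YARN user (assumed hduser:hadoop)
sudo chown -R hduser:hadoop /usr/local/hadoop/logs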
Owen
  • Well, I've tried a multitude of suggestions, including yours. It seems to be working now. I'm not sure which suggestion correctly resolved the issue though. – Ra41P Jun 13 '15 at 07:54
  • 1
    @Ra41P The last one only removes the log files, which should not affect the process, so it has to be adding the configuration – Gerard Dec 21 '15 at 13:40
  • Even if Hadoop finds out that your system is running out of disk space while trying to write to the logs folder, the problem can be anywhere! `du -h` might help you reveal the folders in question. In our case it wasn't the logs that filled up but some journaling files in totally different folders. – Udo Mar 11 '19 at 17:48
  • 2
    You should not disable the disk health check. If you let this problem go, you're disks are just going to fill up 100% before much longer and you're going to crash anyways. – Nathan Loyer May 20 '19 at 14:51
3

It can also be caused by a wrong log directory location configured via yarn.nodemanager.log-dirs in yarn-site.xml: either the directory does not exist, or it has the wrong permissions set.
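A quick way to check both conditions, using the log dir from the question; the hduser:hadoop owner below is an assumption:

# Does the configured log dir exist, and who owns it?
ls -ld /usr/local/hadoop/logs/userlogs
# If missing or wrongly owned, create it and fix ownership (assumed hduser:hadoop)
sudo mkdir -p /usr/local/hadoop/logs/userlogs
sudo chown -R hduser:hadoop /usr/local/hadoop/logs/userlogs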

kokosing
3

I had a similar issue at first.

Then I also found another problem: when I used the jps command, some processes like NameNode and DataNode were missing.

$ jps
13696 Jps
12949 ResourceManager
13116 NodeManager

Then I fixed it with the following solution, and the unhealthy node issue was automatically resolved.
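The linked solution is not shown here, but on a typical single-node Hadoop 2.x install the stock scripts bring the missing HDFS daemons up; a sketch, assuming $HADOOP_HOME/sbin is on your PATH:

# Start NameNode, DataNode and SecondaryNameNode
start-dfs.sh
# Verify: NameNode and DataNode should now appear next to ResourceManager/NodeManager
jps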

Nazmul Haque
1

On macOS with Hadoop installed using brew I had to change /usr/local/Cellar/hadoop/2.8.1/libexec/etc/hadoop/yarn-site.xml to include the following:

<property>
  <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
  <value>0</value>
</property>

This setting basically turns the disk health check off completely.

I found the file using brew list hadoop.

$ brew list hadoop | grep yarn-site.xml
/usr/local/Cellar/hadoop/2.8.1/libexec/etc/hadoop/yarn-site.xml
/usr/local/Cellar/hadoop/2.8.1/libexec/share/hadoop/tools/sls/sample-conf/yarn-site.xml
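Whichever yarn-site.xml you edit, the NodeManager only reads it at startup, so restart YARN for the change to take effect; with the stock Hadoop 2.x scripts:

# Restart ResourceManager and NodeManager so the new setting is picked up
stop-yarn.sh
start-yarn.sh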
Jacek Laskowski
0

I had a similar problem: a Sqoop upload just hung when HDFS reached 90%. After I changed the threshold for max-disk-utilization-per-disk-percentage and the alarm threshold definitions, the upload is working again. Thanks

mates
0

I experienced this when the disk was 90% full (checked with df), and after I removed unnecessary files it dropped to 85% (the default setting for yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage is 90% of the available disk if you do not specify it in yarn-site.xml), which solved the problem.

The effect is similar to raising the utilization threshold over 90% (my disk was 90% full) just to squeeze out extra space. However, it is good practice not to go over 90% anyway.
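To find which files are worth removing, du can rank the top-level directories by size; a sketch assuming GNU coreutils (-x keeps du on one filesystem):

# Largest top-level directories on the root filesystem, biggest last
sudo du -xh --max-depth=1 / 2>/dev/null | sort -h | tail -20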

r poon
0

Had the same issue; listing my causes, FYR:

  1. the dirs did not exist; mkdir them first (see the commands after the config below),
  2. memory-mb was set much larger than the memory actually available:
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/tmp/yarn/nm</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/tmp/yarn/container-logs</value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>364000</value>
    </property>
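For cause 1, create the directories before the NodeManager starts; for cause 2, compare the configured value with what the machine actually has. A sketch, assuming a Linux box (free is not available on macOS):

# Create the configured local and log dirs up front
mkdir -p /tmp/yarn/nm /tmp/yarn/container-logs
# Check physical memory; yarn.nodemanager.resource.memory-mb must fit within it
free -m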
Till