39

I am trying to set up a single-node Hadoop 2.6.0 cluster on my PC.

On visiting http://localhost:8088/cluster, I find that my node is listed as an "unhealthy node".

The health report gives this error:

1/1 local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir; 
1/1 log-dirs are bad: /usr/local/hadoop/logs/userlogs

What's wrong?

Ra41P
  • This won't fix the root cause, but it will get you going for the time being: add the property yarn.nodemanager.disk-health-checker.min-healthy-disks in yarn-site.xml and set its value to 0. – Tushar Sudake Jun 02 '15 at 15:43

8 Answers

70

The most common cause of local-dirs are bad is disk utilization on the node exceeding YARN's max-disk-utilization-per-disk-percentage default value of 90.0%.
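To confirm this is the cause before changing anything, check how full the filesystem holding the local dir is; a minimal check, using the path from the health report above:

# Show utilization of the filesystem that contains the NodeManager local dir
df -h /tmp/hadoop-hduser/nm-local-dir

If Use% is at or above the threshold, YARN marks the directory as bad.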

Either clean up the disk that the unhealthy node is running on, or increase the threshold in yarn-site.xml:

<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>98.5</value>
</property>

Avoid disabling the disk check, because your jobs may fail when the disk eventually runs out of space or when there are permission issues. Refer to the yarn-site.xml Disk Checker section for more details.
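If you would rather reason in absolute free space than in percentages, the disk checker can also be tuned that way; a sketch, assuming the yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb property available in this Hadoop line:

<property>
  <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
  <!-- mark the disk bad once less than ~1 GB remains free -->
  <value>1024</value>
</property>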

FSCK

If you suspect a filesystem error on the directory, you can check by running:

hdfs fsck /tmp/hadoop-hduser/nm-local-dir
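Note that hdfs fsck only inspects paths inside HDFS; nm-local-dir normally lives on the local filesystem, so if you suspect local disk corruption you would check the underlying device instead. A sketch with a hypothetical device name:

# Identify the device backing the directory, then check it
# (unmount it first; running fsck on a mounted filesystem can cause damage)
df /tmp/hadoop-hduser/nm-local-dir
sudo fsck /dev/sdXN   # /dev/sdXN is a placeholder for the device df reported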
Hanxue
  • Is it OK to store the fs on /tmp? – Stepan Yakovenko Sep 07 '18 at 01:23
  • No, not too much free space @Dims. The way I read that was "exceeded max-utilization" so that means `too much` disk space is being used. (The amount being used is above the allowed amount--threshold.) – Zargold Nov 05 '18 at 15:42
9

Please try adding this config in yarn-site.xml:

<property>
   <name>yarn.nodemanager.disk-health-checker.enable</name>
   <value>false</value>
</property>

This worked on my site.

Then remove and recreate /usr/local/hadoop/logs, e.g.:

rm -rf /usr/local/hadoop/logs
mkdir -p /usr/local/hadoop/logs
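If YARN runs as a dedicated user (the paths in the question suggest hduser), the recreated directory may also need its ownership restored; hduser:hadoop below is an assumption about your user and group:

# Hand the fresh logs directory back to the YARN user (assumed hduser:hadoop)
sudo chown -R hduser:hadoop /usr/local/hadoop/logs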
Owen
  • Well, I've tried a multitude of suggestions, including yours. It seems to be working now. I'm not sure which suggestion correctly resolved the issue though. – Ra41P Jun 13 '15 at 07:54
  • 1
    @Ra41P The last one only removes the log files, which should not affect the process, so it has to be adding the configuration – Gerard Dec 21 '15 at 13:40
  • Even if Hadoop finds out that your system is running out of disk space while trying to write to the logs folder, the problem can be anywhere! `du -h` might help you reveal the folders in question. In our case it wasn't the logs that filled up but some journaling files in totally different folders. – Udo Mar 11 '19 at 17:48
  • 2
    You should not disable the disk health check. If you let this problem go, you're disks are just going to fill up 100% before much longer and you're going to crash anyways. – Nathan Loyer May 20 '19 at 14:51
3

It can also be caused by a wrong log directory location configured via yarn.nodemanager.log-dirs in yarn-site.xml: either the directory does not exist, or it has the wrong permissions set.
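A quick way to check both conditions, using the log dir from the question; the hduser:hadoop owner below is an assumption:

# Does the configured log dir exist, and who owns it?
ls -ld /usr/local/hadoop/logs/userlogs
# If missing or wrongly owned, create it and fix ownership (assumed hduser:hadoop)
sudo mkdir -p /usr/local/hadoop/logs/userlogs
sudo chown -R hduser:hadoop /usr/local/hadoop/logs/userlogs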

kokosing
3

I had a similar issue at first.

Then I also found another problem: when I used the jps command, some processes like NameNode and DataNode were missing.

$ jps
13696 Jps
12949 ResourceManager
13116 NodeManager

Then I fixed it with the following solution, and the unhealthy node issue was automatically resolved.
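The linked solution is not shown here, but on a typical single-node Hadoop 2.x install the stock scripts bring the missing HDFS daemons up; a sketch, assuming $HADOOP_HOME/sbin is on your PATH:

# Start NameNode, DataNode and SecondaryNameNode
start-dfs.sh
# Verify: NameNode and DataNode should now appear next to ResourceManager/NodeManager
jps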

Nazmul Haque
1

On macOS with Hadoop installed using brew I had to change /usr/local/Cellar/hadoop/2.8.1/libexec/etc/hadoop/yarn-site.xml to include the following:

<property>
  <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
  <value>0</value>
</property>

This setting basically turns the disk health check off completely.

I found the file using brew list hadoop.

$ brew list hadoop | grep yarn-site.xml
/usr/local/Cellar/hadoop/2.8.1/libexec/etc/hadoop/yarn-site.xml
/usr/local/Cellar/hadoop/2.8.1/libexec/share/hadoop/tools/sls/sample-conf/yarn-site.xml
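Whichever yarn-site.xml you edit, the NodeManager only reads it at startup, so restart YARN for the change to take effect; with the stock Hadoop 2.x scripts:

# Restart ResourceManager and NodeManager so the new setting is picked up
stop-yarn.sh
start-yarn.sh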
Jacek Laskowski
0

I had a similar problem: a Sqoop upload just hung when HDFS reached 90%. After I changed the threshold for max-disk-utilization-per-disk-percentage and the alarm threshold definitions, the upload is working again. Thanks

mates
0

I experienced this when the disk was 90% full (checked with df), and after I removed unnecessary files it dropped to 85% (the default setting for yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage is 90% of the available disk if you do not specify it in yarn-site.xml), which solved the problem.

The effect is similar to raising the utilization threshold over 90% (my disk was 90% full) just to squeeze out extra space. However, it is good practice not to go over 90% anyway.
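To find which files are worth removing, du can rank the top-level directories by size; a sketch assuming GNU coreutils (-x keeps du on one filesystem):

# Largest top-level directories on the root filesystem, biggest last
sudo du -xh --max-depth=1 / 2>/dev/null | sort -h | tail -20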

r poon
0

Had the same issue; listing my causes, FYR:

  1. the dirs did not exist; mkdir them first (see the commands after the config below),
  2. memory-mb was set much larger than the memory actually available:
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/tmp/yarn/nm</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/tmp/yarn/container-logs</value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>364000</value>
    </property>
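For cause 1, create the directories before the NodeManager starts; for cause 2, compare the configured value with what the machine actually has. A sketch, assuming a Linux box (free is not available on macOS):

# Create the configured local and log dirs up front
mkdir -p /tmp/yarn/nm /tmp/yarn/container-logs
# Check physical memory; yarn.nodemanager.resource.memory-mb must fit within it
free -m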
Till