28

I have 3 DataNodes running. While running a job I am getting the following error:

java.io.IOException: File /user/ashsshar/olhcache/loaderMap9b663bd9 could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1325)

This error mainly comes when the DataNode instances have run out of space or when the DataNodes are not running. I tried restarting the DataNodes but I am still getting the same error.

dfsadmin -report on my cluster nodes clearly shows that plenty of space is available.

I am not sure why this is happening.
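For reference, this is roughly how I am checking capacity (run as the HDFS superuser; the exact report labels vary by Hadoop version):

sudo -u hdfs hdfs dfsadmin -report                        # full capacity/usage report
sudo -u hdfs hdfs dfsadmin -report | grep -i remaining    # just the free-space lines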

Ashish Sharma

9 Answers

15

I had the same issue; I was running very low on disk space. Freeing up disk space solved it.
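A quick way to confirm this on each DataNode is to check the partition holding the data directory (the path below is only a common default; check dfs.datanode.data.dir in hdfs-site.xml for yours):

df -h /var/lib/hadoop-hdfs    # or whatever dfs.datanode.data.dir points to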

divyaravi
  • Thanks for this! My one-node system was misconfigured to run from an incorrect partition and it simply didn't have the capacity to hold yet another file. – Jari Turkia Apr 23 '18 at 10:27
12

1. Stop all Hadoop daemons

for x in `cd /etc/init.d ; ls hadoop*` ; do sudo service $x stop ; done

2. Remove all files from /var/lib/hadoop-hdfs/cache/hdfs/dfs/name

Eg: devan@Devan-PC:~$ sudo rm -r /var/lib/hadoop-hdfs/cache/

3. Format the NameNode

sudo -u hdfs hdfs namenode -format

4. Start all Hadoop daemons

for x in `cd /etc/init.d ; ls hadoop*` ; do sudo service $x start ; done
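To confirm the daemons actually came back up, the same loop can be reused with status (assuming your init scripts support it), and jps run as root should list the DataNode processes:

for x in `cd /etc/init.d ; ls hadoop*` ; do sudo service $x status ; done
sudo jps    # should list NameNode and DataNode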


Sayat Satybald
Devan M S
2
  1. Check whether your DataNode is running, using the jps command (see the snippet after this list).
  2. If it is not running, wait some time and retry.
  3. If it is running, I think you have to re-format your DataNode.
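For example, on a healthy node the jps output should include a DataNode process:

jps                       # should list DataNode alongside NameNode
jps | grep -i datanode    # quick check that the DataNode JVM is running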
twlkyao
1

What I usually do when this happens is go to the tmp/hadoop-username/dfs/ directory and manually delete the data and name folders (assuming you are running in a Linux environment).

Then format the dfs by calling bin/hadoop namenode -format (make sure you answer with a capital Y when asked whether you want to format; if you are not asked, re-run the command).

You can then start Hadoop again by calling bin/start-all.sh.
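Put together, the whole sequence looks roughly like this (paths assume the default hadoop.tmp.dir of /tmp/hadoop-<username> and an old-style Hadoop 1.x layout; double-check your configuration before deleting anything):

bin/stop-all.sh                                # stop any running daemons first
rm -rf /tmp/hadoop-$USER/dfs/data /tmp/hadoop-$USER/dfs/name
bin/hadoop namenode -format                    # answer with a capital Y if prompted
bin/start-all.sh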

lvella
  • This is the only solution to the OP's question that worked for me. I was trying to follow the example in [link](http://blog.tundramonkey.com/2013/02/24/setting-up-hadoop-on-osx-mountain-lion) on my Macbook osx mountain lion 10.8.5, but could not see the datanode being generated after start-all.sh, until I deleted the data and name and namesecondary folders as mentioned above. Thank you! – John Jiang Oct 03 '13 at 05:51
1

I had this problem and I solved it as below:

  1. Find where your DataNode and NameNode metadata/data are saved; if you cannot find it, simply run this command on a Mac to locate it (it is located in a folder called "tmp"):

    find /usr/local/Cellar/ -name "tmp";

    The find command has the form: find <"directory"> -name <"any string clue for that directory or file">

  2. After finding that folder, cd into it: /usr/local/Cellar//hadoop/hdfs/tmp

    then cd to dfs

    then, using the ls command, see that the data and name directories are located there.

  3. Using the remove command, remove them both:

    rm -R data
    rm -R name

  4. Go to the Hadoop installation folder and stop everything if you have not already done so:

    sbin/stop-dfs.sh

  5. Exit from the server or localhost.

  6. Log into the server again: ssh <"server name">

  7. Start the dfs:

    sbin/start-dfs.sh

  8. Format the NameNode to be sure:

    bin/hdfs namenode -format

  9. You can now use hdfs commands to upload your data into dfs and run MapReduce jobs, for example:
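    (The file name and target directory below are just placeholders.)

    bin/hdfs dfs -mkdir -p /user/$USER                # create your home directory in HDFS
    bin/hdfs dfs -put localfile.txt /user/$USER/      # upload a local file
    bin/hdfs dfs -ls /user/$USER                      # verify it arrived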

Reihan_amn
1

In my case, this issue was resolved by opening firewall port 50010 on the DataNodes.
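For example, with firewalld on each DataNode the rule might look like the following (50010 is the default data-transfer port on Hadoop 2.x; newer versions use a different port, as a later answer points out):

sudo firewall-cmd --zone=public --permanent --add-port=50010/tcp
sudo firewall-cmd --reload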

fracca
    Can you be more specific? Which protocol should I use, and what is the name of the program ... – benaou mouad Mar 22 '20 at 20:34
  • Thanks. I got the same error message as the OP, while all my datanodes are healthy. It turned out that the master could not connect to those datanodes on port 50010. – Averell Mar 31 '20 at 02:34
0

Very simple fix for the same issue on Windows 8.1
I used Windows 8.1 and Hadoop 2.7.2, and did the following things to overcome this issue.

  1. When I ran hdfs namenode -format, I noticed there was a lock in my directory (see the screenshot "HadoopNameNode").
  2. Once I deleted the full folder (screenshots "Folder location" and "Full Folder Delete"), I ran hdfs namenode -format again.
  3. After performing the above two steps, I could successfully place my required files in HDFS. I used the start-all.cmd command to start YARN and the NameNode.
Praveen Kumar K S
0

In my case, the dfs.datanode.du.reserved value in hdfs-site.xml was too large, and the NameNode was also handing out the private IP address of the DataNode, so traffic could not be routed properly. The solution to the private-IP problem was to switch the Docker container to host networking and put the hostname in the host properties of the config files.
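If you want to see which reservation value is actually in effect, hdfs getconf should work on reasonably recent Hadoop versions:

hdfs getconf -confKey dfs.datanode.du.reserved    # bytes reserved per volume for non-DFS use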

This Stack Overflow question on the replication issue goes over other possibilities.

SparkleGoat
0

The answers saying to open port 50010 may only apply to older versions of Hadoop. I'm using Hadoop 3.3.4, and the port you should open to fix this error is 9866. You need to open this port on all of the Hadoop DataNodes. Here's a snippet you can use on RHEL 8:

sudo firewall-cmd --zone=public --permanent --add-port 9866/tcp
sudo firewall-cmd --reload
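To verify the rule took effect on each DataNode, you can list the open ports for the zone:

sudo firewall-cmd --zone=public --list-ports    # should now include 9866/tcp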
alD