28

I have 3 DataNodes running. While running a job I am getting the following error:

java.io.IOException: File /user/ashsshar/olhcache/loaderMap9b663bd9 could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1325)

This error mainly comes when the DataNode instances have run out of space or when the DataNodes are not running. I tried restarting the DataNodes but I am still getting the same error.

dfsadmin -report on my cluster nodes clearly shows that plenty of space is available.

I am not sure why this is happening.
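For reference, this is roughly how I am checking capacity (run as the HDFS superuser; the exact report labels vary by Hadoop version):

sudo -u hdfs hdfs dfsadmin -report                        # full capacity/usage report
sudo -u hdfs hdfs dfsadmin -report | grep -i remaining    # just the free-space lines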

Ashish Sharma

9 Answers

15

I had the same issue; I was running very low on disk space. Freeing up disk space solved it.
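A quick way to confirm this on each DataNode is to check the partition holding the data directory (the path below is only a common default; check dfs.datanode.data.dir in hdfs-site.xml for yours):

df -h /var/lib/hadoop-hdfs    # or whatever dfs.datanode.data.dir points to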

divyaravi
  • Thanks for this! My one-node system was misconfigured to run from an incorrect partition and it simply didn't have the capacity to hold yet another file. – Jari Turkia Apr 23 '18 at 10:27
12

1. Stop all Hadoop daemons

for x in `cd /etc/init.d ; ls hadoop*` ; do sudo service $x stop ; done

2. Remove all files from /var/lib/hadoop-hdfs/cache/hdfs/dfs/name

Eg: devan@Devan-PC:~$ sudo rm -r /var/lib/hadoop-hdfs/cache/

3. Format the NameNode

sudo -u hdfs hdfs namenode -format

4. Start all Hadoop daemons

for x in `cd /etc/init.d ; ls hadoop*` ; do sudo service $x start ; done
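To confirm the daemons actually came back up, the same loop can be reused with status (assuming your init scripts support it), and jps run as root should list the DataNode processes:

for x in `cd /etc/init.d ; ls hadoop*` ; do sudo service $x status ; done
sudo jps    # should list NameNode and DataNode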


Sayat Satybald
Devan M S
2
  1. Check whether your DataNode is running, using the jps command (see the snippet after this list).
  2. If it is not running, wait some time and retry.
  3. If it is running, I think you have to re-format your DataNode.
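For example, on a healthy node the jps output should include a DataNode process:

jps                       # should list DataNode alongside NameNode
jps | grep -i datanode    # quick check that the DataNode JVM is running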
twlkyao
1

What I usually do when this happens is go to the tmp/hadoop-username/dfs/ directory and manually delete the data and name folders (assuming you are running in a Linux environment).

Then format the dfs by calling bin/hadoop namenode -format (make sure you answer with a capital Y when asked whether you want to format; if you are not asked, re-run the command).

You can then start Hadoop again by calling bin/start-all.sh.
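Put together, the whole sequence looks roughly like this (paths assume the default hadoop.tmp.dir of /tmp/hadoop-<username> and an old-style Hadoop 1.x layout; double-check your configuration before deleting anything):

bin/stop-all.sh                                # stop any running daemons first
rm -rf /tmp/hadoop-$USER/dfs/data /tmp/hadoop-$USER/dfs/name
bin/hadoop namenode -format                    # answer with a capital Y if prompted
bin/start-all.sh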

lvella
  • This is the only solution to the OP's question that worked for me. I was trying to follow the example in [link](http://blog.tundramonkey.com/2013/02/24/setting-up-hadoop-on-osx-mountain-lion) on my Macbook osx mountain lion 10.8.5, but could not see the datanode being generated after start-all.sh, until I deleted the data and name and namesecondary folders as mentioned above. Thank you! – John Jiang Oct 03 '13 at 05:51
1

I had this problem and I solved it as below:

  1. Find where your DataNode and NameNode metadata/data are saved; if you cannot find it, simply run this command on a Mac to locate it (it is located in a folder called "tmp"):

    find /usr/local/Cellar/ -name "tmp";

    The find command has the form: find <"directory"> -name <"any string clue for that directory or file">

  2. After finding that folder, cd into it: /usr/local/Cellar//hadoop/hdfs/tmp

    then cd to dfs

    then, using the ls command, see that the data and name directories are located there.

  3. Using the remove command, remove them both:

    rm -R data
    rm -R name

  4. Go to the Hadoop installation folder and stop everything if you have not already done so:

    sbin/stop-dfs.sh

  5. Exit from the server or localhost.

  6. Log into the server again: ssh <"server name">

  7. Start the dfs:

    sbin/start-dfs.sh

  8. Format the NameNode to be sure:

    bin/hdfs namenode -format

  9. You can now use hdfs commands to upload your data into dfs and run MapReduce jobs, for example:
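    (The file name and target directory below are just placeholders.)

    bin/hdfs dfs -mkdir -p /user/$USER                # create your home directory in HDFS
    bin/hdfs dfs -put localfile.txt /user/$USER/      # upload a local file
    bin/hdfs dfs -ls /user/$USER                      # verify it arrived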

Reihan_amn
1

In my case, this issue was resolved by opening firewall port 50010 on the DataNodes.
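For example, with firewalld on each DataNode the rule might look like the following (50010 is the default data-transfer port on Hadoop 2.x; newer versions use a different port, as a later answer points out):

sudo firewall-cmd --zone=public --permanent --add-port=50010/tcp
sudo firewall-cmd --reload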

fracca
    Can you be more specific? Which protocol should I use, and what is the name of the program ... – benaou mouad Mar 22 '20 at 20:34
  • Thanks. I got the same error message as the OP, while all my datanodes are healthy. It turned out that the master could not connect to those datanodes on port 50010. – Averell Mar 31 '20 at 02:34
0

Very simple fix for the same issue on Windows 8.1
I used Windows 8.1 and Hadoop 2.7.2, and did the following things to overcome this issue.

  1. When I ran hdfs namenode -format, I noticed there was a lock in my directory (see the screenshot "HadoopNameNode").
  2. Once I deleted the full folder (screenshots "Folder location" and "Full Folder Delete"), I ran hdfs namenode -format again.
  3. After performing the above two steps, I could successfully place my required files in HDFS. I used the start-all.cmd command to start YARN and the NameNode.
Praveen Kumar K S
0

In my case, the dfs.datanode.du.reserved value in hdfs-site.xml was too large, and the NameNode was also handing out the private IP address of the DataNode, so traffic could not be routed properly. The solution to the private-IP problem was to switch the Docker container to host networking and put the hostname in the host properties of the config files.
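If you want to see which reservation value is actually in effect, hdfs getconf should work on reasonably recent Hadoop versions:

hdfs getconf -confKey dfs.datanode.du.reserved    # bytes reserved per volume for non-DFS use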

This Stack Overflow question on the replication issue goes over other possibilities.

SparkleGoat
0

The answers saying to open port 50010 may only apply to older versions of Hadoop. I'm using Hadoop 3.3.4, and the port you should open to fix this error is 9866. You need to open this port on all of the Hadoop DataNodes. Here's a snippet you can use on RHEL 8:

sudo firewall-cmd --zone=public --permanent --add-port 9866/tcp
sudo firewall-cmd --reload
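To verify the rule took effect on each DataNode, you can list the open ports for the zone:

sudo firewall-cmd --zone=public --list-ports    # should now include 9866/tcp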
alD