I have an issue with a cluster of 72 machines: 60 of them are HOT storage and 12 are COLD. When I try to put data into COLD Hive tables, I sometimes get an error:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hive/warehouse/test.db/rawlogs/dt=2016-01-31/.hive-staging_hive_2016-06-29_12-54-09_949_6553181118480369018-1/_task_tmp.-ext-10002/_tmp.001029_3 could only be replicated to 0 nodes instead of minReplication (=1). There are 71 datanode(s) running and no node(s) are excluded in this operation.
There is a lot of free space on both the host filesystems and HDFS:
Tier | Configured Capacity | Capacity Used | Capacity Remaining | Block Pool Used
ARCHIVE | 341.65 TB | 56.64 TB (16.58%) | 267.65 TB (78.34%) | 56.64 TB
DISK | 418.92 TB | 247.78 TB (59.15%) | 148.45 TB (35.44%) | 247.78 TB
I have 4 racks defined for the COLD servers:
Rack: /50907 1 node
Rack: /50912 1 node
Rack: /50917 1 node
Rack: /80104 9 nodes
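For reference, Hadoop resolves racks through the script configured as net.topology.script.file.name in core-site.xml. A minimal sketch of such a script, where only the rack names come from the listing above and the subnets/hostnames are hypothetical placeholders:

```shell
#!/bin/sh
# Minimal rack-resolution sketch for net.topology.script.file.name.
# The NameNode passes one or more IPs/hostnames as arguments and
# expects one rack path per line in response.
resolve_rack() {
  for host in "$@"; do
    case "$host" in
      10.1.4.*)  echo "/80104" ;;         # hypothetical subnet: 9-node COLD rack
      cold-a-*)  echo "/50907" ;;         # hypothetical hostnames for the
      cold-b-*)  echo "/50912" ;;         # three single-node COLD racks
      cold-c-*)  echo "/50917" ;;
      *)         echo "/default-rack" ;;  # fallback rack for unmatched hosts
    esac
  done
}
resolve_rack "$@"
```

Hosts that match no pattern fall back to /default-rack, which is also what HDFS assumes when no script is configured at all.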
It's a live cluster, and I can't just clean up all the data as suggested in a similar issue on Stack Overflow.
Update: I decided to deploy the renewed topology script across all servers in the cluster. After deploying it I restarted all Hadoop daemons on every node, including the NameNode, but dfsadmin -printTopology still shows the old scheme. What do I need to do to renew the cluster topology? Maybe drop some kind of cache?
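For context, the sanity check I ran after redeploying can be sketched as a small helper. The script path is hypothetical; the point is that the NameNode resolves racks by executing the script locally on its own host, so it must exist and be executable there:

```shell
# Hedged sketch: sanity-check the topology script on the NameNode host
# before restarting daemons. Path and hostnames are hypothetical.
verify_topology_script() {
  script="$1"; shift
  # The NameNode executes the script locally, so it must be present
  # and executable on the NameNode host itself.
  [ -x "$script" ] || { echo "ERROR: $script missing or not executable" >&2; return 1; }
  # It should print exactly one rack path per argument.
  "$script" "$@"
}
# On the real cluster, the current (possibly cached) NameNode view is
# shown by: hdfs dfsadmin -printTopology
```

Usage would be, e.g., verify_topology_script /etc/hadoop/conf/topology.sh 10.1.4.7 on the NameNode host, comparing the output against the expected rack paths.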