I have an issue with a cluster of 72 machines: 60 of them are HOT storage and 12 are COLD. When I try to put data into COLD Hive tables, I sometimes get an error:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hive/warehouse/test.db/rawlogs/dt=2016-01-31/.hive-staging_hive_2016-06-29_12-54-09_949_6553181118480369018-1/_task_tmp.-ext-10002/_tmp.001029_3 could only be replicated to 0 nodes instead of minReplication (=1). There are 71 datanode(s) running and no node(s) are excluded in this operation.
There is a lot of free space on both the host filesystems and HDFS:
Tier | Configured Capacity | Capacity Used | Capacity Remaining | Block Pool Used
ARCHIVE | 341.65 TB | 56.64 TB (16.58%) | 267.65 TB (78.34%) | 56.64 TB
DISK | 418.92 TB | 247.78 TB (59.15%) | 148.45 TB (35.44%) | 247.78 TB
I have 4 racks defined for the COLD servers:
Rack: /50907 1 node
Rack: /50912 1 node
Rack: /50917 1 node
Rack: /80104 9 nodes
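For reference, Hadoop resolves racks through the script configured as net.topology.script.file.name in core-site.xml. A minimal sketch of such a script, where only the rack names come from the listing above and the subnets/hostnames are hypothetical placeholders:

```shell
#!/bin/sh
# Minimal rack-resolution sketch for net.topology.script.file.name.
# The NameNode passes one or more IPs/hostnames as arguments and
# expects one rack path per line in response.
resolve_rack() {
  for host in "$@"; do
    case "$host" in
      10.1.4.*)  echo "/80104" ;;         # hypothetical subnet: 9-node COLD rack
      cold-a-*)  echo "/50907" ;;         # hypothetical hostnames for the
      cold-b-*)  echo "/50912" ;;         # three single-node COLD racks
      cold-c-*)  echo "/50917" ;;
      *)         echo "/default-rack" ;;  # fallback rack for unmatched hosts
    esac
  done
}
resolve_rack "$@"
```

Hosts that match no pattern fall back to /default-rack, which is also what HDFS assumes when no script is configured at all.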
It's a live cluster, and I can't just clean up all the data as suggested in a similar issue on Stack Overflow.
Update: I decided to deploy the renewed topology script across all servers in the cluster. After deploying it I restarted all Hadoop daemons on every node, including the NameNode, but dfsadmin -printTopology still shows the old scheme. What do I need to do to renew the cluster topology? Maybe drop some kind of cache?
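For context, the sanity check I ran after redeploying can be sketched as a small helper. The script path is hypothetical; the point is that the NameNode resolves racks by executing the script locally on its own host, so it must exist and be executable there:

```shell
# Hedged sketch: sanity-check the topology script on the NameNode host
# before restarting daemons. Path and hostnames are hypothetical.
verify_topology_script() {
  script="$1"; shift
  # The NameNode executes the script locally, so it must be present
  # and executable on the NameNode host itself.
  [ -x "$script" ] || { echo "ERROR: $script missing or not executable" >&2; return 1; }
  # It should print exactly one rack path per argument.
  "$script" "$@"
}
# On the real cluster, the current (possibly cached) NameNode view is
# shown by: hdfs dfsadmin -printTopology
```

Usage would be, e.g., verify_topology_script /etc/hadoop/conf/topology.sh 10.1.4.7 on the NameNode host, comparing the output against the expected rack paths.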