
I create an external table in Hive with partitions and then try to populate it from an existing table; however, I get the following exception:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/hive/warehouse/pavel.db/browserdatapart/.hive-staging_hive_2018-12-28_13-22-45_751_6056004898772238481-1/_task_tmp.-ext-10000/cityid=1/_tmp.000001_3 could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1719)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3372)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3296)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:850)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:504)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:814)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
    ... 18 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/hive/warehouse/pavel.db/browserdatapart/.hive-staging_hive_2018-12-28_13-22-45_751_6056004898772238481-1/_task_tmp.-ext-10000/cityid=1/_tmp.000001_3 could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1719)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3372)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3296)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:850)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:504)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
    at org.apache.hadoop.ipc.Client.call(Client.java:1498)
    at org.apache.hadoop.ipc.Client.call(Client.java:1398)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:459)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:290)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:202)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:184)
    at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1580)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1375)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:

According to what I've found on the internet, these exceptions occur when the datanode can't communicate with the namenode or when you are running low on memory, but in my case everything looks fine. I have already tried formatting my namenode and datanode as well. What else could be the issue?

I've also read https://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo, and it didn't help me either.

I am running on Tez. This works:

insert into table browserdatapart partition(cityid) select UserAgent,cityid from browserdata limit 100;

And this fails with the exception I provided:

insert into table browserdatapart partition(cityid) select UserAgent,cityid from browserdata;
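For context, the target table was created roughly like this (a sketch, not my exact DDL: the column types are placeholders, and the location is inferred from the staging path in the exception):

CREATE EXTERNAL TABLE browserdatapart (
    UserAgent STRING
)
PARTITIONED BY (cityid INT)
LOCATION '/apps/hive/warehouse/pavel.db/browserdatapart';

The source table browserdata has 21 columns in total, but only UserAgent and cityid are selected in the insert.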
  • Just tried it on a smaller dataset and it looks like it works, so I guess it is a memory issue after all, but I don't understand how, because I have 37 gigs available. And the dataset is 24 gigs. – pavel_orekhov Dec 28 '18 at 14:00
  • Please provide more details. You are running MR, not Tez, right? And it fails on the mapper, right? What is the size of the source table and how many mappers started? Also look at the failed mapper log, there can be something interesting. – leftjoin Dec 28 '18 at 14:31
  • @leftjoin I am running on Tez. This works: `insert into table browserdatapart partition(cityid) select UserAgent,cityid from browserdata limit 100;`, and this fails with the exception I provided: `insert into table browserdatapart partition(cityid) select UserAgent,cityid from browserdata;` (note the `limit 100` at the end of the first query). So, if I try to insert 100 rows it's ok, but if I try to insert the entire dataset it fails. – pavel_orekhov Dec 28 '18 at 14:33
  • @leftjoin, also, my dataset is 24 gigs, I have more than 30 gigs of free memory, and I have 21 fields in my dataset but load only 2 (useragent and cityid), which probably amounts to about 24/10 = 2.4 gigs being taken up after the query completes, so I don't think I should run out of memory. – pavel_orekhov Dec 28 '18 at 14:40
  • @leftjoin https://paste.fedoraproject.org/paste/SwWI~gHY34Ccw9Q4ksajVA this is my console output. – pavel_orekhov Dec 28 '18 at 14:45
  • @leftjoin the other logs give the same exception. – pavel_orekhov Dec 28 '18 at 14:47
  • @leftjoin OOM errors are thrown when we don't have enough RAM, while I am talking about HDFS storage. – pavel_orekhov Dec 28 '18 at 15:10
  • Similar question : https://stackoverflow.com/q/36015864/2700344 – leftjoin Dec 28 '18 at 15:16
  • Thanks, I have seen this, and increased all the resources, but this still happens. Weird... – pavel_orekhov Dec 28 '18 at 15:18
  • @leftjoin I found the solution. – pavel_orekhov Dec 29 '18 at 12:22

1 Answer

SET hive.exec.max.dynamic.partitions=100000; 
SET hive.exec.max.dynamic.partitions.pernode=100000;

Setting the above parameters solved it for me. I guess Hive was not able to write data to the partitions that show up in the exception because there were more dynamic partitions than the configured maximum allows (my dataset produces 224 of them).
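Put together, a session that reproduces the fix looks roughly like this (a sketch: the two dynamic-partition settings at the top are assumptions based on what a fully dynamic partition insert normally requires, the rest is from above):

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;

insert into table browserdatapart partition(cityid) select UserAgent,cityid from browserdata;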

  • Congratulations! Usually the exception is something like "too many dynamic partitions" in this case. – leftjoin Dec 29 '18 at 12:36
  • @leftjoin Thanks! I don't know, these errors are very misleading. – pavel_orekhov Dec 29 '18 at 12:45
  • @leftjoin, ok, it seems like it only worked one time. I subsequently removed the table and tried doing it again. In fact, now even `insert into table browserdatapart partition(cityid) select useragent,cityid from browserdata limit 1;` fails. It can't insert even 1 row, what the heck? – pavel_orekhov Dec 29 '18 at 12:57
  • @leftjoin now it worked again! I don't understand what's happening. – pavel_orekhov Dec 29 '18 at 13:00
  • Maybe Hortonworks did something wrong in their distro. I keep having these weird problems. – pavel_orekhov Dec 29 '18 at 13:02
  • @leftjoin here's the real answer https://stackoverflow.com/questions/54561086/how-do-i-fix-file-could-only-be-replicated-to-0-nodes-instead-of-minreplication/54719797#54719797 – pavel_orekhov Feb 16 '19 at 04:06