
I am running Hadoop 1.2.1 in pseudo-distributed mode, with both the namenode and the datanode on the same virtual machine. The datanode has 4 volumes. I am running some tests on the use of very small block sizes in Hadoop (4k, 8k, ...). The replication factor is set to 1.
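For reference, the block size and replication factor for these tests are set in conf/hdfs-site.xml, roughly like this (a minimal sketch using the Hadoop 1.x property names; the 8k run uses 8192, the 4k run 4096):

<configuration>
  <property>
    <name>dfs.block.size</name>
    <value>8192</value>   <!-- block size in bytes; 4096 for the 4k run -->
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

(Setting the block size per command, e.g. bin/hadoop fs -D dfs.block.size=8192 -put ..., should behave the same way.)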

When setting the block size to 8k, I am able to copy a 64MB file to HDFS using:

bin/hadoop fs -put my64mbfile .

However, while running the command I get the following exception several times:

13/08/29 10:50:47 WARN hdfs.DFSClient: NotReplicatedYetException sleeping 
/user/myuser/my64mbfile retries left 4
13/08/29 10:50:48 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: 
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: 
Not replicated yet:/user/myuser/my64mbfile
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock
 (FSNamesystem.java:1905)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock
 (NameNode.java:783)
 at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke
 (DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.
 java:1190)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)

 at org.apache.hadoop.ipc.Client.call(Client.java:1113)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
 at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
 at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke
 (DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod
 (RetryInvocationHandler.
 java:85)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke
 (RetryInvocationHandler.java:62)
 at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock
 (DFSClient.java:3720)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream
 (DFSClient.java:3580)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600
 (DFSClient.java:2783)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run
 (DFSClient.java:3023)

After this, I reformat HDFS and restart Hadoop.
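(The reformat/restart between runs is just the usual sequence; the dfs directory path below is the default location under hadoop.tmp.dir and is only an assumption about this particular setup:)

bin/stop-all.sh
rm -rf /tmp/hadoop-myuser/dfs        # wherever dfs.name.dir / dfs.data.dir actually point
bin/hadoop namenode -format
bin/start-all.sh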

When I run the same command with a 4k block size, I still get the warnings above, and eventually I get this error (after only about half of the file has been copied to HDFS):

13/08/29 11:32:38 WARN hdfs.DFSClient: Error Recovery for 
blk_1692157315263473676_1009 bad datanode[0] nodes == null
13/08/29 11:32:38 WARN hdfs.DFSClient: Could not get block locations. Source 
file "/user/myuser/my64mbfile" - Aborting...
put: java.io.IOException: java.lang.OutOfMemoryError: Requested array size 
exceeds VM limit
13/08/29 11:32:38 ERROR hdfs.DFSClient: Failed to close file 
/user/myuser/my64mbfile
org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
 at java.util.Arrays.copyOf(Arrays.java:2271)
 at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
 at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
 at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
 at java.io.DataOutputStream.writeLong(DataOutputStream.java:224)
 at org.apache.hadoop.hdfs.protocol.Block.write(Block.java:134)
 at org.apache.hadoop.io.ArrayWritable.write(ArrayWritable.java:98)
 at org.apache.hadoop.hdfs.server.namenode.FSEditLog$EditLogFileOutputStream.
 write(FSEditLog.java:184)
 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logEdit(FSEditLog.java:
 1138)
 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logOpenFile(FSEditLog.
 java:1299)
 at org.apache.hadoop.hdfs.server.namenode.FSDirectory.persistBlocks
 (FSDirectory.java:305)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock
 (FSNamesystem.java:1947)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
 at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke
 (DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.
 java:1190)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)

 at org.apache.hadoop.ipc.Client.call(Client.java:1113)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
 at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
 at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
 Impl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod
 (RetryInvocationHandler.java:85)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke
 (RetryInvocationHandler.java:62)
 at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock
 (DFSClient.java:3720)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream
 (DFSClient.java:3580)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600
 (DFSClient.java:2783)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run
 (DFSClient.java:3023)

Is there a limit on the number of blocks a datanode can handle? Or is there another limit on the namenode regarding the number of objects it tracks (something other than dfs.namenode.fs-limits.max-blocks-per-file)?

For the 4k case, the total number of blocks should be 16384, i.e. ~4096 blocks per volume, which means ~8192 files per volume (counting both the block files and their metadata files).
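Spelled out (a quick shell check of the arithmetic, assuming one block file plus one .meta file per block):

echo $(( 64*1024*1024 / 4096 ))         # 16384 blocks in total
echo $(( 64*1024*1024 / 4096 / 4 ))     # 4096 blocks per volume
echo $(( 64*1024*1024 / 4096 / 4 * 2 )) # 8192 files per volume (block + .meta)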

ovimt

1 Answer


The NameNode keeps the whole file-to-blocks mapping (the BlocksMap) in memory, so if the block size you configure is too small, the block map can grow quite large and cause the OOME. See The memory consumption of hadoop's namenode?
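As a rough illustration of the scaling (assuming the often-quoted ballpark of ~150 bytes of NameNode heap per block object, and 1 TB of data purely for the sake of the example, neither figure taken from this question):

echo $(( 1024*1024*1024*1024 / (64*1024*1024) * 150 ))  # 64 MB blocks: ~2.5 MB of heap
echo $(( 1024*1024*1024*1024 / 4096 * 150 ))            # 4 KB blocks:  ~40 GB of heap

The block count, and with it the heap the NameNode needs, grows inversely with the block size.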

Jing Wang