
I am trying to run a wordcount job in Hadoop. Due to a previous error, I had to turn off safe mode on the NameNode manually. Now, however, when I try to run the job, I get the following error:

14/08/06 14:49:08 INFO mapreduce.Job:  map 1% reduce 0%
14/08/06 14:49:25 INFO mapreduce.Job: Task Id : attempt_1407336345567_0002_m_000158_0, Status : FAILED
Error: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-971868671-192.168.50.2-1406571670535:blk_1073743276_2475 file=/wikidumps/enwiki-20130904-pages-meta-history3.xml-p000032706p000037161
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:838)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:526)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:749)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:793)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:164)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)

The log files do not show any specific error. Does anyone know the reason for this error? Thanks in advance!
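
Edit: for reference, this is roughly what I did to leave safe mode, and the commands I can run to check the missing block (a sketch assuming the standard Hadoop 2.x CLI; the file path is the one from the error above):

    # leave safe mode manually and confirm the NameNode state
    hdfs dfsadmin -safemode leave
    hdfs dfsadmin -safemode get

    # check that the input file is still listed and ask fsck where its blocks are
    hdfs dfs -ls /wikidumps/
    hdfs fsck /wikidumps/enwiki-20130904-pages-meta-history3.xml-p000032706p000037161 -files -blocks -locations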

  • 1. Check if the file is actually on the HDFS filesystem. 2. Run [fsck](http://hadoop.apache.org/docs/r0.19.0/commands_manual.html#fsck). 3. Exactly describe what "previous error" you had and why you turned off safe mode manually (as this can lead to filesystem corruption). Without more information you are unlikely to get a useful response here. – jmiserez Aug 06 '14 at 15:03
  • This was the error I was getting: http://stackoverflow.com/questions/4966592/hadoop-safemode-recovery-taking-too-long – user3033194 Aug 06 '14 at 15:08
  • It's a bit hard to explain, but I am pointing my "dfs.datanode.data.dir" to an external volume which is attached to this instance. The aim is to determine Hadoop performance on the Lustre filesystem, and this volume is the closest I can get to a shared filesystem (see the sketch after these comments for how I check the storage directory on each DataNode). – user3033194 Aug 06 '14 at 15:11
  • After doing so, first I was getting the error pointed out in that link, and now this. – user3033194 Aug 06 '14 at 15:12
  • Can you access the file through the HDFS web interface? From the looks of it, Hadoop looks for it on the node 192.168.50.2, so you should be able to see it there. And I'm not sure what you mean by external volume: IIRC, every DataNode needs its own local folder to store its data, I don't think you can have them all point to the same external volume/mount point. – jmiserez Aug 06 '14 at 15:18
  • By external, I mean that the volume retains the data even after the instance has been terminated. It is not a part of the node's local disk. It has to be attached to an instance (it is in a cloud service) when it has to be used. – user3033194 Aug 06 '14 at 15:21
  • There are 3 external volumes, attached to the 3 DataNodes separately. – user3033194 Aug 06 '14 at 15:26
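
A minimal sketch of what the setup described in the comments looks like from each DataNode's side. The mount point /mnt/external-volume and the directory layout beneath it are placeholders; the block pool ID is the one named in the exception above:

    # run on each DataNode: print the configured block storage directory
    # (dfs.datanode.data.dir) and check whether the attached volume actually
    # holds block files for the block pool named in the BlockMissingException
    hdfs getconf -confKey dfs.datanode.data.dir
    ls /mnt/external-volume/hdfs/data/current/BP-971868671-192.168.50.2-1406571670535/current/finalized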

0 Answers