
I have a file on HDFS at the path 'test/test.txt' which is 1.3 GB.

The output of the ls and du commands is:

hadoop fs -du test/test.txt -> 1379081672 test/test.txt

hadoop fs -ls test/test.txt ->

Found 1 items
-rw-r--r--   3 testuser supergroup 1379081672 2014-05-06 20:27 test/test.txt

I want to run a MapReduce job on this file, but when I start the job it fails with the following error:

hadoop jar myjar.jar test.TestMapReduceDriver test output

14/05/29 16:42:03 WARN mapred.JobClient: Use GenericOptionsParser for parsing the 
arguments. Applications should implement Tool for the same.
14/05/29 16:42:03 INFO input.FileInputFormat: Total input paths to process : 1
14/05/29 16:42:03 INFO mapred.JobClient: Running job: job_201405271131_9661
14/05/29 16:42:04 INFO mapred.JobClient:  map 0% reduce 0%
14/05/29 16:42:17 INFO mapred.JobClient: Task Id : attempt_201405271131_9661_m_000004_0, Status : FAILED
java.io.IOException: Cannot obtain block length for LocatedBlock{BP-428948818-namenode-1392736828725:blk_-6790192659948575136_8493225; getBlockSize()=36904392; corrupt=false; offset=1342177280; locs=[datanode4:50010, datanode3:50010, datanode1:50010]}
at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:319)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:263)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:205)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:198)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1117)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:249)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:746)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:83)
at org.apache.hadoop.mapred.Ma...

I tried the following commands:

hadoop fs -cat test/test.txt gives the following error:

cat: Cannot obtain block length for LocatedBlock{BP-428948818-10.17.56.16-1392736828725:blk_-6790192659948575136_8493225; getBlockSize()=36904392; corrupt=false; offset=1342177280; locs=[datanode3:50010, datanode1:50010, datanode4:50010]}

Additionally, I can't copy the file; hadoop fs -cp test/test.txt tmp gives the same error:

cp: Cannot obtain block length for LocatedBlock{BP-428948818-10.17.56.16-1392736828725:blk_-6790192659948575136_8493225; getBlockSize()=36904392; corrupt=false; offset=1342177280; locs=[datanode1:50010, datanode3:50010, datanode4:50010]}

Output of the hdfs fsck /user/testuser/test/test.txt command:

Connecting to namenode via http://namenode:50070
FSCK started by testuser (auth:SIMPLE) from /10.17.56.16 for path 
/user/testuser/test/test.txt at Thu May 29 17:00:44 EEST 2014
Status: HEALTHY
Total size: 0 B (Total open files size: 1379081672 B)
Total dirs: 0
Total files:    0 (Files currently being written: 1)
Total blocks (validated):   0 (Total open file blocks (not validated): 21)
Minimally replicated blocks:    0
Over-replicated blocks: 0
Under-replicated blocks:    0
Mis-replicated blocks:      0
Default replication factor: 3
Average block replication:  0.0
Corrupt blocks:     0
Missing replicas:       0
Number of data-nodes:       5
Number of racks:        1
FSCK ended at Thu May 29 17:00:44 EEST 2014 in 0 milliseconds
The filesystem under path /user/testuser/test/test.txt is HEALTHY

By the way, I can see the content of the test.txt file from the web browser.

Hadoop version: Hadoop 2.0.0-cdh4.5.0

– husnu
  • Possible duplicate of [java.io.IOException: Cannot obtain block length for LocatedBlock](http://stackoverflow.com/questions/27181371/java-io-ioexception-cannot-obtain-block-length-for-locatedblock) – Joe23 Apr 04 '17 at 10:49

4 Answers


I ran into the same issue and fixed it with the following steps. In my case there were files that had been opened by Flume but never closed (I'm not sure about the cause in your case). You need to find the names of the open files with this command:

hdfs fsck /directory/of/locked/files/ -files -openforwrite

You can try to recover the files with:

hdfs debug recoverLease -path <path-of-the-file> -retries 3 

Or remove them with:

hdfs dfs -rmr <path-of-the-file>
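
If many files are stuck open, a small loop along these lines can feed the fsck output into recoverLease. This is only a sketch: it assumes your HDFS version ships the hdfs debug recoverLease command, that fsck marks such files with the string OPENFORWRITE, and that the paths contain no spaces.

#!/usr/bin/env bash
# Sketch: list files stuck in the open-for-write state and try to recover their leases.
DIR=/directory/of/locked/files    # adjust to your tree

hdfs fsck "$DIR" -files -openforwrite 2>/dev/null \
  | grep OPENFORWRITE \
  | awk '{print $1}' \
  | while read -r path; do
      echo "Recovering lease on $path"
      hdfs debug recoverLease -path "$path" -retries 3
    done

Only fall back to hdfs dfs -rmr if lease recovery does not bring a file back to a readable state.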
– Rearchy

I had the same error, but it was not due to a full-disk problem; I think it was the inverse, where files and blocks were referenced in the namenode but did not exist on any datanode.

Thus, hdfs dfs -ls shows the files, but any operation on them fails, e.g. hdfs dfs -copyToLocal.

In my case, the hard part was isolating which files were listed but corrupted, as they sat in a tree containing thousands of files. Oddly, hdfs fsck /path/to/files/ did not report any problems.

My solution was:

  1. Isolate the location using copyToLocal which resulted in copyToLocal: Cannot obtain block length for LocatedBlock{BP-1918381527-10.74.2.77-1420822494740:blk_1120909039_47667041; getBlockSize()=1231; corrupt=false; offset=0; locs=[10.74.2.168:50010, 10.74.2.166:50010, 10.74.2.164:50010]} for several files
  2. Get a list of the local directories using ls -1 > baddirs.out
  3. Get rid of the local files from the first copyToLocal.
  4. Run for files in $(cat baddirs.out); do echo $files; hdfs dfs -copyToLocal $files; done. This will produce a list of directory checks and errors where broken files are found.
  5. Get rid of the local files again, and now get lists of files from each affected subdirectory. Use that as input to a file-by-file copyToLocal, at which point you can echo each file as it's copied, then see where the error occurs.
  6. Use hdfs dfs -rm <file> for each broken file.
  7. Confirm you got them all by removing all local files again and rerunning the original copyToLocal on the top-level directory where you had problems. (A scripted version of steps 4-6 is sketched below.)

A simple two hour process!
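
For a large tree, steps 4-6 can also be scripted. A rough sketch, with a few assumptions to adapt: the error text still contains "Cannot obtain block length", the HDFS paths contain no spaces, and -cat is used instead of -copyToLocal so nothing has to be written locally.

# Sketch: walk an HDFS tree, flag files whose blocks cannot be read, optionally delete them.
TOPDIR=/path/to/files    # top-level directory that showed the problem

hdfs dfs -ls -R "$TOPDIR" | awk '$1 !~ /^d/ {print $NF}' | while read -r f; do
  if ! hdfs dfs -cat "$f" > /dev/null 2> /tmp/err.txt; then
    if grep -q "Cannot obtain block length" /tmp/err.txt; then
      echo "BROKEN: $f"
      # hdfs dfs -rm "$f"    # uncomment only after reviewing the BROKEN list
    fi
  fi
done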

– Tom Harrison

You have some corrupted files that have an entry in the namenode but no blocks on the datanodes. It's best to follow this:

https://stackoverflow.com/a/19216037/812906
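
For reference, the linked answer boils down to using fsck to find the affected files, inspecting them, and removing them once the data is confirmed unrecoverable. A sketch of those commands (flags may vary slightly between Hadoop versions; the path is just the example from the question):

# 1. List files with corrupt or missing blocks
hdfs fsck / -list-corruptfileblocks

# 2. Inspect a suspect file's blocks and their locations
hdfs fsck /user/testuser/test/test.txt -files -blocks -locations

# 3. Once the data is confirmed unrecoverable, remove the file
hdfs dfs -rm /user/testuser/test/test.txt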

– Dragonborn

According to this, the error may be caused by a full-disk problem. I came across the same problem recently with an old file, and checking my server's metrics confirmed it was indeed a full-disk problem during the creation of that file. Most solutions just suggest deleting the file and praying it doesn't happen again.
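
If you suspect the same cause, the per-datanode capacity numbers are easy to check; a minimal check (assuming you can run dfsadmin against the cluster) could be:

# Show per-datanode capacity and usage to spot full or nearly full disks
hdfs dfsadmin -report | grep -E 'Name:|DFS Used%:|DFS Remaining%:'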

– msemelman