
I am getting the following error:

Diagnostics: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-467931813-10.3.20.155-1514489559979:blk_1073741991_1167 file=/user/oozie/share/lib/lib_20171228193421/oozie/hadoop-auth-2.7.2-amzn-2.jar
Failing this attempt. Failing the application.

This happens although I have set a replication factor of 3 for the /user/oozie/share/lib/ directory and the jars under this path are replicated on 3 datanodes, yet a few jars are missing. Can anybody suggest why this is happening and how to prevent it?

Pooja Soni
  • Have you tried running some `hadoop fsck` commands? – OneCricketeer Jan 05 '18 at 05:36
  • Yes, I tried hadoop fsck /user/oozie/share/lib/lib_20171228193421/ -files -blocks -racks command and the response is: /user/oozie/share/lib/lib_20171228193421/oozie/hadoop-auth-2.7.2-amzn-2.jar: CORRUPT blockpool BP-467931813-10.3.20.155-1514489559979 block blk_1073741991 MISSING 1 blocks of total size 70594 B 0. BP-467931813-10.3.20.155-1514489559979:blk_1073741991_1167 len=70594 MISSING! – Pooja Soni Jan 05 '18 at 06:07
  • It happens because one of your datanodes has gone bad. Maybe a disk is failing, for example. See this? https://stackoverflow.com/questions/19205057/how-to-fix-corrupt-hdfs-filess – OneCricketeer Jan 05 '18 at 06:25
  • I agree that all 3 datanodes where the block was replicated must have gone bad. But is there now any way of getting that block back? And how can I prevent this scenario? – Pooja Soni Jan 05 '18 at 06:34
  • In my case we need to resize the cluster many times, so new DNs get added and removed often. How can I avoid this exception then? – Pooja Soni Jan 05 '18 at 06:36
  • Did you rebalance HDFS before removing too many DNs? How did you ensure you didn't remove all nodes containing the 3 replicas for a block? – OneCricketeer Jan 05 '18 at 06:39
  • I assume you are working in some scaling group like in AWS, in which case, you can store data in S3 instead. Then your filesystem isn't shrinking "oftenly" – OneCricketeer Jan 05 '18 at 06:41
  • We are getting the missing block exception for the libraries mainly used by Oozie. The HDFS path is like /user/oozie/share/lib/. Is there any support in Oozie for reading the sharelib from S3 instead of HDFS? – Pooja Soni Jan 05 '18 at 06:56
  • It sure can. S3 is an HDFS-compatible filesystem. http://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html – OneCricketeer Jan 05 '18 at 06:58
  • Is there any way by which I can store library data on NN instead of DN? – Pooja Soni Jan 05 '18 at 07:02
  • No? Unless your NN is also a DN. The Namenode doesn't store blocks; it stores block locations and other metadata. – OneCricketeer Jan 05 '18 at 07:05
  • While this may have nothing to do with your problem, I'd like to inform that after `EMR` cluster resize (downscale), I was getting this error when trying to query tables stored on `HDFS` via `beeline` shell. My fault was that I was connecting `beeline` with local `Hive` *metastore* at **127.0.0.1**. Replacing IP with **localhost** in `beeline` connection statement resolved the issue. – y2k-shubham Jun 19 '18 at 09:24
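Following up on the fsck suggestion in the comments above: since the Oozie sharelib jars ship with Oozie itself, the corrupt copies can simply be deleted and the sharelib regenerated rather than recovered. A minimal sketch, assuming a standard Oozie install with oozie-setup.sh on the path; the NameNode and Oozie hosts are placeholders, and the sharelib directory is the one from the question:

# List every file in the cluster that has a corrupt or missing block
hdfs fsck / -list-corruptfileblocks

# Inspect the affected sharelib directory in detail
hdfs fsck /user/oozie/share/lib/lib_20171228193421 -files -blocks -locations

# Remove the files whose blocks are gone, then regenerate the sharelib
hdfs fsck /user/oozie/share/lib/lib_20171228193421 -delete
oozie-setup.sh sharelib create -fs hdfs://<namenode-host>:8020

# Tell a running Oozie server to pick up the new sharelib (or restart Oozie)
oozie admin -sharelibupdate -oozie http://<oozie-host>:11000/oozie

To avoid hitting this again when downscaling, decommission DataNodes through the dfs.hosts.exclude file and `hdfs dfsadmin -refreshNodes` before terminating them, and wait until `hdfs dfsadmin -report` shows no under-replicated blocks, so HDFS has a chance to re-replicate every block off the departing nodes.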

3 Answers


I was getting the same exception while trying to read a file from HDFS. The solution under the section "Clients use Hostnames when connecting to DataNodes" in this link worked for me: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html#Clients_use_Hostnames_when_connecting_to_DataNodes

I added this XML block to "hdfs-site.xml" and restarted the datanode and namenode servers:

<property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
    <description>Whether clients should use datanode hostnames when
      connecting to datanodes.
    </description>
</property>
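Since dfs.client.use.datanode.hostname is read on the client side, it can also be passed per command without editing the cluster configuration. A minimal sketch, assuming the standard FsShell generic options; the file path is the one from the question:

# Test a read with hostname-based DataNode resolution for this command only
hdfs dfs -D dfs.client.use.datanode.hostname=true \
    -cat /user/oozie/share/lib/lib_20171228193421/oozie/hadoop-auth-2.7.2-amzn-2.jar > /dev/null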
raahat

Please check the file's owner in the HDFS directory. I met this issue because the owner was "root"; it got solved when I changed it to "your_user".
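For reference, a minimal sketch of checking and changing the owner; the oozie:oozie owner is an assumption, and the chown must be run as the HDFS superuser (or the current owner):

# Show the current owner and group of the sharelib files
hdfs dfs -ls /user/oozie/share/lib

# Change the owner recursively
hdfs dfs -chown -R oozie:oozie /user/oozie/share/lib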


I got the same error when using Trino to connect to Hive. I tried to connect to HDFS from a Trino worker and found that port 9866 was not open on the HDFS DataNodes; opening the port solved the problem. Related documents: https://www.ibm.com/docs/en/spectrum-scale-bda?topic=requirements-firewall-recommendations-hdfs-transparency and https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
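To reproduce that check, a minimal sketch run from the Trino worker; the hostnames are placeholders, and 9866 is the Hadoop 3 default for dfs.datanode.address (the data transfer port):

# Can this host reach the DataNode data transfer port?
nc -vz datanode-host.example.com 9866

# The NameNode RPC port (often 8020; check fs.defaultFS) should also be reachable
nc -vz namenode-host.example.com 8020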

Chujun Song