
I am getting the following error:

Diagnostics: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-467931813-10.3.20.155-1514489559979:blk_1073741991_1167 file=/user/oozie/share/lib/lib_20171228193421/oozie/hadoop-auth-2.7.2-amzn-2.jar
Failing this attempt. Failing the application.

This happens although I have set a replication factor of 3 for the /user/oozie/share/lib/ directory and the jars under this path are replicated on 3 datanodes, yet a few jars are missing. Can anybody suggest why this is happening and how to prevent it?

Pooja Soni
  • Have you tried running some `hadoop fsck` commands? – OneCricketeer Jan 05 '18 at 05:36
  • Yes, I tried hadoop fsck /user/oozie/share/lib/lib_20171228193421/ -files -blocks -racks command and the response is: /user/oozie/share/lib/lib_20171228193421/oozie/hadoop-auth-2.7.2-amzn-2.jar: CORRUPT blockpool BP-467931813-10.3.20.155-1514489559979 block blk_1073741991 MISSING 1 blocks of total size 70594 B 0. BP-467931813-10.3.20.155-1514489559979:blk_1073741991_1167 len=70594 MISSING! – Pooja Soni Jan 05 '18 at 06:07
  • It happens because one of your datanodes has gone bad. Maybe a disk is failing, for example. See this? https://stackoverflow.com/questions/19205057/how-to-fix-corrupt-hdfs-filess – OneCricketeer Jan 05 '18 at 06:25
  • I agree that all 3 datanodes where the block was replicated must have gone bad. But is there now any way of getting that block back? And how can I prevent this scenario? – Pooja Soni Jan 05 '18 at 06:34
  • In my case we need to resize the cluster many times, so new DNs get added and removed often. How can I avoid this exception then? – Pooja Soni Jan 05 '18 at 06:36
  • Did you rebalance HDFS before removing too many DNs? How did you ensure you didn't remove all nodes containing the 3 replicas for a block? – OneCricketeer Jan 05 '18 at 06:39
  • I assume you are working in some scaling group like in AWS, in which case, you can store data in S3 instead. Then your filesystem isn't shrinking "oftenly" – OneCricketeer Jan 05 '18 at 06:41
  • We are getting the missing block exception for the libraries mainly used by Oozie. The HDFS path is like /user/oozie/share/lib/. Is there any support in Oozie for reading the sharelib from S3 instead of HDFS? – Pooja Soni Jan 05 '18 at 06:56
  • It sure can. S3 is an HDFS-compatible filesystem. http://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html – OneCricketeer Jan 05 '18 at 06:58
  • Is there any way by which I can store library data on NN instead of DN? – Pooja Soni Jan 05 '18 at 07:02
  • No? Unless your NN is also a DN. The Namenode doesn't store blocks; it stores block locations and other metadata. – OneCricketeer Jan 05 '18 at 07:05
  • While this may have nothing to do with your problem, I'd like to inform that after `EMR` cluster resize (downscale), I was getting this error when trying to query tables stored on `HDFS` via `beeline` shell. My fault was that I was connecting `beeline` with local `Hive` *metastore* at **127.0.0.1**. Replacing IP with **localhost** in `beeline` connection statement resolved the issue. – y2k-shubham Jun 19 '18 at 09:24
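Following up on the fsck suggestion in the comments above: since the Oozie sharelib jars ship with Oozie itself, the corrupt copies can simply be deleted and the sharelib regenerated rather than recovered. A minimal sketch, assuming a standard Oozie install with oozie-setup.sh on the path; the NameNode and Oozie hosts are placeholders, and the sharelib directory is the one from the question:

# List every file in the cluster that has a corrupt or missing block
hdfs fsck / -list-corruptfileblocks

# Inspect the affected sharelib directory in detail
hdfs fsck /user/oozie/share/lib/lib_20171228193421 -files -blocks -locations

# Remove the files whose blocks are gone, then regenerate the sharelib
hdfs fsck /user/oozie/share/lib/lib_20171228193421 -delete
oozie-setup.sh sharelib create -fs hdfs://<namenode-host>:8020

# Tell a running Oozie server to pick up the new sharelib (or restart Oozie)
oozie admin -sharelibupdate -oozie http://<oozie-host>:11000/oozie

To avoid hitting this again when downscaling, decommission DataNodes through the dfs.hosts.exclude file and `hdfs dfsadmin -refreshNodes` before terminating them, and wait until `hdfs dfsadmin -report` shows no under-replicated blocks, so HDFS has a chance to re-replicate every block off the departing nodes.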

3 Answers


I was getting the same exception while trying to read a file from HDFS. The solution under the section "Clients use Hostnames when connecting to DataNodes" in this link worked for me: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html#Clients_use_Hostnames_when_connecting_to_DataNodes

I added this XML block to "hdfs-site.xml" and restarted the datanode and namenode servers:

<property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
    <description>Whether clients should use datanode hostnames when
      connecting to datanodes.
    </description>
</property>
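Since dfs.client.use.datanode.hostname is read on the client side, it can also be passed per command without editing the cluster configuration. A minimal sketch, assuming the standard FsShell generic options; the file path is the one from the question:

# Test a read with hostname-based DataNode resolution for this command only
hdfs dfs -D dfs.client.use.datanode.hostname=true \
    -cat /user/oozie/share/lib/lib_20171228193421/oozie/hadoop-auth-2.7.2-amzn-2.jar > /dev/null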
raahat

Please check the file's owner in the HDFS directory. I met this issue because the owner was "root"; it got solved when I changed it to "your_user".
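For reference, a minimal sketch of checking and changing the owner; the oozie:oozie owner is an assumption, and the chown must be run as the HDFS superuser (or the current owner):

# Show the current owner and group of the sharelib files
hdfs dfs -ls /user/oozie/share/lib

# Change the owner recursively
hdfs dfs -chown -R oozie:oozie /user/oozie/share/lib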


I got the same error when using Trino to connect to Hive. I tried to connect to HDFS from a Trino worker and found that port 9866 was not open on the HDFS DataNodes; opening the port solved the problem. Related documents: https://www.ibm.com/docs/en/spectrum-scale-bda?topic=requirements-firewall-recommendations-hdfs-transparency and https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
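To reproduce that check, a minimal sketch run from the Trino worker; the hostnames are placeholders, and 9866 is the Hadoop 3 default for dfs.datanode.address (the data transfer port):

# Can this host reach the DataNode data transfer port?
nc -vz datanode-host.example.com 9866

# The NameNode RPC port (often 8020; check fs.defaultFS) should also be reachable
nc -vz namenode-host.example.com 8020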

Chujun Song