
I am using Hadoop 2.7.0-mapr-1506. While the data volume was 100% full, our jobs still tried to INSERT OVERWRITE data into a few Hive tables. Those tables are now corrupted and throw the exception below when accessed:

at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)  
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)  
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)  
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: maprfs:/hive/bigdata.db/cadfp_wmt_table  
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:289)  
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)  
at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)  

We have since freed up space on the data volume and want to recover the data for the tables listed below. How can we do that?

hadoop fs -ls /hive/bigdata.db/ | grep tmp~  
drwxr-xr-x   - bigdata bigdata         16 2019-04-05 07:38 /hive/bigdata.db/pc_bt_clean_table.tmp~!@  
drwxr-xr-x   - bigdata bigdata        209 2019-04-05 07:51 /hive/bigdata.db/pc_bt_table.tmp~!@  
drwxr-xr-x   - bigdata bigdata       1081 2019-04-05 07:38 /hive/bigdata.db/cadfp_wmt_table.tmp~!@ 

I tried the steps described in "How to fix corrupt HDFS FIles", but the hdfs command does not work for me.
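
For reference, here is a minimal dry-run sketch of how the leftover directories could be inspected with `hadoop fs` (the command that works on this cluster) before anything is changed. It rests on an unverified assumption: that each `.tmp~!@` directory is a staging directory left behind by the failed INSERT OVERWRITE, and that the table is expected at the same path without that suffix. Nothing in the listing confirms that these directories still hold complete data. The helper names `target_path` and `plan_recovery` are hypothetical, and the script only prints commands; it does not run them.

```shell
#!/usr/bin/env bash
# Dry run only: prints hadoop fs commands, executes none of them.
# ASSUMPTION (not confirmed by the post): ".tmp~!@" marks a leftover
# staging directory whose intended location is the path without the suffix.

suffix='.tmp~!@'

# Strip the suffix to get the path the table is expected at.
target_path() {
  printf '%s\n' "${1%"$suffix"}"
}

# For each leftover directory, print the inspection commands and the
# rename that would put it at the expected table path.
plan_recovery() {
  local tmp_dir target
  for tmp_dir in "$@"; do
    target=$(target_path "$tmp_dir")
    echo "hadoop fs -ls -R '$tmp_dir'"            # list the files it contains
    echo "hadoop fs -du -s '$tmp_dir'"            # total size, to spot empty dirs
    echo "hadoop fs -mv '$tmp_dir' '$target'"     # only if the contents look complete
  done
}

plan_recovery \
  '/hive/bigdata.db/pc_bt_clean_table.tmp~!@' \
  '/hive/bigdata.db/pc_bt_table.tmp~!@' \
  '/hive/bigdata.db/cadfp_wmt_table.tmp~!@'
```

The printed `-ls -R` and `-du -s` commands are read-only and safe to run as they are. The `-mv` line is the risky step, so it is worth copying a directory elsewhere first and checking that the target path really is missing, as the `Input path does not exist` error suggests.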

howie
Albin

0 Answers