
I want to repair a Hive table for any newly added/deleted partitions. Instead of manually running the MSCK REPAIR command in Hive, is there any way to achieve this in Java? I am trying to get all partitions from HDFS and from the Hive metastore, and after comparing them I will put the newly added/deleted partitions into the Hive metastore. But I am not able to find such an API on HiveContext. I tried to get all the partitions using HiveContext, but it throws a "table not found" error:

System.out.println(hiveContext.metadataHive().getTable("anshu","mytable").getAllPartitions());

Is there any way to add/remove partitions in hive using java?

mahan07
  • AFAIK you *must* open a direct connection to the Metastore service; Spark does not expose its own. Look into https://hive.apache.org/javadocs/r2.1.1/api/ under class `HiveMetaStoreClient`, methods `listPartitionNames(...)` and `getPartition(...)`; then class `Partition`, method `getSd()`; then class `StorageDescriptor`, method `getLocation()` – Samson Scharfrichter Jan 14 '17 at 20:34

1 Answer


Spark option:

Using HiveContext you can execute MSCK REPAIR as in the example below (shown in PySpark; from Java the same `hiveContext.sql(...)` call works). No need to do it manually.

sqlContext = HiveContext(sc)
sqlContext.sql("MSCK REPAIR TABLE your_table")

Is there any way to add/remove partitions in hive using java?

Plain Java option:

If you want to do it in plain Java without using Spark, you can use the class HiveMetaStoreClient to query the Hive metastore directly.
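As a minimal sketch of that approach (it assumes the hive-metastore dependency on the classpath, a running metastore service at a hypothetical thrift URI, and reuses the "anshu"/"mytable" names from the question, so it is not runnable as-is):

```java
import java.util.List;

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Partition;

public class PartitionLister {
    public static void main(String[] args) throws Exception {
        HiveConf conf = new HiveConf();
        // hypothetical metastore address; point this at your own service
        conf.setVar(HiveConf.ConfVars.METASTOREURIS, "thrift://metastore-host:9083");
        HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
        try {
            // (short) -1 asks for all partitions rather than capping the result
            List<Partition> parts = client.listPartitions("anshu", "mytable", (short) -1);
            for (Partition p : parts) {
                // the storage descriptor carries the HDFS location of each partition
                System.out.println(p.getSd().getLocation());
            }
            // client.add_partition(...) and client.dropPartition(...) are the
            // calls for syncing newly added/deleted partitions afterwards
        } finally {
            client.close();
        }
    }
}
```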


Please see my answer here with example usage
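The compare-and-sync idea from the question (diffing HDFS partition paths against metastore partition paths) boils down to two set differences. A self-contained sketch, with hypothetical partition values:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class PartitionDiff {
    // Partitions present in `want` but missing from `have`:
    // run once each way to find what to add and what to drop.
    static Set<String> missingFrom(Set<String> have, Set<String> want) {
        Set<String> diff = new TreeSet<>(want);
        diff.removeAll(have);
        return diff;
    }

    public static void main(String[] args) {
        Set<String> hdfs = new TreeSet<>(Arrays.asList(
            "dt=2017-01-13", "dt=2017-01-14", "dt=2017-01-15"));
        Set<String> metastore = new TreeSet<>(Arrays.asList(
            "dt=2017-01-12", "dt=2017-01-13", "dt=2017-01-14"));

        // in HDFS but not in the metastore -> add to the metastore
        System.out.println("to add:  " + missingFrom(metastore, hdfs));
        // in the metastore but gone from HDFS -> drop from the metastore
        System.out.println("to drop: " + missingFrom(hdfs, metastore));
    }
}
```

Feed the "to add" set to `add_partition` and the "to drop" set to `dropPartition` on the metastore client.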

Ram Ghadiyaram
  • thanks, but the Hive metastore `listPartitions` method only lists up to Short.MAX_VALUE (32767) partitions. If I have 1 lakh (100,000) partitions, how do I achieve that? Also, which approach is better: using sqlContext.sql, or listing all partitions in Hive via the metastore and comparing them with all the partitions in HDFS? – mahan07 Jan 15 '17 at 11:56
  • First, you have to look more closely into the issue you mentioned (1 lakh partitions); to be honest, I don't know. Second, if you are using Spark, `hivecontext.sql` is the better approach rather than writing code with `HiveMetaStoreClient`; if you don't want to use Spark, then you have to go with `HiveMetaStoreClient` – Ram Ghadiyaram Jan 15 '17 at 13:03