1

Lets say I created a hive table with partition column as year, month and day and if i delete the partition from hdfs, then result get reflected in hive table or not

Ani Menon
  • 27,209
  • 16
  • 105
  • 126
Dipak
  • 39
  • 2
  • 9
  • What do you mean by "delete the partition"? Delete all directories + files? delete only the sub-directories and put the files in the top directory? – David דודו Markovitz Mar 09 '17 at 07:23
  • Delete the all directories + files – Dipak Mar 09 '17 at 07:29
  • Possible duplicate of [How to update partition metadata in Hive , when partition data is manualy deleted from HDFS](http://stackoverflow.com/questions/21108251/how-to-update-partition-metadata-in-hive-when-partition-data-is-manualy-delete) – Remus Rusanu Mar 09 '17 at 07:32

2 Answers2

1

Yes. The partition data will be gone.
The metastore will still hold the partition information (metadata) and you can see it using show partition mytable.
You can find the partitions need to be dropped using msck repair mytable.
You can drop the partitions using alter table mytable drop partition (...)

David דודו Markovitz
  • 42,900
  • 6
  • 64
  • 88
  • Yes, but that result will get reflected in hive table or not? – Dipak Mar 09 '17 at 07:32
  • off-course it will. How can you query a table and see data that does not exist any more? – David דודו Markovitz Mar 09 '17 at 07:33
  • Thanks!! It helps me to understand the partition deletion – Dipak Mar 09 '17 at 09:37
  • `msck repair` doesn't not reflect manually deleted folders in the metastore, it only adds partitions that are on HDFS but not in the metastore to the metastore. – cheseaux Jul 11 '17 at 13:38
  • @cheseaux - does not drop but does reflect. See my answer here https://stackoverflow.com/a/44931540/6336479 I have fixed this answer. Thanks – David דודו Markovitz Jul 11 '17 at 13:47
  • `This does not cause any issue` I also disagree with your point, see [this](https://stackoverflow.com/questions/37456932/aggregate-queries-fail-in-hive-if-partition-directory-doesnt-exist) for example. I also experimented a lot of issues with manually deleted partition files. See [my answer](https://stackoverflow.com/a/45036445/1852681) to a related question. – cheseaux Jul 11 '17 at 14:01
  • @cheseaux - you might got a point. Different engines, different versions, different bugs. – David דודו Markovitz Jul 11 '17 at 14:24
  • Superb !! By using msck repair mytable i can able to resolve my problem – Dipak Nov 15 '18 at 12:23
0

Hive table will still show the partitions, you will have to either drop the partitions deleted on HDFS manually (or drop and re-create table) and run MSCK.

Commands:

If you intend to alter the table and drop all deleted partitions-

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...]
  [IGNORE PROTECTION] [PURGE];            -- (Note: PURGE available in Hive 1.2.0 and later, IGNORE PROTECTION not available 2.0.0 and later)

I would go with drop and re-create table then run MSCK.

To add all existing partitions to table-

msck repair table <table_name>

Alternatively, you could drop all partitions using ALTER TABLE and then run the MSCK command.

Ani Menon
  • 27,209
  • 16
  • 105
  • 126