Lets say I created a hive table with partition column as year, month and day and if i delete the partition from hdfs, then result get reflected in hive table or not
Asked
Active
Viewed 4,593 times
1
-
What do you mean by "delete the partition"? Delete all directories + files? delete only the sub-directories and put the files in the top directory? – David דודו Markovitz Mar 09 '17 at 07:23
-
Delete the all directories + files – Dipak Mar 09 '17 at 07:29
-
Possible duplicate of [How to update partition metadata in Hive , when partition data is manualy deleted from HDFS](http://stackoverflow.com/questions/21108251/how-to-update-partition-metadata-in-hive-when-partition-data-is-manualy-delete) – Remus Rusanu Mar 09 '17 at 07:32
2 Answers
1
Yes. The partition data will be gone.
The metastore will still hold the partition information (metadata) and you can see it using show partition mytable
.
You can find the partitions need to be dropped using msck repair mytable
.
You can drop the partitions using alter table mytable drop partition (...)

David דודו Markovitz
- 42,900
- 6
- 64
- 88
-
-
off-course it will. How can you query a table and see data that does not exist any more? – David דודו Markovitz Mar 09 '17 at 07:33
-
-
`msck repair` doesn't not reflect manually deleted folders in the metastore, it only adds partitions that are on HDFS but not in the metastore to the metastore. – cheseaux Jul 11 '17 at 13:38
-
@cheseaux - does not drop but does reflect. See my answer here https://stackoverflow.com/a/44931540/6336479 I have fixed this answer. Thanks – David דודו Markovitz Jul 11 '17 at 13:47
-
`This does not cause any issue` I also disagree with your point, see [this](https://stackoverflow.com/questions/37456932/aggregate-queries-fail-in-hive-if-partition-directory-doesnt-exist) for example. I also experimented a lot of issues with manually deleted partition files. See [my answer](https://stackoverflow.com/a/45036445/1852681) to a related question. – cheseaux Jul 11 '17 at 14:01
-
@cheseaux - you might got a point. Different engines, different versions, different bugs. – David דודו Markovitz Jul 11 '17 at 14:24
-
0
Hive table will still show the partitions, you will have to either drop the partitions deleted on HDFS manually (or drop and re-create table) and run MSCK.
Commands:
If you intend to alter the table and drop all deleted partitions-
ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...]
[IGNORE PROTECTION] [PURGE]; -- (Note: PURGE available in Hive 1.2.0 and later, IGNORE PROTECTION not available 2.0.0 and later)
I would go with drop and re-create table then run MSCK.
To add all existing partitions to table-
msck repair table <table_name>
Alternatively, you could drop all partitions using ALTER TABLE
and then run the MSCK
command.

Ani Menon
- 27,209
- 16
- 105
- 126