Hi, I have a question about Hive. Can anyone help? Say I add a partition to a Hive table and then keep adding files to that partition's directory in HDFS/S3. When I run a query on that partition, will it discover the newly added data?
1 Answer
When you add partition directories manually in HDFS rather than through Hive statements, Hive does not pick them up automatically. You need to make the Hive Metastore (HMS) aware of the newly added HDFS directories by running MSCK REPAIR.
By default, when you run MSCK REPAIR TABLE <table_name>, Hive looks for newly added partitions for that table in HDFS and updates the HMS with the new directory details. Once this is done, queries against the newly added partition will return its data (assuming the partition directory in HDFS contains data files).
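As a rough sketch of the flow described above, the statements below register a manually created partition directory and then check the result. The table name sales, the partition column dt, and the HDFS path are hypothetical and only there for illustration. MSCK REPAIR also expects partition directories to follow the key=value naming convention (for example dt=2024-01-02).

```sql
-- Hypothetical table "sales", partitioned by a string column "dt".
-- Assume a directory such as <table location>/dt=2024-01-02/ was created
-- directly in HDFS or S3, outside of Hive.

-- Register every partition directory found under the table location
-- that is missing from the metastore.
MSCK REPAIR TABLE sales;

-- Confirm that the metastore now lists the new partition.
SHOW PARTITIONS sales;

-- Alternative: register one partition explicitly instead of scanning.
-- The path below is a made-up example location.
ALTER TABLE sales ADD IF NOT EXISTS PARTITION (dt='2024-01-02')
LOCATION 'hdfs:///warehouse/sales/dt=2024-01-02';
```

The explicit ALTER TABLE form can be handy when you already know which partition was added and want to avoid scanning the whole table location.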
From Hive 3.0 onward, MSCK REPAIR also accepts additional options (ADD, DROP, and SYNC PARTITIONS).
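For reference, here is a short sketch of the Hive 3.0 option syntax, again using a hypothetical table named sales:

```sql
-- Hypothetical table "sales"; requires Hive 3.0 or later.

-- Add partitions that exist on the file system but not in the metastore
-- (same as running MSCK REPAIR TABLE with no option).
MSCK REPAIR TABLE sales ADD PARTITIONS;

-- Remove metastore entries for partitions whose directories no longer exist.
MSCK REPAIR TABLE sales DROP PARTITIONS;

-- Do both: add the missing partitions and drop the stale ones.
MSCK REPAIR TABLE sales SYNC PARTITIONS;
```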
See the pages below for more information:
Hope the above answer helps you!

Gomz