Let's say I have a Hive table partitioned by date, with its data stored in S3 as Parquet files. Let's also assume that a particular partition (date) originally contained 20 records.
If I then delete the original files and put new Parquet files containing 50 records in the same folder, do I need to drop and recreate that partition for the new data to be reflected?
My understanding was that we don't have to recreate partitions. So I tried removing the old data from the respective folder and placing the new data there without "updating" the Hive partition. However, when I then ran count(*) for that date, it still showed 20 records instead of 50. After dropping and recreating the partition, it started showing the correct count. Is that the expected behavior?
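
To make the scenario concrete, here is a minimal sketch that builds the kind of HiveQL statements involved: a count, a drop and re-add of the partition, and the count again. The table name `events`, the partition column `dt`, the date value, and the S3 path are all placeholders for illustration, not the real ones, and the helper function itself is hypothetical.

```python
def partition_refresh_statements(table, part_col, part_value, location):
    """Build the HiveQL statements for dropping and re-adding one partition.

    All arguments are illustrative placeholders. This assumes an external
    table: on a managed table, DROP PARTITION also deletes the data files.
    """
    spec = f"{part_col}='{part_value}'"
    count = f"SELECT COUNT(*) FROM {table} WHERE {spec};"
    return [
        count,  # count before touching the partition metadata
        f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({spec});",
        f"ALTER TABLE {table} ADD PARTITION ({spec}) LOCATION '{location}';",
        count,  # same count query, run again after the partition is recreated
    ]


if __name__ == "__main__":
    for stmt in partition_refresh_statements(
        "events", "dt", "2020-01-01", "s3://my-bucket/events/dt=2020-01-01/"
    ):
        print(stmt)
```

The script only prints the statements; they would still have to be run through whatever Hive client is in use.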