We need to figure out how to handle stale AWS Athena partitions, given Athena's service limit of 20,000 partitions per table.
Say we want a single table and add a number of partitions to it every day, each referencing a timestamp-like path to logs located on S3 (example: /foo_bucket/logs/year=2019/month=03/day=11/hour=20).
With this approach we would reach the partition limit in about 2 years. After that, we want to clean up old logs and the partitions associated with them.
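For reference, the rough arithmetic behind the 2-year estimate looks like this. It is a minimal sketch that assumes one partition per hour (24 per day), which is what the example path suggests; the function name is only illustrative.

```python
def days_until_limit(partition_limit: int, partitions_per_day: int) -> int:
    """Whole days of new partitions that fit under the per-table limit."""
    return partition_limit // partitions_per_day


# Assumption: one partition per hour, i.e. 24 new partitions per day.
days = days_until_limit(20_000, 24)
years = days / 365

print(f"{days} days, roughly {years:.1f} years")
```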
Questions:
- What happens to the partition metadata if the associated S3 path is removed? This S3 bucket automatically removes old objects. Will the partition be deleted automatically as well, or will it keep referencing a non-existent S3 path?
- What happens if we reach the per-table partition limit before S3 automatically deletes the old objects? Will Athena delete old partition metadata? I know that deleting a partition does not touch the S3 object data (link).
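If the answer is that we have to drop partitions ourselves, this is roughly what we have in mind: generate one `ALTER TABLE ... DROP IF EXISTS PARTITION` statement per stale hour and run them against Athena. This is only a sketch. The table name `logs` is hypothetical, and it assumes the partition keys `year`, `month`, `day` and `hour` are string-typed, as in the example path.

```python
from datetime import datetime, timedelta


def drop_partition_statements(table: str, oldest: datetime, cutoff: datetime) -> list[str]:
    """Build one DROP PARTITION statement per hourly partition in [oldest, cutoff)."""
    statements = []
    # Align to the start of the hour, since partitions are hourly.
    current = oldest.replace(minute=0, second=0, microsecond=0)
    while current < cutoff:
        statements.append(
            f"ALTER TABLE {table} DROP IF EXISTS PARTITION "
            f"(year='{current:%Y}', month='{current:%m}', "
            f"day='{current:%d}', hour='{current:%H}')"
        )
        current += timedelta(hours=1)
    return statements


# Hypothetical table name and time range, for illustration only.
for stmt in drop_partition_statements(
    "logs", datetime(2019, 3, 11, 20), datetime(2019, 3, 11, 22)
):
    print(stmt)
```

The statements would still have to be submitted to Athena separately; the sketch only builds the DDL strings.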
Thanks!