2

We have encountered a situation where we need to figure out how to handle stale AWS Athena partitions due to Athena's service limits (20,000 partitions per table).

Say we want to have only one table and add a number of partitions there every day that reference a timestamp-like path to some logs located on S3 (example: /foo_bucket/logs/year=2019/month=03/day=11/hour=20). We found out that this approach lets us reach the partition limit in about 2 years. After that, we want to clean up old logs and partitions that are associated with them.

Questions:

  1. What will happen with partition metadata if an associated S3 path gets removed? This S3 bucket automatically removes old objects. Will the partition automatically be deleted as well or will it reference a non-existent S3 path?
  2. What will happen when we reach partition limit per table before S3 automatically deletes old objects? Will old partition metadata get deleted by Athena? I know that deleting a partition does not touch S3 object data (link).

Thanks!

rombez
  • 2,047
  • 2
  • 16
  • 16

1 Answers1

1

Question 1:

As the metadata and data are totally unlinked in AWS Athena, you will need to call the command that basically says "hey, refresh my table partitions list" wich is MSCK REPAIR TABLE https://docs.aws.amazon.com/athena/latest/ug/msck-repair-table.html

Question 2:

I think you will get and error when Athena tries to add a new partition in the metastore as the old partition metadata is not removed.

Feedback:

Maybe it's not a bad idea to consolidate old content into less granular partitions. For example, you can keep a day granularity for the last 3 - 4 months. Anything older than that can be stored per month. That will allow you to keep much more content in your table.

I hope it helps :)

Roberto
  • 757
  • 1
  • 12
  • 21
  • What will `MSCK REPAIR TABLE` return if it finds partitions without associated S3 paths? Will it remove the partitions, their metadata? Is the following the likely scenario in this case: 1. S3 cleans up old objects. 2. On next partition addition operation, after `MSCK REPAIR TABLE` query, Athena finds partitions with unlinked data and removes them. 3. Profit: no data on S3 and no partitions associated with that data. – rombez Aug 16 '19 at 09:25
  • 1
    You will see a message saying something like: "Partition yyyy-mm-dd missing" and it will be removed from the metastore. In your scenario if you are not reaching the partitions limit then you can work that way, old content will be removed and the MSCK REPAIR TABLE will refresh all the table metadata removing the old ones and adding the new ones. You can easily test this with just an empty table and play creating/deleting folders in S3. – Roberto Aug 16 '19 at 09:46
  • Would athena allow to correctly perform `MSCK REPAIR TABLE` if paths are inconsistent, e.g. for the past 3 months it is `/foo_bucket/year=2019/month=08/day=01/hour=00`for everything earlier then that it is just `/foo_bucket/year=2019/month=05/day=01`? – Ilya Kisil Aug 16 '19 at 12:32
  • No, a table is defined with a partition structure on creation time. You have to always respect that structure. If you have a granularity of hours and you want to consolidate one month in a single partition you need to put all month content into a single "hour" folder. For example, content from 2019-02-03 04:05:06 should me moved to partition /foo_bucket/year=2019/month=02/day=01/hour=00. But when you query old content, you really don't need to remember this things, just filter by year and month only. I feel I didn't explained myself well enough :D. If you don't understand it, let me know :) – Roberto Aug 16 '19 at 12:54
  • Ok, I see now what you mean and it is quite nifty workaround with a simple renaming file prefixes. But what if you have month=01/day=02/**hour=00**/file_1 and month=01/day=02/**hour=01**/file_1? Wouldn't one overwrite another after "putting" these files in "folder" ..month=01/day=01/? I guess, keeping full timestamp in the file name and not only in directory/prefix structure should be a rule thumb) – Ilya Kisil Aug 16 '19 at 15:09
  • I suggest not running `MSCK REPAIR TABLE` on a table with 20K partitions, it will take a very, very long time. You're probably better off listing the partitions using the Glue API and doing the pruning yourself if it's something that's going to happen often. – Theo Aug 18 '19 at 14:43
  • Since partitions are "metadata" for the table, I wonder if they should be removed (potentially with `MSCK REPAIR TABLE`) before removing the table itself? – YasiuMaster Oct 26 '20 at 12:47
  • 1
    If your goal is to remove the entire table, you don't need to worry about partitions. All that information will be removed once you drop the table at metastore level. – Roberto Oct 27 '20 at 13:06