6

I have an athena table with partition based on date like this:

20190218

I want to delete all the partitions that are created last year.

I tried the below query, but it didnt work.

ALTER TABLE tblname DROP PARTITION (partition1 < '20181231');

ALTER TABLE tblname DROP PARTITION (partition1 > '20181010'), Partition (partition1 < '20181231');
Piotr Findeisen
  • 19,480
  • 2
  • 52
  • 82
sakthi srinivas
  • 182
  • 1
  • 4
  • 12

4 Answers4

7

According to https://docs.aws.amazon.com/athena/latest/ug/alter-table-drop-partition.html, ALTER TABLE tblname DROP PARTITION takes a partition spec, so no ranges are allowed.

In Presto you would do DELETE FROM tblname WHERE ..., but DELETE is not supported by Athena either.

For these reasons, you need to do leverage some external solution.

For example:

  1. list the files as in https://stackoverflow.com/a/48824373/65458
  2. delete the files and containing directories
  3. update partitions information (https://docs.aws.amazon.com/athena/latest/ug/msck-repair-table.html should be helpful)
Piotr Findeisen
  • 19,480
  • 2
  • 52
  • 82
  • 3
    Also important to note that `alter table drop partition` will not delete any data, just the partition meta-data object. Deleting the files is an independent operation and could be done in any order if required. – Davos Aug 12 '19 at 03:35
  • 1
    @Davos, I think this is true for external tables. – Piotr Findeisen Aug 16 '19 at 19:54
  • I used the aws cli to retrieve the partitions. Then I used a bash script to run aws cli commands to drop the partition if it was older than some date. – ja6a May 01 '20 at 14:06
3

While the Athena SQL may not support it at this time, the Glue API call GetPartitions (that Athena uses under the hood for queries) supports complex filter expressions similar to what you can write in a SQL WHERE expression.

Instead of deleting partitions through Athena you can do GetPartitions followed by BatchDeletePartition using the Glue API.

Theo
  • 131,503
  • 21
  • 160
  • 205
2

this is the script the does what Theo recommended.

import json
import logging

import awswrangler as wr
import boto3
from botocore.exceptions import ClientError

logging.basicConfig(level=logging.INFO, format=logging.BASIC_FORMAT)
logger = logging.getLogger()


def delete_partitions(database_name: str, table_name: str):
  client = boto3.client('glue')
  paginator = client.get_paginator('get_partitions')
  page_count = 0
  partition_count = 0
  for page in paginator.paginate(DatabaseName=database_name, TableName=table_name, MaxResults=20):
    page_count = page_count + 1
    partitions = page['Partitions']
    partitions_to_delete = []
    for partition in partitions:
      partition_count = partition_count + 1
      partitions_to_delete.append({'Values': partition['Values']})
      logger.info(f"Found partition {partition['Values']}")
    if partitions_to_delete:
      response = client.batch_delete_partition(DatabaseName=database_name, TableName=table_name,
        PartitionsToDelete=partitions_to_delete)
      logger.info(f'Deleted partitions with response: {response}')
    else:
      logger.info('Done with all partitions')


def repair_table(database_name: str, table_name: str):
  client = boto3.client('athena')
  try:
    response = client.start_query_execution(QueryString='MSCK REPAIR TABLE ' + table_name + ';',
      QueryExecutionContext={'Database': database_name}, )
  except ClientError as err:
    logger.info(err.response['Error']['Message'])
  else:
    res = wr.athena.wait_query(query_execution_id=response['QueryExecutionId'])
    logger.info(f"Query succeeded: {json.dumps(res, indent=2)}")


if __name__ == '__main__':
  table = 'table_name'
  database = 'database_name'
  delete_partitions(database_name=database, table_name=table)
  repair_table(database_name=database, table_name=table)
Dharman
  • 30,962
  • 25
  • 85
  • 135
Khoa
  • 175
  • 1
  • 8
0

Posting the Glue API workaround for Java to save some time for these who need it:

public void deleteMetadataTablePartition(String catalog,
                                         String db,
                                         String table,
                                         String expression) {

    GetPartitionsRequest getPartitionsRequest = new GetPartitionsRequest()
            .withCatalogId(catalog)
            .withDatabaseName(db)
            .withTableName(table)
            .withExpression(expression);

    List<PartitionValueList> partitionsToDelete = new ArrayList<>();

    do {
        GetPartitionsResult getPartitionsResult = this.glue.getPartitions(getPartitionsRequest);
        List<PartitionValueList> partitionsValues = getPartitionsResult.getPartitions()
                .parallelStream()
                .map(p -> new PartitionValueList().withValues(p.getValues()))
                .collect(Collectors.toList());

        partitionsToDelete.addAll(partitionsValues);

        getPartitionsRequest.setNextToken(getPartitionsResult.getNextToken());
    } while (getPartitionsRequest.getNextToken() != null);

    Lists.partition(partitionsToDelete, 25)
            .parallelStream()
            .forEach(partitionValueList -> {
                glue.batchDeletePartition(
                        new BatchDeletePartitionRequest()
                                .withCatalogId(catalog)
                                .withDatabaseName(db)
                                .withTableName(table)
                                .withPartitionsToDelete(partitionValueList));
            });
}
Alex
  • 8,827
  • 3
  • 42
  • 58