Change names of AWS Athena results stored in S3 bucket

The results of an Athena query are saved in S3 under the query ID (a long string). Is there a way to save the results of a query under a pre-specified name, so they can easily be looked up later?
4 Answers
6
You can do so with a simple AWS Lambda function:
import time
import boto3

client = boto3.client('athena')
s3 = boto3.resource('s3')

# run the query
queryStart = client.start_query_execution(
    QueryString="""
        -- PUT_YOUR_QUERY_HERE
        SELECT *
        FROM "db_name"."table_name"
        WHERE value > 50
    """,
    QueryExecutionContext={
        'Database': "covid_data"  # YOUR_ATHENA_DATABASE_NAME
    },
    ResultConfiguration={
        # query result output location you configured in AWS Athena
        "OutputLocation": "s3://bucket-name-X/folder-Y/"
    }
)

# wait for the query to finish (a fixed sleep is the simplest option)
queryId = queryStart['QueryExecutionId']
time.sleep(3)

# copy the newly generated csv file under the name you want
# source: the query result output location configured in AWS Athena
queryLoc = "bucket-name-X/folder-Y/" + queryId + ".csv"

# destination bucket and file name
s3.Object("bucket-name-A", "report-2018.csv").copy_from(CopySource=queryLoc)

# delete the Athena-generated csv and its metadata file from the output location
s3.Object("bucket-name-X", "folder-Y/" + queryId + ".csv").delete()
s3.Object("bucket-name-X", "folder-Y/" + queryId + ".csv.metadata").delete()

print("report-2018.csv generated")
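A fixed three-second sleep can be too short for larger queries. Here is a minimal sketch of a polling helper, assuming the same boto3 Athena client, that waits until the query reaches a terminal state:

# minimal sketch: poll Athena until the query finishes
import time
import boto3

def wait_for_query(query_id, poll_seconds=1):
    athena = boto3.client('athena')
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status['QueryExecution']['Status']['State']
        if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
            return state
        time.sleep(poll_seconds)

Calling wait_for_query(queryId) in place of time.sleep(3) makes the copy step wait only as long as the query actually needs.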

Pranav
3
Unfortunately, no (at least not yet). The best you can do for now is write a script that goes through the results of each run and renames (copies, then deletes) the files in that S3 bucket; a sketch of that approach follows.
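A minimal sketch of that copy-and-delete ("rename") approach with boto3; the bucket and key names are placeholders, not anything from the question:

import boto3

s3 = boto3.client('s3')
bucket = 'athena-results-bucket'      # placeholder bucket name
old_key = 'folder/query-id.csv'       # the Athena-generated file name
new_key = 'folder/my-report.csv'      # the name you actually want

# S3 has no rename, so copy to the new key and delete the original
s3.copy_object(Bucket=bucket, Key=new_key,
               CopySource={'Bucket': bucket, 'Key': old_key})
s3.delete_object(Bucket=bucket, Key=old_key)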
2
For named queries your results location will be structured as follows:
s3://athena-query-results-<account>-<region>/<query-name>/<year>/<month>/<day>/<UUID>.csv
I don't know of any way for the client to specify the UUID. But you could look for the newest file within the S3 folder of your named query, for example as sketched below.
Alternatively, you could use the S3 API or the AWS CLI to copy the result into a location of your choice.
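A sketch of picking the newest CSV under a named query's prefix with boto3 (bucket and prefix are illustrative, and pagination is ignored for brevity):

import boto3

s3 = boto3.client('s3')
resp = s3.list_objects_v2(
    Bucket='athena-query-results-account-region',  # placeholder bucket
    Prefix='my-query-name/')                       # the named query's folder
csvs = [o for o in resp.get('Contents', []) if o['Key'].endswith('.csv')]
newest = max(csvs, key=lambda o: o['LastModified'])
print(newest['Key'])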
Does this answer your question?

user152468
I have many different files output to S3, and I will be processing each of these files differently (as input to a Spark process on EMR), so looking them up by date/time won't help: I need to know the exact file to pass to the corresponding Spark function. Copying the files to a specific location (that can later be looked up by folder name) won't help either, since I still need to know exactly which file I'm copying :/ – Peybae Jul 19 '18 at 19:23
0
import boto3

def delete_metadata():
    s3 = boto3.resource('s3')
    client_s3 = boto3.client('s3')
    bucket = s3.Bucket('testing')
    # delete every Athena-generated .metadata file under the given prefix
    for obj in bucket.objects.filter(Prefix='prepared/'):
        if obj.key.endswith('.metadata'):
            print(obj.key)
            client_s3.delete_object(Bucket=bucket.name, Key=obj.key)

Rohit Thapliyal