Change names of AWS Athena results stored in S3 bucket

The results of an Athena query are saved in S3 under the query ID (a long string). Is there a way to save the results of a query under a pre-specified name, so they can easily be looked up later?
4 Answers
6
You can do so with a simple AWS Lambda function:
import time
import boto3

client = boto3.client('athena')
s3 = boto3.resource('s3')

# run the query
queryStart = client.start_query_execution(
    QueryString="""
        -- PUT_YOUR_QUERY_HERE
        SELECT *
        FROM "db_name"."table_name"
        WHERE value > 50
    """,
    QueryExecutionContext={
        'Database': "covid_data"  # YOUR_ATHENA_DATABASE_NAME
    },
    ResultConfiguration={
        # query result output location you configured in AWS Athena
        "OutputLocation": "s3://bucket-name-X/folder-Y/"
    }
)

# wait for the query to finish (a fixed sleep is the simplest option)
queryId = queryStart['QueryExecutionId']
time.sleep(3)

# copy the newly generated csv file under the name you want
# source: the query result output location configured in AWS Athena
queryLoc = "bucket-name-X/folder-Y/" + queryId + ".csv"

# destination bucket and file name
s3.Object("bucket-name-A", "report-2018.csv").copy_from(CopySource=queryLoc)

# delete the Athena-generated csv and its metadata file from the output location
s3.Object("bucket-name-X", "folder-Y/" + queryId + ".csv").delete()
s3.Object("bucket-name-X", "folder-Y/" + queryId + ".csv.metadata").delete()

print("report-2018.csv generated")
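A fixed three-second sleep can be too short for larger queries. Here is a minimal sketch of a polling helper, assuming the same boto3 Athena client, that waits until the query reaches a terminal state:

# minimal sketch: poll Athena until the query finishes
import time
import boto3

def wait_for_query(query_id, poll_seconds=1):
    athena = boto3.client('athena')
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status['QueryExecution']['Status']['State']
        if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
            return state
        time.sleep(poll_seconds)

Calling wait_for_query(queryId) in place of time.sleep(3) makes the copy step wait only as long as the query actually needs.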

Pranav
3
Unfortunately, no (at least not yet). The best you can do for now is write a script that goes through the results of each run and renames (copies, then deletes) the files in that S3 bucket; a sketch of that approach follows.
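A minimal sketch of that copy-and-delete ("rename") approach with boto3; the bucket and key names are placeholders, not anything from the question:

import boto3

s3 = boto3.client('s3')
bucket = 'athena-results-bucket'      # placeholder bucket name
old_key = 'folder/query-id.csv'       # the Athena-generated file name
new_key = 'folder/my-report.csv'      # the name you actually want

# S3 has no rename, so copy to the new key and delete the original
s3.copy_object(Bucket=bucket, Key=new_key,
               CopySource={'Bucket': bucket, 'Key': old_key})
s3.delete_object(Bucket=bucket, Key=old_key)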
2
For named queries your results location will be structured as follows:
s3://athena-query-results-<account>-<region>/<query-name>/<year>/<month>/<day>/<UUID>.csv
I don't know of any way for the client to specify the UUID. But you could look for the newest file within the S3 folder of your named query, for example as sketched below.
Alternatively, you could use the S3 API or the AWS CLI to copy the result into a location of your choice.
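A sketch of picking the newest CSV under a named query's prefix with boto3 (bucket and prefix are illustrative, and pagination is ignored for brevity):

import boto3

s3 = boto3.client('s3')
resp = s3.list_objects_v2(
    Bucket='athena-query-results-account-region',  # placeholder bucket
    Prefix='my-query-name/')                       # the named query's folder
csvs = [o for o in resp.get('Contents', []) if o['Key'].endswith('.csv')]
newest = max(csvs, key=lambda o: o['LastModified'])
print(newest['Key'])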
Does this answer your question?

user152468
I have many different files output to S3, and I will be processing each of these files differently (as input to a Spark process on EMR), so looking them up by date/time won't help: I need to know the exact file to pass to the corresponding Spark function. Copying the files to a specific location (that can later be looked up by folder name) won't help either, since I still need to know exactly which file I'm copying :/ – Peybae Jul 19 '18 at 19:23
0
import boto3

def delete_metadata():
    s3 = boto3.resource('s3')
    client_s3 = boto3.client('s3')
    bucket = s3.Bucket('testing')
    # delete every Athena-generated .metadata file under the given prefix
    for obj in bucket.objects.filter(Prefix='prepared/'):
        if obj.key.endswith('.metadata'):
            print(obj.key)
            client_s3.delete_object(Bucket=bucket.name, Key=obj.key)

Rohit Thapliyal