Is it possible to use AWS Athena to query S3 Object Tagging? For example, if I have an S3 layout such as this

bucketName/typeFoo/object1.txt
bucketName/typeFoo/object2.txt
bucketName/typeFoo/object3.txt

bucketName/typeBar/object1.txt
bucketName/typeBar/object2.txt
bucketName/typeBar/object3.txt

And each object has an S3 Object Tag such as this

#For typeFoo/object1.txt and typeBar/object1.txt
id=A

#For typeFoo/object2.txt and typeBar/object2.txt
id=B

#For typeFoo/object3.txt and typeBar/object3.txt
id=C

Then is it possible to run an AWS Athena query to get any object with the associated tag such as this

select * from myAthenaTable where tag.id = 'A'
# returns typeFoo/object1.txt and typeBar/object1.txt

This is just an example and doesn't reflect my actual S3 bucket/object-prefix layout. Feel free to use any layout you wish in your answers/comments.

Ultimately I have a plethora of objects that could be in different buckets and folder paths, but they are related to each other. My goal is to tag them so that I can query for a particular id value and get all objects related to that id. The id value would be a GUID, and that GUID would map to many different types of related objects. For example, I could have a video file, a picture file, a metadata file, and a JSON file, and I want to get all of those files using their common id value. Please feel free to offer suggestions too, because I have the ability to structure this as I see fit.

Update - Note S3 Object Metadata and S3 Object Tagging are two different things.

Kyle Bridenstine
  • Are you asking whether Amazon Athena can query Amazon S3 object metadata? It cannot. Amazon Athena can only process the content of files in Amazon S3. – John Rotenstein Jul 20 '19 at 17:04
  • S3 Object Metadata and S3 Object Tagging are two different things. https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html versus https://docs.aws.amazon.com/AmazonS3/latest/dev/object-tagging.html – Kyle Bridenstine Jul 20 '19 at 17:07
  • Oops! My bad. Regardless, Amazon Athena can only query the _contents_ of files. It cannot query anything about the files themselves. This might give you a few ideas: [amazon s3 - Aws S3 Filter by Tags. Search by tags - Stack Overflow](https://stackoverflow.com/questions/41571309/aws-s3-filter-by-tags-search-by-tags) – John Rotenstein Jul 21 '19 at 00:29

1 Answer

Athena does not support querying by S3 object tags.

One workaround is to maintain a metadata file that maps each object's name to its tags. You can do this with a Lambda function: whenever a new file arrives in S3, the function updates a file in S3 with the object's tag and name details. Athena can then query that file like any other data.
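A minimal sketch of such a Lambda, under some assumptions of my own: `INDEX_BUCKET` and `INDEX_PREFIX` are hypothetical names, and instead of rewriting one shared mapping file it writes one small JSON record per object (a swapped-in variation that avoids concurrent updates to a single file). The S3 client is passed in optionally so the logic can be exercised without AWS.

```python
import json
import urllib.parse

# Hypothetical names, not from the original post. Using a bucket other than
# the source bucket keeps the function from triggering itself.
INDEX_BUCKET = "my-tag-index-bucket"
INDEX_PREFIX = "tag-index/"


def build_record(bucket, key, tag_set):
    """Flatten an S3 TagSet into one JSON line describing the object."""
    tags = {t["Key"]: t["Value"] for t in tag_set}
    return json.dumps({"bucket": bucket, "key": key, "tags": tags}, sort_keys=True)


def handler(event, context, s3=None):
    """Lambda entry point for S3 object-created notifications."""
    if s3 is None:
        import boto3  # imported lazily so the code can be tested without AWS
        s3 = boto3.client("s3")

    written = []
    for rec in event.get("Records", []):
        bucket = rec["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(rec["s3"]["object"]["key"])
        tag_set = s3.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]
        index_key = f"{INDEX_PREFIX}{bucket}/{key}.json"
        s3.put_object(
            Bucket=INDEX_BUCKET,
            Key=index_key,
            Body=build_record(bucket, key, tag_set).encode("utf-8"),
        )
        written.append(index_key)
    return written
```

If an Athena table is then defined over the index prefix with a JSON SerDe and `tags` declared as `map<string,string>`, a filter such as `WHERE tags['id'] = 'A'` should return the matching bucket/key pairs. Note that this only captures tags present when the create event fires; tags added afterwards would need a separate trigger.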

sivakumar