The question "How can I read the metadata for every item in an S3 bucket?" covers how to get metadata for an object on AWS S3, but the process is painfully slow. Even with boto3 I only get the metadata for roughly 50 objects in 7 seconds. Is there a reliable alternative? I only need the data for one directory, though it can contain a few thousand objects. As a last resort I was thinking about getting only the newest object's metadata, but for that I would need the metadata first, I guess :)
Viewed 1,304 times
1
-
You can either run Python multithreaded or use Node.js; either will speed up the process... – Prabhat Jun 18 '18 at 03:26
1 Answer
2
If you don't mind only getting the information once per day, you can use Amazon S3 Inventory:
Amazon S3 inventory provides comma-separated values (CSV) or Apache optimized row columnar (ORC) output files that list your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix (that is, objects that have names that begin with a common string).
If you need the information updated more often, you could configure an Event on the bucket that triggers an AWS Lambda function when a new object is created. The Lambda function could then store the information in a database for future reference. Of course, you'd have to write this code yourself.
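As a rough illustration of the Lambda approach, here is a minimal sketch of a handler that reads the bucket name, key, size and ETag from the S3 event notification it receives. The `save_rows` function is a hypothetical placeholder for whatever database write you choose, and the function names are my own. Note that the event carries only these basic fields; if you need user-defined metadata, the function would still have to call `head_object` for the incoming key.

```python
from urllib.parse import unquote_plus


def extract_records(event):
    """Pull bucket, key, size and ETag out of an S3 event notification."""
    rows = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        obj = s3["object"]
        rows.append({
            "bucket": s3["bucket"]["name"],
            # Keys arrive URL-encoded in the event (spaces become "+").
            "key": unquote_plus(obj["key"]),
            "size": obj.get("size"),
            "etag": obj.get("eTag"),
            "event_time": record.get("eventTime"),
        })
    return rows


def save_rows(rows):
    """Hypothetical storage hook: replace with a write to your database."""
    for row in rows:
        print(row)


def lambda_handler(event, context):
    """Entry point Lambda calls once per S3 event notification."""
    rows = extract_records(event)
    save_rows(rows)
    return {"stored": len(rows)}
```

Because each new object is recorded as it arrives, reading the metadata for a whole prefix later becomes a database query instead of thousands of S3 requests.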

John Rotenstein
-
Very interesting! I did not know about this. However, I need it more frequently. It looks like the most viable option would be threading. Thanks anyway... – Rezney Jun 18 '18 at 00:04
-
You could also configure an Event on the bucket that triggers an AWS Lambda function when a new object is created. The Lambda function could then store the information in a database for future reference. – John Rotenstein Jun 18 '18 at 00:09
-
That looks like a really good idea. I will read up on it and give it a try. Just wondering: am I right in assuming I will again be calling the same code (boto3), but from Lambda? I am sure it will be faster in terms of the connection, but will it help with the number of requests? Or can I do some interesting magic between Lambda and S3? – Rezney Jun 18 '18 at 17:00
-
I'm sorry, I don't understand your question. Yes, you can run Python/boto3 code in Lambda that can call AWS services. It can be triggered whenever a new object is created in the S3 bucket and it will be given the name & bucket of the incoming object. – John Rotenstein Jun 18 '18 at 21:36
-
No worries, I have already seen some nice examples using Futures. Please also add the "Lambda" way to your answer and I will accept it. Thanks for your time... – Rezney Jun 18 '18 at 22:12
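For reference, the threading approach mentioned in the comments could look roughly like the sketch below. It assumes `client` is a boto3 S3 client (for example one created with `boto3.client("s3")`) and that you already have the list of keys, e.g. from a prior listing of the prefix. The function name `head_many` and the default of 16 workers are my own choices, not anything prescribed by boto3.

```python
from concurrent.futures import ThreadPoolExecutor


def head_many(client, bucket, keys, max_workers=16):
    """Fetch head_object responses for many keys using a thread pool.

    `client` is assumed to be a boto3 S3 client, or any object exposing a
    compatible head_object(Bucket=..., Key=...) method.
    Returns a dict mapping each key to its head_object response.
    """
    def head(key):
        return key, client.head_object(Bucket=bucket, Key=key)

    # Each request is I/O bound, so threads overlap the network latency.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(head, keys))
```

This still issues one request per object, so it reduces wall-clock time rather than the number of requests. The right worker count depends on your network and on S3 request limits, so treat the default as a starting point.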