I have a task that, on a schedule, needs to check the number of files in a bucket (the files are uploaded from a NAS) and then e-mail the total using SES.

The e-mail part works fine on its own. However, since the bucket holds over 40,000 files, it takes 5 minutes or more to return the total count.

From a design perspective, is it better to move this part of the logic to an EC2 instance and schedule the job there? Or are there better ways to do this?

Note, I don't have to list all the files. I simply want to get a total count of all the files in the bucket.

Souciance Eqdam Rashti
  • Does [this answer](https://stackoverflow.com/a/32908591/1143724) help you at all? You didn't list what you've tried, so no way of knowing whether or not you've used this method. – MrDuk Mar 30 '18 at 18:52

2 Answers

How about having a Lambda function triggered every time an object is created or deleted (put, delete, etc.)? Based on the event it receives, the function updates a DynamoDB table that stores the count.

For example, when a file is added to S3, the Lambda increments the count in the DynamoDB table by 1, and when a file is deleted, it decrements the count.

This way, I guess, you will always have the latest count without having to count the files.
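
A minimal sketch of that idea, assuming the bucket is configured to send `ObjectCreated:*` and `ObjectRemoved:*` notifications to the function. The table name `s3-object-counts`, its key attribute `bucket`, the attribute `object_count`, and the `COUNTER_TABLE` environment variable are hypothetical names chosen for illustration.

```python
import os


def count_delta(event):
    """Return the net change in object count implied by an S3 event notification."""
    delta = 0
    for record in event.get("Records", []):
        name = record.get("eventName", "")
        if name.startswith("ObjectCreated:"):
            delta += 1
        elif name.startswith("ObjectRemoved:"):
            delta -= 1
    return delta


def handler(event, context):
    """Lambda entry point: apply the net change to a counter item in DynamoDB."""
    import boto3  # imported here so count_delta can be exercised without AWS access

    delta = count_delta(event)
    if delta == 0:
        return {"delta": 0}

    # Hypothetical table with partition key "bucket"; adjust to your own schema.
    table_name = os.environ.get("COUNTER_TABLE", "s3-object-counts")
    table = boto3.resource("dynamodb").Table(table_name)
    bucket = event["Records"][0]["s3"]["bucket"]["name"]

    # ADD is atomic and creates the attribute if it does not exist yet.
    table.update_item(
        Key={"bucket": bucket},
        UpdateExpression="ADD object_count :d",
        ExpressionAttributeValues={":d": delta},
    )
    return {"delta": delta}
```

One caveat with this approach: overwriting an existing key also produces an `ObjectCreated` event, so the counter can drift upward if files are re-uploaded under the same name. It also needs to be seeded once with the current count for a bucket that already holds files.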

raevilman

You did not mention how often you need to do this file count.

If it is daily or less often, you can activate Amazon S3 Inventory. It can provide a daily dump of all files in a bucket, from which you could perform a count.
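
As a rough sketch of the counting step, assuming the inventory is configured with CSV output and current versions only, so that each row corresponds to one object. The function names and the `dest_bucket` and `manifest_key` parameters are illustrative; the manifest key would be the path to the `manifest.json` of the inventory report you want to count.

```python
import csv
import gzip
import io
import json


def count_rows(gz_bytes):
    """Count the CSV rows in one gzip-compressed inventory data file."""
    text = gzip.decompress(gz_bytes).decode("utf-8")
    # Inventory CSV files carry no header row, so every row is an object.
    return sum(1 for _ in csv.reader(io.StringIO(text)))


def count_inventory(dest_bucket, manifest_key):
    """Sum the rows of every data file listed in an inventory manifest.json."""
    import boto3  # imported here so count_rows can be exercised without AWS access

    s3 = boto3.client("s3")
    manifest_body = s3.get_object(Bucket=dest_bucket, Key=manifest_key)["Body"].read()
    manifest = json.loads(manifest_body)

    total = 0
    for data_file in manifest["files"]:
        body = s3.get_object(Bucket=dest_bucket, Key=data_file["key"])["Body"].read()
        total += count_rows(body)
    return total
```

A scheduled Lambda could call `count_inventory` and pass the result to the existing SES e-mail step, reading only a few small files instead of listing the whole bucket.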

John Rotenstein
  • I would say once a week. But I already have all the files in an S3 bucket, which holds a bit over 40,000 files. I can count the total, but the process is very slow. It can take 10 minutes or so, hence I was looking for an alternative solution. – Souciance Eqdam Rashti Mar 30 '18 at 10:39
  • In that case, Amazon S3 Inventory can do it for you, with information provided daily. – John Rotenstein Mar 30 '18 at 20:58