
I am writing a Python 3 Lambda function to get the total size of every folder and subfolder in an S3 bucket, using boto3. Here is how the files are stored:

http://s3/bucket 
    Folder1
        Folder1.1
            Item1.1.1
            Item1.1.2
        Folder1.2
            Item1.2.1
        ...
    Folder2
        Folder2.1
            Item2.1.1
        ...

I need to get the size of each folder and subfolder. From what I've seen while researching, it seems the only way to do this is to get the size of every file within each folder and each of its subfolders, and add them up. This is very inefficient, especially because each subfolder has thousands upon THOUSANDS of files, each folder has 50+ subfolders, and there are 20+ folders.
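
To make it concrete, here is a rough sketch of the brute-force approach I am describing: paginating over every object with boto3 and adding each object's size to every "folder" prefix above it (the bucket name is just a placeholder):

    import boto3
    from collections import defaultdict

    BUCKET = "my-bucket"  # placeholder bucket name

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")

    folder_sizes = defaultdict(int)  # prefix -> total bytes

    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            key = obj["Key"]    # e.g. "Folder1/Folder1.1/Item1.1.1"
            size = obj["Size"]  # object size in bytes
            # credit the object's size to every prefix ("folder") above it
            parts = key.split("/")[:-1]
            for depth in range(1, len(parts) + 1):
                folder_sizes["/".join(parts[:depth]) + "/"] += size

    for prefix, total in sorted(folder_sizes.items()):
        print(f"{prefix}\t{total} bytes")

This makes one ListObjectsV2 call per 1,000 objects, which is why I am worried about how slow it will be on a bucket this size.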

How should I approach this task? Sorry if I used any incorrect terminology here. Please correct me if I said anything wrong. I am learning as I go, just got this task for work.

Thanks in advance; would greatly appreciate the help!

  • There is no direct way. You will need to navigate through each folder and subfolder to do that. Check https://stackoverflow.com/questions/32192391/how-do-i-find-the-total-size-of-my-aws-s3-storage-bucket-or-folder for more details – Arafat Nalkhande Apr 14 '20 at 05:27

1 Answer


I suggest that you use Amazon S3 Inventory.

It can provide a daily CSV file containing a list of every object in the bucket.

Your program will need to parse the CSV file and perform your calculations, but this will be much faster than making API calls to Amazon S3.
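
As a minimal sketch, assuming the inventory was configured with just the Bucket, Key and Size fields and delivered as gzipped CSV (inventory files have no header row, and object keys are URL-encoded), the calculation could look roughly like this — the file name and column order are assumptions, so adjust them to match your inventory configuration:

    import csv
    import gzip
    from collections import defaultdict
    from urllib.parse import unquote

    # Assumption: inventory fields are Bucket, Key, Size in that order,
    # in a gzipped CSV file downloaded from the inventory destination bucket.
    INVENTORY_FILE = "inventory-chunk.csv.gz"  # placeholder file name

    folder_sizes = defaultdict(int)  # prefix -> total bytes

    with gzip.open(INVENTORY_FILE, mode="rt", newline="") as f:
        for bucket, key, size in csv.reader(f):
            key = unquote(key)  # keys in CSV inventories are URL-encoded
            parts = key.split("/")[:-1]
            for depth in range(1, len(parts) + 1):
                folder_sizes["/".join(parts[:depth]) + "/"] += int(size)

    for prefix, total in sorted(folder_sizes.items()):
        print(f"{prefix}\t{total} bytes")

A large inventory is split across several such files, so you would run this over each file (or each key listed in the inventory manifest) and accumulate into the same dictionary.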

John Rotenstein