
I am looking to find the size of each object in my AWS S3 account. Alternatively, list out objects that are more than 2 GB in size.

I have tried listing out by bucket and I am able to get the total size:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket-name')

size = 0
for o in bucket.objects.all():
    size += o.size
print('s3 size = %.3f GB' % (size / 1024 / 1024 / 1024))

I am trying to get output similar to that of the AWS CLI (`aws s3 ls`), which gives the object name and size.

I know S3 lists up to 1,000 objects per request (paginated), so I would have to parse the paged responses. Also, if the bucket is huge (high millions to billions of objects), listing is going to be really slow.
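For context, I assume the manual route would use the low-level client with a paginator, something like this (untested sketch; 'bucket-name' is a placeholder):

import boto3

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')

# Each response page holds at most 1,000 objects; the paginator keeps
# requesting the next page until the listing is exhausted.
for page in paginator.paginate(Bucket='bucket-name'):
    for obj in page.get('Contents', []):
        if obj['Size'] > 2 * 1024**3:  # larger than 2 GB
            print(obj['Key'], obj['Size'])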

Would really appreciate any inputs here.

Thanks

Ron
  • Can you save yourself the trouble of doing this in Python and use [S3 Inventory](https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-inventory.html) instead to get the size of all your objects? – Marcin Oct 17 '20 at 03:46
  • Full code and IAM role can be found here: https://stackoverflow.com/a/58220730/9931092 – Amit Baranes Oct 17 '20 at 10:16
  • Yes we have been considering using S3 inventory too. – Ron Oct 19 '20 at 21:03
  • Thanks Amit for the code link. Will look into it and follow up with any further questions. – Ron Oct 19 '20 at 21:04

2 Answers


Print all objects and their size:

for o in bucket.objects.all():
    print(o.key, o.size)

To print only objects larger than 2 GB:

for o in bucket.objects.all():
    if o.size > 2 * 1024 * 1024 * 1024:
        print(o.key, o.size)

However, if you have millions of objects, I would recommend Amazon S3 Inventory, which can provide a daily or weekly CSV file listing all objects (including their size).
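If you do go the Inventory route, scanning one delivered CSV data file for large objects could look roughly like this (a sketch only: the bucket, key, and column order below are hypothetical; your inventory's manifest.json lists the real data files and the column layout in its fileSchema):

import csv
import gzip
import io

import boto3

s3 = boto3.client('s3')

# Hypothetical location of one inventory data file; check manifest.json
# in your inventory destination for the actual keys.
resp = s3.get_object(Bucket='my-inventory-bucket',
                     Key='source-bucket/daily-inventory/data/part-00000.csv.gz')

# Inventory data files are gzip-compressed CSVs.
text = gzip.decompress(resp['Body'].read()).decode('utf-8')
for row in csv.reader(io.StringIO(text)):
    key, size = row[1], int(row[2])  # assumes fileSchema: Bucket, Key, Size, ...
    if size > 2 * 1024**3:
        print(key, size)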

John Rotenstein
  • I think I tried that before and got this error: AttributeError: 's3.ObjectSummary' object has no attribute 'content_length' – Ron Oct 19 '20 at 21:09
  • Oops! Should be `size`. Fixed! – John Rotenstein Oct 19 '20 at 23:31
  • I haven't seen `AllAccessDisabled` before. Is it happening on just one object, or is it a whole bucket? I wonder if there might be a Bucket Policy that is denying access? – John Rotenstein Oct 22 '20 at 21:22
  • Hi John, that was an error on my side; I fixed it with a for loop. The bucket name I was giving was incorrect. The IAM policy is fine. – Ron Oct 22 '20 at 22:17
  • Thank you for your inputs! Really appreciate it. – Ron Oct 22 '20 at 22:17

One thing to add: `o.size` is an `int` (bytes), and dividing it by 1024 yields a `float`, so store the converted value in a separate variable. We can do something like this:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket-name')

for o in bucket.objects.all():
    if 1024 <= o.size < 1024**2:  # o.size is an int (bytes)
        o_size_pretty = o.size / 1024
        unit = 'KB'
    elif 1024**2 <= o.size < 1024**3:
        o_size_pretty = o.size / (1024**2)
        unit = 'MB'
    elif o.size >= 1024**3:
        o_size_pretty = o.size / (1024**3)
        unit = 'GB'
    else:
        o_size_pretty = o.size
        unit = 'Bytes'
    print(f'{o.key} {o_size_pretty} {unit}')
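As a side note, the chain of elif branches can be collapsed into a small helper that loops over the units; a minimal sketch (pretty_size is my own name, not anything from boto3):

def pretty_size(num_bytes: int) -> str:
    # Walk up the binary units until the value drops below 1024.
    size = float(num_bytes)
    for unit in ('Bytes', 'KB', 'MB', 'GB'):
        if size < 1024 or unit == 'GB':
            return f'{size:.1f} {unit}'
        size /= 1024

for o in bucket.objects.all():
    print(f'{o.key} {pretty_size(o.size)}')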
Johnny.X