
I am trying to read the contents of files in S3. As a first step, I list the files in the various folders/subfolders whose contents I want. However, I have realized that my method does not return all the files in the bucket: it picks up fewer than half of the files in those folders/subfolders, and I am not sure what I am doing wrong. Here is my code:

import boto3

def get_s3_list(bucket, prefix):
    s3 = boto3.client("s3")
    objects = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)

I think the `s3.list_objects_v2` call needs to be modified, but I am not familiar with it. Thanks in advance.

Yag_r
  • Documentation for [list_objects_v2](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects_v2) clearly states "Returns some or all (up to 1,000) of the objects in a bucket with each request." – jarmod Jul 25 '22 at 15:23

1 Answer


You need to extend your code to paginate. A single `list_objects_v2` call returns at most 1,000 objects, so you only get the full listing of the bucket by paginating through the results.
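As a rough sketch of what that could look like, the function below reuses the `get_s3_list` name from the question and walks every page with boto3's built-in paginator. The optional `client` parameter and the choice to return a plain list of keys are assumptions made for illustration; they are not part of the original code.

```python
def get_s3_list(bucket, prefix, client=None):
    """Return every object key under `prefix`, following pagination."""
    if client is None:
        # Imported lazily so a stub client can be passed in for testing.
        import boto3
        client = boto3.client("s3")

    # The paginator re-issues list_objects_v2 with the continuation token
    # until S3 reports that no more keys remain.
    paginator = client.get_paginator("list_objects_v2")

    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page that holds no objects.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

When more keys remain, a single response has `IsTruncated` set and carries a `NextContinuationToken`. The paginator follows those tokens for you, so the loop sees every page rather than only the first 1,000 objects.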

Marcin
  • Yes. `list_objects_v2()` returns at most 1,000 objects per call. Alternatively, instead of using the `client` version of the calls, you could use the `resource` version and call `bucket.objects.all()` -- I think that it returns _all_ objects and handles pagination for you. It's a more Pythonic way of accessing S3. For some examples, see [List all objects in AWS S3 bucket with their storage class using Boto3 Python](https://stackoverflow.com/a/66072127/174777) and [Listing contents of a bucket with boto3](https://stackoverflow.com/q/30249069/174777). – John Rotenstein Jul 15 '22 at 06:34
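
For comparison, here is a minimal sketch of the `resource` approach mentioned in the comment above. The function name `list_keys_with_resource` and the optional `s3_resource` parameter are hypothetical, added only for illustration. It uses `bucket.objects.filter(Prefix=...)`, the prefix-restricted sibling of `bucket.objects.all()`, so that it matches the question's `prefix` argument.

```python
def list_keys_with_resource(bucket, prefix="", s3_resource=None):
    """Return every object key under `prefix` using the resource interface."""
    if s3_resource is None:
        # Imported lazily so a stub resource can be passed in for testing.
        import boto3
        s3_resource = boto3.resource("s3")

    # The objects collection fetches further pages behind the scenes
    # as the loop iterates over it.
    summaries = s3_resource.Bucket(bucket).objects.filter(Prefix=prefix)
    return [summary.key for summary in summaries]
```

Each item in the collection is an object summary, so attributes such as `key` and `size` are available without an extra request per object.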