
I am trying to list the files under a folder using boto3 and Python. I have gone through many SO answers but have not been able to get it working. Below is my code:

s3 = boto3.client('s3')
object_listing = s3.list_objects_v2(Bucket='maxValue',
                                    Prefix='madl-temp/')

My S3 path is "s3://madl-temp/maxValue/", and I want to find out whether there are any parquet files under maxValue. Based on that, I have to do something like the following:

if len(maxValue) > 0:
    maxValue = True
else:
    maxValue = False

I am running it via Glue jobs and I am getting the below error:

botocore.errorfactory.NoSuchBucket: An error occurred (NoSuchBucket) when calling the ListObjectsV2 operation: The specified bucket does not exist
whatsinthename
  • I want the size of 'Contents' to be 0 when there are no objects and n when there are n objects, but in your case it returns 1 even though there are no objects in that bucket – whatsinthename Oct 06 '21 at 07:30
  • Then do `len(object_listing['Contents']) - 1`. The empty folder `maxValue/` counts as one object, so you will always have one more value (all files + one folder). – Marcin Oct 06 '21 at 07:34
  • Okay is it the right way of doing it? Any documentation reference? – whatsinthename Oct 06 '21 at 07:35
  • You can just `print(object_listing['Contents'])` and you will see that the folder name is included, even though there are no files. – Marcin Oct 06 '21 at 07:36
  • This happens because in S3 there are neither folders nor files. Everything is an object. So your `madl-temp/` is also an object, just like the `madl-temp/file.parquet` "file". – Marcin Oct 06 '21 at 07:37
  • Okay understood. – whatsinthename Oct 06 '21 at 07:39

1 Answer


Your bucket name is madl-temp and the prefix is maxValue/, but your boto3 call has them swapped. It should be:

import boto3

s3 = boto3.client('s3')
object_listing = s3.list_objects_v2(Bucket='madl-temp',
                                    Prefix='maxValue/')

To get the number of files you have to do:

len(object_listing['Contents']) - 1

where the -1 accounts for the maxValue/ prefix, which is returned as an object of its own.
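Subtracting 1 assumes that the maxValue/ placeholder object exists, and `object_listing['Contents']` raises a KeyError when nothing matches the prefix, because the 'Contents' key is then left out of the response. A minimal sketch that avoids both assumptions by filtering on the key names is shown below. The helper name `count_files`, the `.parquet` suffix filter and the sample responses are illustrative only, not real output from your bucket:

```python
def count_files(response, suffix=".parquet"):
    """Count keys in a list_objects_v2 response that look like real files."""
    count = 0
    # 'Contents' is absent from the response when no keys match the prefix.
    for obj in response.get("Contents", []):
        key = obj["Key"]
        # Skip "folder" placeholder objects such as 'maxValue/'.
        if key.endswith("/"):
            continue
        if key.endswith(suffix):
            count += 1
    return count


# Hand-written sample responses (hypothetical, trimmed to the relevant fields).
only_folder = {"KeyCount": 1, "Contents": [{"Key": "maxValue/", "Size": 0}]}
with_files = {
    "KeyCount": 3,
    "Contents": [
        {"Key": "maxValue/", "Size": 0},
        {"Key": "maxValue/part-0000.parquet", "Size": 1024},
        {"Key": "maxValue/_SUCCESS", "Size": 0},
    ],
}
no_match = {"KeyCount": 0}

print(count_files(only_folder))  # 0
print(count_files(with_files))   # 1
print(count_files(no_match))     # 0
```

With a real response you would call `count_files(object_listing)` right after the `list_objects_v2` call above.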

Marcin
  • It worked, but when I try to find the `len` of `object_listing` it returns the length of the prefix, i.e. 8. But I want 0 since there are no files under `maxValue` – whatsinthename Oct 05 '21 at 11:02
  • @Debuggerrr The proper way to check it is `len(object_listing['Contents'])`. – Marcin Oct 05 '21 at 11:05
  • Cool but still it is giving `1` as the value but there are no files under `maxValue`. Why is it so? – whatsinthename Oct 05 '21 at 11:14
  • @Debuggerrr You have to provide full output of `object_listing`. I don't know what it is, and your question does not provide such information either. – Marcin Oct 05 '21 at 11:18
  • In the post, I have mentioned that I want `"to check whether there are any parquet files or not"`, and even if I try to display `object_listing['Contents']` it still gives me `1`. How do I print the value? – whatsinthename Oct 05 '21 at 12:50
  • Try `print(object_listing)` and see what it returns to you. – John Rotenstein Oct 05 '21 at 21:50
  • Not sure if it's the default behavior, but sometimes the prefix itself is returned. I usually look for it in the list and remove it. You may also want to add the `Delimiter='/'` argument and see if that changes the response – Jonathan Leon Oct 06 '21 at 03:50
  • Okay, I can try adding the delimiter option – whatsinthename Oct 06 '21 at 07:30
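
For the true/false check asked about in the question, here is a rough sketch that stops at the first parquet key it finds. `list_objects_v2` returns at most 1000 keys per call, so the sketch uses boto3's paginator in case the prefix holds more than that. The function name `has_parquet_files` is made up for illustration, and the client is passed in as an argument so the function can be tried without AWS access:

```python
def has_parquet_files(s3_client, bucket, prefix):
    """Return True if any key under the prefix ends with '.parquet'."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching keys has no 'Contents' entry at all.
        for obj in page.get("Contents", []):
            # The 'maxValue/' placeholder does not end with '.parquet',
            # so it is ignored without any "- 1" adjustment.
            if obj["Key"].endswith(".parquet"):
                return True
    return False


# Intended usage inside the Glue job (assumes boto3 and credentials are set up):
#   import boto3
#   maxValue = has_parquet_files(boto3.client("s3"), "madl-temp", "maxValue/")
```

Returning as soon as the first match is seen avoids listing the rest of the prefix when only a yes/no answer is needed.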