I have an S3 path like the one below:

s3://edl-landing/lu/hello2/  

Under it I have two tables, as shown below:

[image: Path]

Each table has Parquet data in the folder format below:

[image: format]

Now I want to delete the data for the 10th and 11th of November. Below is my Python code using boto3, but it doesn't delete the objects from S3, and I don't get any errors. I also tried various solutions from this link, but none of them deleted the actual data from S3 either.

import boto3

prefix = "lu/hello2/"
s3 = boto3.resource('s3')
bucket = s3.Bucket(name="edl-landing")
FilesNotFound = True
blankList = []
for obj in bucket.objects.filter(Prefix=prefix):
    # print(obj.key)
    blankList.append(obj.key.split('/')[2])
blankList = set(blankList)  # 2 names
while "" in blankList:
    blankList.remove("")
datesList = ['2020-11-10', '2020-11-11']
for i in blankList:
    for j in datesList:
        path = "s3://edl-landing/lu/hello2/" + i + "/edl_load_ts=" + j + "/"
        print(path)
        bucket.objects.filter(Prefix=path).delete()
        print("All the objects have been deleted for the mentioned dates...")

Where am I going wrong? I am running it via an EC2 instance.

whatsinthename

1 Answer

You are passing a complete S3 URI to bucket.objects.filter(Prefix=path).

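This function requires an S3 object key prefix, not an S3 URI. The bucket is already identified by the Bucket object, so the prefix must not include the s3://edl-landing/ part.

You could have verified that your code was attempting to delete zero objects, because bucket.objects.filter(Prefix=path) returned zero objects. With nothing matched, there is nothing to delete, which is why you saw no error.

Pass the relevant prefix instead: "lu/hello2/"+i+"/edl_load_ts="+j+"/"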
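
As a rough sketch of that fix, the loop could be restructured along these lines. The helper names build_prefixes and delete_partitions are hypothetical (they are not part of boto3), and the code assumes a non-empty base prefix and the table/edl_load_ts=date/ layout described in the question. It counts the matching objects before deleting, so a prefix that matches nothing becomes visible instead of failing silently.

```python
def build_prefixes(base_prefix, tables, dates, partition_key="edl_load_ts"):
    """Build S3 key prefixes (no "s3://bucket/" part) for each table/date pair."""
    base = base_prefix.rstrip("/")
    return [
        f"{base}/{table}/{partition_key}={date}/"
        for table in sorted(tables)
        for date in dates
    ]


def delete_partitions(bucket, base_prefix, dates):
    """Delete the given date partitions of every table found under base_prefix.

    `bucket` is expected to behave like a boto3 S3 Bucket resource.
    Returns a dict mapping each prefix to the number of objects it matched.
    """
    # The table name is the key segment right after the base prefix.
    depth = len(base_prefix.rstrip("/").split("/"))
    tables = set()
    for obj in bucket.objects.filter(Prefix=base_prefix):
        parts = obj.key.split("/")
        if len(parts) > depth and parts[depth]:
            tables.add(parts[depth])

    matched = {}
    for prefix in build_prefixes(base_prefix, tables, dates):
        # Count first: zero matches usually means the prefix is wrong.
        count = sum(1 for _ in bucket.objects.filter(Prefix=prefix))
        matched[prefix] = count
        print(f"{prefix}: {count} object(s) matched")
        if count:
            bucket.objects.filter(Prefix=prefix).delete()
    return matched


# Usage sketch (needs boto3 and credentials allowed to list and delete):
#   import boto3
#   bucket = boto3.resource("s3").Bucket("edl-landing")
#   delete_partitions(bucket, "lu/hello2/", ["2020-11-10", "2020-11-11"])
```

If every prefix reports zero matches, compare the printed prefixes with the keys returned by bucket.objects.filter(Prefix="lu/hello2/") before suspecting permissions.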

jarmod