There is this S3 notification feature described here:
Amazon S3 event notifications are designed to be delivered at least once. Typically, event notifications are delivered in seconds but can sometimes take a minute or longer.
and discussed here.
I thought I could mitigate the duplicates somewhat by deleting files I have already processed. The problem is, when a second event for the same file arrives (a minute or so later) and I try to access the file, I don't get an HTTP 404, I get an ugly AccessDenied:
[ERROR] ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 111, in lambda_handler
    raise e
  File "/var/task/lambda_function.py", line 104, in lambda_handler
    response = s3.get_object(Bucket=bucket, Key=key)
  File "/var/runtime/botocore/client.py", line 391, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 719, in _make_api_call
    raise error_class(parsed_response, operation_name)
This is unexpected and not acceptable. I don't want my Lambda to suppress AccessDenied errors, for obvious reasons. Is there an easy way to find out whether the file has already been processed in the past, or whether the notification service is playing tricks?
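The only workaround I can think of is tracking processed events myself, e.g. with a conditional write to DynamoDB keyed on the event's sequencer. Something along these lines (untested sketch; the table name processed-s3-events and the helper is_duplicate are made up):

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client('dynamodb')

def is_duplicate(record):
    # bucket + key + sequencer should identify one object event
    s3_info = record['s3']
    event_id = '{}/{}/{}'.format(
        s3_info['bucket']['name'],
        s3_info['object']['key'],
        s3_info['object']['sequencer'],
    )
    try:
        dynamodb.put_item(
            TableName='processed-s3-events',  # made-up table name
            Item={'eventId': {'S': event_id}},
            # reject the write if this event was already recorded
            ConditionExpression='attribute_not_exists(eventId)',
        )
        return False
    except ClientError as e:
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            return True
        raise

But that feels like a lot of machinery for something the platform arguably should handle.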
EDIT:
For those who think this is "an indication of some bug in my application", here is the relevant piece of code:
import logging
import urllib.parse

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger()
s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    # object keys arrive URL-encoded in the event payload
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    logger.info(f'Requesting file from bucket {bucket} with key {key}')
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
    except ClientError as e:
        error_code = e.response["Error"]["Code"]
        if error_code == 'NoSuchKey':
            # duplicate notification for an object that is already gone
            logger.info('Object does not exist any more')
            return
        else:
            raise e
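After successful processing, the object is deleted, which is essentially just:

# remove the object once it has been processed, so a duplicate
# notification should hit a key that no longer exists
s3.delete_object(Bucket=bucket, Key=key)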
It rather smells like an ugly issue on the AWS side to me.