We have been using AWS S3 event notifications to trigger Lambda functions when files land on S3. This model has worked reasonably well until we noticed that some files are processed multiple times, generating duplicates in our datastore. It happens for roughly 0.05% of our files.
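For context, our handler is roughly this shape (a simplified sketch; `process_record` is a placeholder for our real datastore write):

```python
import urllib.parse

import boto3

s3 = boto3.client("s3")

def process_record(body: bytes) -> None:
    # Placeholder for the real datastore write.
    print(f"processed {len(body)} bytes")

def handler(event, context):
    # S3 invokes this once per ObjectCreated notification.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the event payload.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        process_record(body)
```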
I know we can guard against this by performing an upsert, but what concerns us is the cost of running unnecessary Lambda invocations.
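The kind of guard I have in mind is something like the following sketch, using a DynamoDB conditional put as an idempotency check (the table name and key scheme are my own placeholders):

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")
TABLE = "processed-s3-objects"  # hypothetical idempotency table

def claim(dedup_id: str) -> bool:
    """Atomically record dedup_id; return True only for the first caller.

    The conditional put succeeds exactly once per id, so a duplicate
    invocation fails the condition and can return before doing any work.
    """
    try:
        dynamodb.put_item(
            TableName=TABLE,
            Item={"pk": {"S": dedup_id}},
            ConditionExpression="attribute_not_exists(pk)",
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False
        raise
```

This makes the handler idempotent, but it doesn't remove the invocation cost itself: the duplicate Lambda still runs, just long enough to fail the condition check and exit.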
I've searched Google and SO, but only found similar-ish issues. It isn't a timeout problem, as the files are processed fully. Our files are fairly small, the largest being under 400 KB. Nor are we receiving the same event twice: the events have different request IDs, even though they refer to the same file.
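One thing we're considering (my own idea, not something we've confirmed) is keying deduplication on the object key plus the event's `sequencer` field rather than on request IDs, since S3 includes the sequencer in each notification record to order operations on a key, so it should match across duplicate deliveries of the same PUT even when request IDs differ. A sketch:

```python
def dedup_id(record: dict) -> str:
    """Build a stable identifier from one S3 event record.

    Request IDs differ per delivery, so they can't detect duplicates.
    The bucket/key plus S3's 'sequencer' token identifies the PUT
    itself and stays the same if the notification is delivered twice.
    """
    obj = record["s3"]["object"]
    bucket = record["s3"]["bucket"]["name"]
    return f"{bucket}/{obj['key']}#{obj['sequencer']}"
```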