For a problem like this, you can try many workarounds in your services, such as checking for duplicate message_ids or maintaining two queues. All of these are legitimate, but they consume additional processing power. A better solution would be to use built-in functionality of AWS SQS itself, though even that might not fully cover the requirements. Below are a few approaches you can take.
- SQS Standard Queue + Lambda + Database
This is the approach you have suggested: check the database for already-processed message_ids and make sure the same message is never processed twice. Make sure you add an index on the message_id column so the lookups stay fast.
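Here is a minimal sketch of such a handler, assuming DynamoDB as the database and a hypothetical table named `processed_messages` with `message_id` as its partition key. A conditional write makes the check and the insert one atomic call; if you are on a relational database instead, a unique index on message_id plus an insert-or-fail gives the same guarantee.

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")
TABLE_NAME = "processed_messages"  # hypothetical table, partition key: message_id

def handler(event, context):
    for record in event["Records"]:
        message_id = record["messageId"]
        try:
            # Conditional put: fails if this message_id was already stored,
            # i.e. the message is a duplicate.
            dynamodb.put_item(
                TableName=TABLE_NAME,
                Item={"message_id": {"S": message_id}},
                ConditionExpression="attribute_not_exists(message_id)",
            )
        except ClientError as err:
            if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
                continue  # duplicate: skip it
            raise
        process(record["body"])  # your actual business logic

def process(body):
    ...
```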
- Message Publisher + SQS Standard Queue + Lambda + Database
Here you can have your message publisher ensure that duplicate messages are never sent to SQS in the first place. This is only possible if you maintain your own publishing service, but if you do, it could be the ideal solution.
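As a rough illustration, assuming the publisher is a Python service using boto3 (the queue URL and the in-memory set are placeholders), deduplication at the source could look like this; a shared store such as Redis would be needed if the publisher runs on multiple instances:

```python
import hashlib
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

_sent_ids = set()  # illustrative only; use a shared store (e.g. Redis) in production

def publish(payload: dict) -> None:
    # Derive a deterministic ID from the payload so re-publishing the
    # same logical message always produces the same message_id.
    body = json.dumps(payload, sort_keys=True)
    message_id = hashlib.sha256(body.encode()).hexdigest()
    if message_id in _sent_ids:
        return  # duplicate: drop it at the source
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=body,
        MessageAttributes={
            "message_id": {"DataType": "String", "StringValue": message_id}
        },
    )
    _sent_ids.add(message_id)
```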
- SQS Standard Queue + EC2 + Database
You can use an EC2 instance instead of a Lambda, so that the already processed message_ids can be cached in memory on the instance. This saves database I/O every time a message is received. The drawbacks are that you have to poll the queue yourself and that EC2 costs considerably more than a Lambda.
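A rough consumer loop for that setup might look like the following, with a hypothetical queue URL. Long polling cuts down on empty receives, and the in-memory set filters duplicates before any database I/O; the database would still be the durable record, since the set is lost on restart.

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

processed_ids = set()  # in-memory cache of already handled message IDs

def poll_forever():
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,  # long polling
        )
        for msg in resp.get("Messages", []):
            if msg["MessageId"] not in processed_ids:
                handle(msg["Body"])  # business logic + durable write to the database
                processed_ids.add(msg["MessageId"])
            sqs.delete_message(
                QueueUrl=QUEUE_URL,
                ReceiptHandle=msg["ReceiptHandle"],
            )

def handle(body):
    ...
```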
- SQS FIFO Queue + Lambda (or EC2) + Database + Polling
You can use a FIFO queue and its exactly-once processing to ensure duplicate messages never enter SQS. This involves a Lambda (scheduled via CloudWatch Events) or an EC2 instance polling for messages. This might be performance intensive, but it does enforce the requirement.
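On the sending side, this is roughly what publishing to a FIFO queue looks like (queue URL hypothetical). With MessageDeduplicationId set, or content-based deduplication enabled on the queue, SQS itself drops duplicates that arrive within the five-minute deduplication window:

```python
import boto3

sqs = boto3.client("sqs")
FIFO_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue.fifo"  # placeholder

def publish_fifo(body: str, dedup_id: str) -> None:
    sqs.send_message(
        QueueUrl=FIFO_QUEUE_URL,
        MessageBody=body,
        MessageGroupId="default",         # required on FIFO queues
        MessageDeduplicationId=dedup_id,  # same ID within 5 minutes -> dropped by SQS
    )
```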
As of now, Lambda triggers are only supported on SQS standard queues, so going FIFO is not really an option. From a practical perspective, option two would be the ideal solution: it is much easier and cleaner than turning the entire architecture into spaghetti. Hope this helps.