0

The documentation for the Standard SQS queue on AWS says that it may occasionally deliver a message twice.

What's the ideal way to check for this? I basically have a Lambda set up which is activated by items going into the queue. Some calculation is done on the item, and the data is written back into a DB.

Is it sufficient to check whether that data has already been written to the DB before writing it again, on the chance the message had already been delivered once before?

Or is there a fancier way to do this?

Is there any way to get a FIFO queue to feed into Lambda?

rygo6
  • Possible duplicate of [Using many consumers in SQS Queue](https://stackoverflow.com/questions/37472129/using-many-consumers-in-sqs-queue) – dmulter Sep 13 '18 at 15:33

3 Answers

2

There are a few options:

  1. The most obvious is to include a unique identifier in each message and store those identifiers in some persistent store (ideally DynamoDB) to check against before processing each message, so that you know whether a message has already been processed. If you go this route, you can carry the identifier as a message attribute rather than in the message body, so you don't have to parse the entire body just to see whether it is a duplicate (see the sketch after this list)
    • Pros: processing of messages is real time
    • Cons: you have the overhead of persisting the IDs and de-duplicating at your end
  2. The second option is to use a FIFO queue and have a scheduled Lambda (using a CloudWatch scheduled rule) poll the FIFO queue at the specified schedule and process any messages that are present
    • Pros: you save the overhead of persisting the IDs and de-duplicating at your end
    • Cons: not real time
  3. The third, fancier option (just because you asked for a fancier option) is to have 2 SQS queues (1 Standard and the other FIFO) and have your message producer put each message in both queues. Trigger your Lambda off the Standard queue, but when the Lambda gets invoked, read the messages from the FIFO queue. That way, if the Lambda gets triggered for a duplicate message, nothing will be available in the FIFO queue for that invocation and you skip the processing
    • Pros: processing of messages is real time and you don't have the hassle of maintaining the unique IDs
    • Cons: 2 queues
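
A minimal sketch of option 1, assuming a DynamoDB table named `processed_messages` keyed on `message_id` and an SQS message attribute named `dedup-id` (both names are hypothetical):

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("processed_messages")  # hypothetical table, partition key: message_id


def handler(event, context):
    for record in event["Records"]:
        # The unique ID travels as a message attribute, so the body is never
        # parsed just to detect a duplicate.
        msg_id = record["messageAttributes"]["dedup-id"]["stringValue"]
        try:
            # Conditional put fails if this ID has already been recorded.
            table.put_item(
                Item={"message_id": msg_id},
                ConditionExpression="attribute_not_exists(message_id)",
            )
        except ClientError as e:
            if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
                continue  # duplicate delivery, skip it
            raise
        process(record["body"])  # the actual calculation / DB write


def process(body):
    ...  # placeholder for your own logic
```

In practice you would also put a TTL on the de-duplication items so the table does not grow without bound.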
Arafat Nalkhande
1

For a problem like this, you can try many workarounds within your own services, like checking for duplicate message_ids or maintaining two queues for the purpose. All of these are legitimate, but they consume additional processing power. A good solution would be to use built-in functionality of AWS SQS itself, but even that might not be enough to cover the requirement. Given below are a few approaches that can be used.

  1. SQS Standard Queue + Lambda + Database

This is the approach you have suggested, where we check the database for already-processed message_ids and make sure not to process the same message twice. Make sure you add an index on the message_id column for faster checks.
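
For illustration, a check-and-write guarded by that index might look like this (sqlite3 is used only to keep the sketch self-contained; table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect("results.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS results (
           message_id TEXT PRIMARY KEY,  -- the primary key is indexed, so the duplicate check is cheap
           payload    TEXT
       )"""
)


def write_if_new(message_id, payload):
    # The unique index on message_id makes the insert itself act as the check,
    # avoiding a race between a separate SELECT and the INSERT.
    cur = conn.execute(
        "INSERT OR IGNORE INTO results (message_id, payload) VALUES (?, ?)",
        (message_id, payload),
    )
    conn.commit()
    return cur.rowcount == 1  # False means this message was already processed
```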

  2. Message Publisher + SQS Standard Queue + Lambda + Database

Here you can ask your message publisher to ensure that duplicate messages are not sent to SQS. This is only possible if you maintain your own publishing service. It could be the ideal solution if you have access to it.

  3. SQS Standard Queue + EC2 + Database

You can use an EC2 instance instead of a Lambda, so that you can keep the already-processed message_ids in memory on the instance. This saves database I/O operations whenever a message is received. The drawback is that you have to use polling, and EC2 costs considerably more than a Lambda.
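
A rough sketch of that polling loop, keeping the processed IDs in an in-memory set (the queue URL and the `dedup-id` attribute are placeholders; note the set is lost if the instance restarts):

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

seen = set()  # processed message IDs, held in memory on the instance

while True:
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling
        MessageAttributeNames=["All"],
    )
    for msg in resp.get("Messages", []):
        msg_id = msg["MessageAttributes"]["dedup-id"]["StringValue"]  # hypothetical attribute
        if msg_id not in seen:
            seen.add(msg_id)
            # ... run the calculation and write the result to the DB ...
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```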

  4. SQS FIFO Queue + Lambda (or EC2) + Database + Polling

You can use a FIFO queue and enforce exactly-once processing, to ensure duplicate messages are not delivered by SQS. This involves a Lambda (scheduled via CloudWatch) or an EC2 instance polling for messages. This might be performance intensive, but it enforces the requirement.
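
On the producer side, deduplication on a FIFO queue is driven by `MessageDeduplicationId` (or content-based deduplication), which SQS applies over a five-minute window. A minimal send might look like this (queue URL and IDs are placeholders):

```python
import boto3

sqs = boto3.client("sqs")
FIFO_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue.fifo"  # placeholder

sqs.send_message(
    QueueUrl=FIFO_QUEUE_URL,
    MessageBody='{"item_id": 42}',
    MessageGroupId="items",               # required on FIFO queues
    MessageDeduplicationId="item-42-v1",  # repeats of this ID within 5 minutes are dropped
)
```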

As of yet, Lambda triggers are only supported for SQS standard queues, so going for FIFO is not an option there. From a practical perspective, option number two would be the ideal solution: it's much easier and cleaner than turning the entire architecture into spaghetti. Hope this helps.

Keet Sugathadasa
0

I faced a similar issue, and was able to address it by verifying in DynamoDB whether the unique message identifier was already present. If it was already present, the data would not be processed. If the key was not already present, it would be stored in DynamoDB. From that point, you can use DynamoDB Streams to trigger an AWS Lambda that does whatever processing needs to happen after the new key is saved.
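
A sketch of the stream side of that pattern: the ingest Lambda stores each new key with a conditional write (as in the first answer's sketch), and a second Lambda subscribed to the table's DynamoDB stream does the actual processing (table and key names are hypothetical):

```python
def stream_handler(event, context):
    """Hypothetical Lambda subscribed to the dedup table's DynamoDB stream."""
    for record in event["Records"]:
        # Only INSERT events correspond to first-seen message IDs; duplicate
        # deliveries never produce an INSERT because the conditional write
        # in the ingest Lambda rejects them.
        if record["eventName"] != "INSERT":
            continue
        message_id = record["dynamodb"]["Keys"]["message_id"]["S"]
        # ... do the real work for this message here ...
```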

SnG