
We plan to use the AWS SQS service to queue events created by our web service and then use several workers to process those events. Each event should be processed only once. According to the AWS SQS documentation, a standard queue can "occasionally" deliver duplicate messages but has unlimited throughput, while a FIFO queue will not produce duplicate messages but is limited to 300 API calls per second (with a batch size of 10, that is equivalent to 3,000 messages per second). Our current peak-hour traffic is only 80 messages per second, so both are fine in terms of throughput. But when I started to use an SQS FIFO queue, I found I had to do extra work, like providing the extra parameters "MessageGroupId" and "MessageDeduplicationId" or enabling the "ContentBasedDeduplication" setting. So I am not sure which one is the better solution. We just need the messages to not be duplicated; we don't need them to be FIFO.

Solution #1: Use an AWS SQS FIFO queue. For each message, generate a UUID and pass it as both the "MessageGroupId" and "MessageDeduplicationId" parameters.
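
A minimal sketch of what I mean, using Python/boto3 (the queue URL and event body are placeholders):

```python
import uuid
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/events.fifo"  # placeholder

dedup_id = str(uuid.uuid4())
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody='{"event": "example", "id": 42}',
    MessageGroupId=dedup_id,          # fresh UUID per message, since we don't need ordering
    MessageDeduplicationId=dedup_id,  # SQS drops re-sends of this ID within the dedup interval
)
```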

Solution #2: Use an AWS SQS FIFO queue with "ContentBasedDeduplication" enabled. For each message, generate a UUID only for the "MessageGroupId" parameter.
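
The Solution #2 variant would look like this (same placeholders as above):

```python
import uuid
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/events.fifo"  # placeholder

# With ContentBasedDeduplication enabled on the queue, SQS derives the
# deduplication ID from a SHA-256 hash of the message body, so only
# MessageGroupId needs to be supplied.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody='{"event": "example", "id": 42}',
    MessageGroupId=str(uuid.uuid4()),
)
```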

Solution #3: Use an AWS SQS standard queue with AWS ElastiCache (either Redis or Memcached). For each message, save the "MessageId" field in the cache server and check it for duplicates later on; if it already exists, the message has been processed. (By the way, how long should a "MessageId" stay in the cache? The AWS SQS documentation does not say how far apart in time a duplicate can arrive.)
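
Roughly what I have in mind for the worker side, using Redis via redis-py (the ElastiCache endpoint, the TTL, and the process() handler are all assumptions):

```python
import boto3
import redis

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/events"  # placeholder standard queue
cache = redis.Redis(host="my-cluster.cache.amazonaws.com", port=6379)  # placeholder endpoint

# How long to remember a MessageId is a guess; the docs give no hard bound,
# so pick something comfortably longer than any plausible redelivery window.
DEDUP_TTL_SECONDS = 6 * 60 * 60

def process(body: str) -> None:
    ...  # the actual event handling goes here

resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20)
for msg in resp.get("Messages", []):
    # SET with nx=True is atomic: only the first worker to claim this
    # MessageId gets a truthy reply; everyone else sees a duplicate.
    if cache.set(f"seen:{msg['MessageId']}", 1, nx=True, ex=DEDUP_TTL_SECONDS):
        process(msg["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```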

Raymond
  • I'm not sure any of those are "better". The simplest would definitely be to use the FIFO queue with ContentBasedDeduplication enabled. Pick the solution you are most comfortable with. – Mark B Sep 26 '17 at 16:53
  • SQS FIFO guarantees no duplicates only within a certain time frame (probably 5 min), so a duplicate submitted beyond that window will still come through. You need to address this in your design. – Avishek Bhattacharya Sep 26 '17 at 17:29
  • Our producers (the web service) normally finish within several hundred milliseconds, so 5 min should be good enough? Basically, I am trying to see whether it's better to do the deduplication in our own code logic/cache server or to rely on the FIFO queue. – Raymond Sep 26 '17 at 19:13
  • In the unusual case where a message is delivered more than once in a non-FIFO queue, the ReceiptHandle will almost certainly **not** be the same, but the MessageId would be the same. – Michael - sqlbot Sep 26 '17 at 20:17
  • Thanks. I updated my question to use "MessageId" for deduplication instead. – Raymond Sep 26 '17 at 20:31
  • See [my answer to this related question](https://stackoverflow.com/a/38290017/836214) -- "You're asking for a guarantee - you won't get one. You can reduce probability of a message being processed more than once to a very small amount, but you won't get a guarantee.", along with a detailed explanation of what you can do. Hope it helps. – Krease Oct 10 '17 at 23:59

2 Answers


You are making your systems complicated with SQS.

We have moved to Kinesis Streams and it works flawlessly. Here are the benefits we have seen:

  1. Ordering of events
  2. Trigger an event when data appears in the stream
  3. Deliver in batches
  4. Leave the responsibility of handling errors to the receiver
  5. Go back in time and replay events in case of issues or a buggy implementation of the process
  6. Higher performance than SQS

Hope it helps.

Kannaiyan
  • Thanks for the quick response. Kinesis Streams seems interesting, but in our simple case we just use SQS as a buffer for offline batch processing, since we can't process all requests quickly enough (there are several 3rd-party API calls). For a simple use case and for cost, SQS seems more suitable [link](https://stackoverflow.com/questions/26623673/why-should-i-use-amazon-kinesis-and-not-sns-sqs). Besides, Kinesis also has a duplicate-records issue [link](http://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-duplicates.html) – Raymond Sep 26 '17 at 19:06
  • My first question would be: why is it even so important that you don't get duplicate messages? An ideal solution would be to use a standard queue and design your workers to be idempotent. For example, if the messages carry something like a task ID and the workers store each completed task's result in a database, they can simply ignore any message whose task ID already exists in the DB (see the sketch after this list).
  • Don't use receipt handles for application-side deduplication, because they change every time a message is received. In other words, SQS doesn't guarantee the same receipt handle for duplicate messages.
  • If you insist on queue-side deduplication, then you have to use a FIFO queue.
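
A minimal sketch of that idempotency pattern (sqlite3 is just for illustration; do_work and the "task_id" field are assumptions about your message format):

```python
import json
import sqlite3

db = sqlite3.connect("results.db")
db.execute("CREATE TABLE IF NOT EXISTS results (task_id TEXT PRIMARY KEY, result TEXT)")

def do_work(task: dict) -> dict:
    ...  # the actual processing goes here

def handle(message_body: str) -> None:
    task = json.loads(message_body)  # assumes each message carries a "task_id"
    result = do_work(task)
    # INSERT OR IGNORE makes the write idempotent: a duplicate delivery
    # hits the primary-key constraint and is silently skipped.
    db.execute(
        "INSERT OR IGNORE INTO results (task_id, result) VALUES (?, ?)",
        (task["task_id"], json.dumps(result)),
    )
    db.commit()
```
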
ketan vijayvargiya