
AWS newbie here.

I have a DynamoDB table and 2+ nodes of a Java app reading from and writing to it. My use case is as follows: the app should fetch N items every X seconds based on a timestamp, process them, then remove them from the DB. Because the app may scale, other nodes might be reading from the DB at the same time, and I want to avoid processing the same items multiple times.

The question is: is there any way to implement something like a poll() method that fetches an item and immediately removes it (as an atomic operation), as if the table were a queue? As far as I can tell, the delete methods that DynamoDBMapper offers do not return the removed item's data.

2 Answers

1

Consistency is a weak spot of DDB, but that's the price to pay for its scalability.

You said it yourself, you're looking for a queue, so why not use one?

I suggest:

  1. Create a Lambda that:
    • Reads the items
    • Publishes them to an SQS FIFO queue with message deduplication
    • Deletes the items from the DB
  2. Create an EventBridge schedule to run the Lambda every n minutes
  3. Have your nodes poll that queue instead of DDB

For this to work you have to consider a few things regarding timings:

  1. DDB will typically be consistent in under a second, but this isn't guaranteed.
  2. SQS deduplication only works for 5 minutes.
  3. EventBridge only supports minute level granularity, not seconds.

So you can run your Lambda as frequently as once a minute, but you can run your nodes as frequently (or infrequently) as you like.

If you run your Lambda less frequently than every 5 minutes, there is technically a chance of processing an item twice, but this is very unlikely to ever happen. (Technically it could still happen anyway if DDB took >10 minutes to become consistent, but again, that is extremely unlikely.)
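To make the timing points above more concrete, here is a small in-memory model of a 5-minute deduplication window. It is only a sketch: `DedupWindow` and `offer` are names made up for illustration, this is not the SQS API, and it assumes the window is measured from when an ID was first accepted.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** In-memory model of a time-boxed deduplication window (hypothetical, not the SQS API). */
final class DedupWindow {
    private final Duration window;
    private final Map<String, Instant> acceptedAt = new HashMap<>();

    DedupWindow(Duration window) {
        this.window = window;
    }

    /** Returns true if the message is accepted, false if it is dropped as a duplicate. */
    boolean offer(String dedupId, Instant now) {
        Instant seen = acceptedAt.get(dedupId);
        if (seen != null && Duration.between(seen, now).compareTo(window) < 0) {
            return false; // same ID again inside the window: dropped
        }
        acceptedAt.put(dedupId, now); // new ID, or the earlier window has expired
        return true;
    }

    public static void main(String[] args) {
        DedupWindow queue = new DedupWindow(Duration.ofMinutes(5));
        Instant t0 = Instant.parse("2022-09-07T13:00:00Z");
        System.out.println(queue.offer("item-1", t0));                  // true
        System.out.println(queue.offer("item-1", t0.plusSeconds(60)));  // false
        System.out.println(queue.offer("item-1", t0.plusSeconds(600))); // true
    }
}
```

The last call is the case to watch: if a stale read makes the Lambda republish an item more than 5 minutes after the first publish, the window no longer catches it.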

Richard Dunn
  • Thanks for the answer, appreciate it! There are many events coming into the system and we need throttling in place so that events are processed evenly (per event type). So, for example, if the rate limit is 5 events per type per minute and more are incoming, we want to postpone processing them but accept other types in the meantime. That's why we decided to store them in DynamoDB and poll evenly per event type (partition key). If we only had a queue, we'd waste resources putting messages back on the queue once rate limited, and possibly end up with a "noisy neighbour" problem. – Kamil Kozlowski Sep 07 '22 at 13:52
  • Ok, there are a lot of requirements there that I'm not 100% clear on, but I can think of lots of ways to expand on my answer that would likely make it work. The first that comes to mind is: why not debounce the messages with the polling Lambda? Its purpose is to create a single source of truth for whether an item has been deleted or not. The Lambda checks whether n time has passed since the oldest item was added to DDB before placing a batch of items on the queue and deleting them. You can set the queue's Group ID to the message type and batch read from your nodes to ensure they are consumed as a group. – Richard Dunn Sep 07 '22 at 14:30
  • In other words, poll every 5 min, if item is only 10 seconds old ignore until next poll. Poll again, item is now 5:10 old and there are 10 more items of same type. Place all 11 items on queue with same group id (or as many as are old enough to go through). – Richard Dunn Sep 07 '22 at 14:33
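A rough sketch of the debounce step described in the comments above, using the "as many as are old enough" reading: items younger than the minimum age are left for the next poll, and the rest are grouped by type so each group can share one message group ID. `DebounceBatcher`, `Item`, and its fields are hypothetical names; the real table's attributes will differ, and reading from DDB and publishing to SQS are left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DebounceBatcher {
    /** Hypothetical item shape; the real table's attributes will differ. */
    static final class Item {
        final String id;
        final String type;
        final Instant createdAt;

        Item(String id, String type, Instant createdAt) {
            this.id = id;
            this.type = type;
            this.createdAt = createdAt;
        }
    }

    /** Groups items that are at least minAge old by type; younger items wait for the next poll. */
    static Map<String, List<Item>> selectBatches(List<Item> items, Instant now, Duration minAge) {
        Map<String, List<Item>> batches = new TreeMap<>();
        for (Item item : items) {
            if (Duration.between(item.createdAt, now).compareTo(minAge) >= 0) {
                batches.computeIfAbsent(item.type, k -> new ArrayList<>()).add(item);
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2022-09-07T14:35:00Z");
        List<Item> items = List.of(
            new Item("a", "order", now.minusSeconds(310)),   // 5:10 old: goes out
            new Item("b", "order", now.minusSeconds(10)),    // 10 seconds old: waits
            new Item("c", "refund", now.minusSeconds(400))); // old enough: goes out
        Map<String, List<Item>> batches = selectBatches(items, now, Duration.ofMinutes(5));
        // Each map key would become the message group ID for that batch.
        batches.forEach((type, batch) -> System.out.println(type + " -> " + batch.size()));
    }
}
```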
0

My understanding is that you want to read and delete an item atomically. Unfortunately, that is not possible with DynamoDB.

What is possible, however, is deleting the item and having its old value returned, which is more like a delete-then-read. As you correctly pointed out, the Mapper client does not support ReturnValues, but the low-level clients do.

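// Assumes AWS SDK for Java v1 (com.amazonaws.services.dynamodbv2) and an AmazonDynamoDB client named `client`.
// "id" is a placeholder for the table's partition key attribute name.
Map<String, AttributeValue> keyToDelete = Map.of("id", new AttributeValue("214141"));
DeleteItemRequest dir = new DeleteItemRequest()
    .withTableName("ABC")
    .withKey(keyToDelete)
    .withReturnValues(ReturnValue.ALL_OLD);
// Attributes of the deleted item; null or empty if the item no longer existed.
Map<String, AttributeValue> deleted = client.deleteItem(dir).getAttributes();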

More info: see the DeleteItemRequest documentation.
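To show why delete-and-return helps with several nodes, here is an in-memory analogy rather than DynamoDB code. `ConcurrentHashMap.remove` atomically removes a key and returns the old value, or null if it was already gone, which is how I understand a delete with ALL_OLD to behave for a single item. `ClaimByDelete` and `claim` are made-up names for this sketch.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for a table; illustrates the claim-by-delete idea only. */
final class ClaimByDelete {
    private final Map<String, String> table = new ConcurrentHashMap<>();

    void put(String key, String payload) {
        table.put(key, payload);
    }

    /** Deletes the item and returns its old value; empty means another node already claimed it. */
    Optional<String> claim(String key) {
        return Optional.ofNullable(table.remove(key));
    }

    public static void main(String[] args) {
        ClaimByDelete store = new ClaimByDelete();
        store.put("214141", "payload");
        System.out.println(store.claim("214141").isPresent()); // true: this caller processes the item
        System.out.println(store.claim("214141").isPresent()); // false: already claimed, skip it
    }
}
```

With this pattern each node would query for candidate items, then only process the ones whose delete actually returned the old item.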

Leeroy Hannigan