
I want to process a DynamoDB stream and apply the updates to a different DynamoDB table in another account. The schemas are different, so I will be transforming the data in between as well.

I have thought of the below solution:

  1. Enable DynamoDB streams on the source table.
  2. Process the stream in Lambda. As per my understanding, DynamoDB streams offer ordered events per shard.
  3. Apply the updates to the destination DynamoDB table using the stream record (a rough handler sketch follows this list).
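
As a rough illustration of steps 2–3, here is a minimal Lambda handler sketch in Python with boto3. The destination table name, the `transform` mapping, and the cross-account access (e.g., via an assumed role attached to the Lambda's execution role) are all assumptions for illustration, not details from the question:

```python
import boto3

# Placeholder: assumes the Lambda's execution role can write to the
# destination table in the other account (e.g., via a resource policy
# or an assumed cross-account role).
dynamodb = boto3.resource("dynamodb")
dest_table = dynamodb.Table("DestinationTable")  # hypothetical name

def transform(new_image):
    """Hypothetical schema mapping from source to destination.

    Stream records carry DynamoDB-typed JSON ({"S": ...}, {"N": ...}),
    so the type descriptors are unwrapped here.
    """
    return {
        "pk": new_image["pk"]["S"],        # hypothetical attribute names
        "payload": new_image["data"]["S"],
    }

def handler(event, context):
    # Within one invocation, records arrive in stream order for that shard.
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY"):
            item = transform(record["dynamodb"]["NewImage"])
            dest_table.put_item(Item=item)
```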

I want to apply the updates to the destination DynamoDB table in the same order in which they occurred in the source table.

I was reading the documentation, and it says that updates for the same partition can appear in multiple shards and that a Lambda invocation is triggered for each shard in parallel (assuming a parallelization factor of 1 per shard). So how can I ensure the records for each partition are processed in order across shards?

My solution:

I was thinking of including some kind of counter with each update to an item in the source table, and using some global state shared across Lambda invocations to apply updates in order for the case when records with the same partition key are processed by different Lambda invocations by virtue of being in different shards.

I think there should be another way to do this, maybe with update timestamps? Are there any better/cleaner ways to do this? Also, please feel free to correct me if I have misunderstood anything.

bazzi

1 Answer

Just use timestamp ordering. When writing items to the destination, add a condition to the write: this_timestamp > existing_timestamp.
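
For example, a conditional put with boto3 might look like the sketch below. The attribute name `updated_at` and the helper function are illustrative, not from the answer; the condition rejects any write that is not strictly newer than what is already stored:

```python
from botocore.exceptions import ClientError

def conditional_put(table, item, ts):
    """Write only if this update is newer than the stored item."""
    try:
        table.put_item(
            Item={**item, "updated_at": ts},
            # Succeeds when no item exists yet, or our timestamp is newer.
            ConditionExpression=(
                "attribute_not_exists(updated_at) OR updated_at < :ts"
            ),
            ExpressionAttributeValues={":ts": ts},
        )
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            pass  # A newer version already landed; drop this stale update.
        else:
            raise
```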

DynamoDB Streams provides an ApproximateCreationDateTime; however, it's rounded to the nearest second and may not be granular enough for your use case, so it's best to implement your own timestamp. Ensure that your design accounts for clock skew, etc., so it doesn't cause issues with your data consistency.
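
One way to implement your own timestamp (the attribute name and resolution are assumptions for illustration) is to stamp each write to the source table with a high-resolution value, which then flows through the stream in NewImage:

```python
import time

import boto3

dynamodb = boto3.resource("dynamodb")
source_table = dynamodb.Table("SourceTable")  # hypothetical name

def write_with_timestamp(item):
    # Microsecond-resolution epoch timestamp; much finer than the
    # second-rounded ApproximateCreationDateTime on stream records.
    # Clock skew across writers remains a concern, per the answer above.
    item["updated_at"] = int(time.time() * 1_000_000)
    source_table.put_item(Item=item)
```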

Leeroy Hannigan
  • Thanks for the reply. Is the understanding that all records for a particular item will belong to the same shard correct? If that's the case I believe what you suggested should work! – bazzi Jul 09 '23 at 09:45
  • DynamoDB Streams guarantees item-level ordering; that is, all modifications to an item (not all items in a collection) will be in order and contain no duplicates. – Leeroy Hannigan Jul 09 '23 at 10:33
  • Thanks for the reply. My source table has both partition and sort keys, so will updates to a particular item belong to the same shard? I want to use Lambda as a consumer of the DynamoDB stream, and if updates to the same item are split across shards, then simply using a timestamp will not work. As per [this](https://stackoverflow.com/a/44291339/4039495), though, I believe my assumption is correct. Is that understanding right? – bazzi Jul 09 '23 at 11:10
  • Item-level ordering means you're guaranteed ordering for an item that has the same PK and SK. Timestamp ordering will work across shards; you will just do conditional puts to the destination table. – Leeroy Hannigan Jul 09 '23 at 12:25
  • Does this guaranteed ordering hold true when we consume the stream using Lambda, which consumes each shard in a separate instance? I am not sure how timestamp ordering will work across shards. Say the latest update to an item is at timestamp A, and two future updates on the same item, at times B and C (C > B), go into two different shards. How can we guarantee that B is applied before C, given both B and C are after A? – bazzi Jul 09 '23 at 13:37
  • You can't guarantee that you apply B before C, but you can guarantee that C will be the last state of the item. If you can't deal with eventual consistency, then perhaps you can use Transactions and write to two tables at the same time. – Leeroy Hannigan Jul 09 '23 at 13:48
  • Ah, makes sense. We can directly apply C and ignore B. I think our use case will be fine with eventual consistency. Thanks for all your support! – bazzi Jul 09 '23 at 14:27