
I have a DynamoDB table from which I want to delete a large number of items. I found a Stack Overflow answer to a similar question where you scan the whole table to collect all the relevant items, then delete them in batches. But in my case there are so many items that they wouldn't fit in memory.

What are the possible solutions in this case?

  1. Scan the table using LastEvaluatedKey and, on each iteration, delete 'x' items (say 25 or 100). This requires only one scan, but is it a valid solution? Does deleting items have any effect on the LastEvaluatedKey for the next iteration?
  2. Scan multiple times and delete 'x' items each time without using LastEvaluatedKey. This would require many full-table scans. It is definitely a valid solution, but I want to avoid it.
LoL

1 Answer


Option 1 should work.

Pass the LastEvaluatedKey returned by each Scan as the ExclusiveStartKey of the next Scan request.

Note the word "Exclusive": the scan resumes with the first item after the key passed in as ExclusiveStartKey, so it does not matter that the items before that point have since been deleted.
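A minimal sketch of that loop in Python, to show that only one page of keys is ever held in memory. It assumes a boto3 low-level DynamoDB client is passed in as `client`; the function names `purge` and `_delete_keys`, the page size, and the backoff values are illustrative choices, not something from the question. Deletes go through `BatchWriteItem`, which accepts at most 25 requests per call and may hand some back as `UnprocessedItems`.

```python
import time

BATCH_SIZE = 25  # BatchWriteItem accepts at most 25 requests per call


def _delete_keys(client, table_name, keys):
    """Delete the given keys in batches, retrying anything left unprocessed."""
    for start in range(0, len(keys), BATCH_SIZE):
        chunk = keys[start:start + BATCH_SIZE]
        pending = {table_name: [{"DeleteRequest": {"Key": k}} for k in chunk]}
        attempt = 0
        while pending:
            response = client.batch_write_item(RequestItems=pending)
            pending = response.get("UnprocessedItems") or {}
            if pending:
                # Simple capped exponential backoff (values are arbitrary).
                time.sleep(min(0.05 * 2 ** attempt, 2.0))
                attempt += 1


def purge(client, table_name, key_names, page_limit=100, **scan_filter):
    """Scan one page at a time, deleting each page before fetching the next.

    key_names lists the table's key attributes (partition key, plus sort key
    if there is one). Extra keyword arguments such as FilterExpression are
    passed straight through to Scan.
    """
    # Project only the key attributes; placeholders avoid reserved words.
    names = {f"#k{i}": name for i, name in enumerate(key_names)}
    names.update(scan_filter.pop("ExpressionAttributeNames", {}))
    params = dict(
        TableName=table_name,
        Limit=page_limit,
        ProjectionExpression=", ".join(f"#k{i}" for i in range(len(key_names))),
        ExpressionAttributeNames=names,
        **scan_filter,
    )
    deleted = 0
    while True:
        page = client.scan(**params)
        keys = page.get("Items", [])  # each item holds only its key attributes
        _delete_keys(client, table_name, keys)
        deleted += len(keys)
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return deleted  # no more pages
        params["ExclusiveStartKey"] = last_key  # resume after the last key read


# Hypothetical usage (table and attribute names are placeholders):
#   import boto3
#   purge(boto3.client("dynamodb"), "my-table", ["pk", "sk"])
```

The key returned as LastEvaluatedKey belongs to an item that has just been deleted by the time the next Scan runs; it is used only as a position marker.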

You could even use this approach with a parallel scan. From the docs:

In a parallel scan, a Scan request that includes ExclusiveStartKey must specify the same segment whose previous Scan returned the corresponding value of LastEvaluatedKey.
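As a rough sketch of what that rule means in practice: each worker below owns one segment and only ever feeds its own LastEvaluatedKey back into Scan requests for that same segment. The function names and the thread-pool layout are assumptions for illustration, and `client` is assumed to be a boto3 low-level DynamoDB client shared between threads. It uses single `delete_item` calls to stay short; batching the deletes would work too.

```python
from concurrent.futures import ThreadPoolExecutor


def purge_segment(client, table_name, key_names, segment, total_segments):
    """Scan and delete one segment, page by page."""
    params = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
    }
    deleted = 0
    while True:
        page = client.scan(**params)
        for item in page.get("Items", []):
            # Keep only the key attributes of the scanned item.
            key = {name: item[name] for name in key_names}
            client.delete_item(TableName=table_name, Key=key)
            deleted += 1
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return deleted  # this segment is exhausted
        # The key came from this segment, so it is reused with the same Segment.
        params["ExclusiveStartKey"] = last_key


def parallel_purge(client, table_name, key_names, total_segments=4):
    """Run one purge_segment worker per segment and return the total deleted."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(purge_segment, client, table_name, key_names, s, total_segments)
            for s in range(total_segments)
        ]
        return sum(f.result() for f in futures)
```

More segments means more read and write capacity consumed at once, so the segment count is worth tuning against the table's throughput.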

Going forward, consider setting up a Time to Live (TTL) attribute.

This will allow DDB to automatically delete items for you. The best part is that those deletes cost you nothing, unlike your current plan, where you pay to read each item and again to delete it.
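A small sketch of how TTL could be wired up with boto3, assuming a low-level DynamoDB client is passed in as `client`. The attribute name `expires_at` and the helper names are made up for the example. The TTL attribute has to be a Number holding a Unix epoch time in seconds, and expired items are removed in the background rather than at the exact moment they expire.

```python
import time

TTL_ATTRIBUTE = "expires_at"  # hypothetical attribute name


def enable_ttl(client, table_name, attribute=TTL_ATTRIBUTE):
    """Turn on TTL for the table, keyed on the given attribute."""
    client.update_time_to_live(
        TableName=table_name,
        TimeToLiveSpecification={"Enabled": True, "AttributeName": attribute},
    )


def with_expiry(item, lifetime_seconds, now=None, attribute=TTL_ATTRIBUTE):
    """Return a copy of a low-level item with an expiry timestamp attached."""
    now = time.time() if now is None else now
    expiring = dict(item)
    # TTL expects a Number attribute containing epoch seconds.
    expiring[attribute] = {"N": str(int(now + lifetime_seconds))}
    return expiring


# Hypothetical usage (table name and item are placeholders):
#   import boto3
#   client = boto3.client("dynamodb")
#   enable_ttl(client, "my-table")
#   client.put_item(TableName="my-table",
#                   Item=with_expiry({"pk": {"S": "a"}}, 30 * 24 * 3600))
```

Existing items only become eligible for TTL deletion once they carry the attribute, so this helps with future cleanup rather than the current backlog.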

Charles
  • Thanks for the answer. Can you tell me why option 1 should work? When we delete items from the table (a delete is a write operation in DynamoDB), is there a guarantee that the iteration order doesn't change? If we write items while iterating with LastEvaluatedKey, some items could be inserted at the beginning when we might have already iterated over half the items in the original order. AWS doesn't say that it's a chronological order. Since a delete is a write, I'm wondering if it has the same effect on the iteration order. – LoL Jul 30 '21 at 15:19
  • If items are inserted while the delete is going on and they have a lower key, then no, they will not be picked up. – Charles Jul 30 '21 at 15:51
  • If the iteration order is based only on key comparison, then option 1 should work in my case (just deletion), as you said. I will test and see if it works. When you say lower/higher key, what is the comparison criterion? I know it doesn't matter, just curious, as I can't find anything in the docs. – LoL Jul 30 '21 at 17:13
  • Key comparison depends on the data type of the key attributes: number, string, or binary. See the [data types](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.DataTypes) documentation for how DDB compares them. – Charles Jul 30 '21 at 17:32