I have a DynamoDB table that has

  • partition key "idA", sort key "idB"
  • GSI partition key "idB", sort key "idA"

I am trying to delete all items with a specific "idB", so I query the GSI to get a list of records, and I also want to paginate the results for scale.

If I were querying the main index, I could probably just re-run the query with a limit and delete each record. Because I need to use the GSI, however, records that have already been deleted show up in subsequent queries due to the GSI's eventual consistency, i.e. a record's deletion often does not propagate to the GSI before the next query is invoked.

Another path is to use the LastEvaluatedKey from the previous query response as the ExclusiveStartKey for the next query, which should result in only new records being returned. However, there is a fair chance that the record LastEvaluatedKey points to will no longer exist, because it was deleted in the previous iteration.

When that happens, weird results are returned. I thought it should work, because on a main index, if you send a non-existent ExclusiveStartKey, DynamoDB can still figure out where to start retrieving records from, due to its ordering system.

On the GSI, though, the results often start with the next expected record, but then some records get skipped, and the query with the non-existent ExclusiveStartKey often does not return a LastEvaluatedKey even though not all remaining records have been returned.

I am playing with ideas to handle this strange behaviour:

  1. not deleting the last record returned, to decrease the chance that ExclusiveStartKey does not exist (and deleting that record after the next iteration)
  2. doing an extra check when no LastEvaluatedKey is returned to make sure no records actually remain

But they are messy workarounds.
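For illustration, idea 2 might look roughly like the sketch below. It assumes `client` is a boto3 low-level DynamoDB client (`boto3.client("dynamodb")`), that "idB" is a string attribute, and that the table and index names are placeholders. The helper names `count_remaining` and `confirm_empty` are made up for this example. Note that a zero count only reflects what has propagated to the GSI so far.

```python
import time


def count_remaining(client, table_name, index_name, id_b):
    """Count the items the GSI currently reports for one idB value."""
    total = 0
    start_key = None
    while True:
        kwargs = {
            "TableName": table_name,
            "IndexName": index_name,
            "KeyConditionExpression": "idB = :b",
            # Assumption: idB is stored as a string ("S").
            "ExpressionAttributeValues": {":b": {"S": id_b}},
            "Select": "COUNT",  # return counts only, no item data
        }
        if start_key is not None:
            kwargs["ExclusiveStartKey"] = start_key
        page = client.query(**kwargs)
        total += page.get("Count", 0)
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            return total


def confirm_empty(client, table_name, index_name, id_b,
                  attempts=5, delay_s=0.5, sleep=time.sleep):
    """Re-check the GSI a few times; True once it reports no items."""
    for _ in range(attempts):
        if count_remaining(client, table_name, index_name, id_b) == 0:
            return True
        sleep(delay_s)  # give the GSI time to catch up before retrying
    return False
```

If `confirm_empty` returns False, the delete loop would need to run again over whatever the GSI still reports.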

Does anybody understand why the weird results happen, and do they happen in any structured way?

Any other advice on how to solve the task I am performing?

1 Answer

DynamoDB GSIs behave just like the base table in this respect. A LastEvaluatedKey (LEK) is a pointer to a position on a partition; it does not need the item to exist to know where to start the next iteration.

Make sure you are not corrupting the ExclusiveStartKey (ESK): pass it to the next Query exactly as it was returned as LastEvaluatedKey (LEK). It should include the GSI keys as well as the base table keys.
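A minimal sketch of that loop, not a definitive implementation: it assumes `client` is a boto3 low-level DynamoDB client (`boto3.client("dynamodb")`), that "idA" and "idB" are string attributes, and that the table and index names are placeholders. The function name `delete_all_for_idb` is made up for this example.

```python
def delete_all_for_idb(client, table_name, index_name, id_b, page_size=25):
    """Query the GSI page by page and delete each returned item.

    Returns the number of delete calls issued (not confirmed deletions).
    """
    deleted = 0
    start_key = None
    while True:
        kwargs = {
            "TableName": table_name,
            "IndexName": index_name,
            "KeyConditionExpression": "idB = :b",
            # Assumption: idB is stored as a string ("S").
            "ExpressionAttributeValues": {":b": {"S": id_b}},
            "Limit": page_size,
        }
        if start_key is not None:
            # Pass the previous LastEvaluatedKey back untouched.
            kwargs["ExclusiveStartKey"] = start_key
        page = client.query(**kwargs)

        for item in page.get("Items", []):
            # Deletes go to the base table, keyed by idA (partition) + idB (sort).
            client.delete_item(
                TableName=table_name,
                Key={"idA": item["idA"], "idB": item["idB"]},
            )
            deleted += 1

        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            return deleted
```

The key point is that `start_key` is never rebuilt or trimmed by hand; whatever map the previous response returned is what goes into the next request.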

If you still see an issue after that, please share code.

Leeroy Hannigan
  • I believe what happened is that the records were created by logic that ran shortly before the query and deletions. I thought the time AWS took to run the extra logic, pass through messaging, etc. would be long enough for the DynamoDB GSI to catch up with the base table, but that is not something we can rely on. I will redesign my data so that work requiring up-to-date queries does not use a GSI. – SvenTheBarbarian Dec 25 '22 at 01:52