4

I want to implement numeric limit and offset based pagination in DynamoDB similar to Postgres.

My API look something like this: http://foo.bar/user?offset=50&limit=20.

What's the best way to do this in Java without risking OutOfMemoryError considering DynamoDB uses ExclusiveStartKey and LastEvaluatedKey to paginate?

EDIT:

Let's assume offset based pagination is a hard-requirement and I don't know the 'prior' page. My API contract have offset and limit query params as described above. I'm not looking for "don't do offset based pagination" answers.

gerrytan
  • 40,313
  • 9
  • 84
  • 99
  • 1
    Offset based pagination (0, 20, 40, 60. ...) for 20 records _versus_ a search for the last key on prior page + 21 records. The latter can be much faster. _(Just averting about the latter smart pagination technique)_ – Joop Eggen Sep 20 '19 at 11:32
  • Thanks @joop-eggen, but let's assume offset based pagination is a hard-requirement here (I'll add an EDIT to clarify). – gerrytan Sep 20 '19 at 11:37
  • Also @JoopEggen, let's assume I don't know the prior page, I don't keep that state – gerrytan Sep 20 '19 at 11:58
  • (It is indeed a technique to overcome a slow db offset access, requiring holding and juggling with data. Not something to put in a code design.) – Joop Eggen Sep 20 '19 at 12:17
  • 1
    Ah I see what you mean now @JoopEggen, I can have a separate 'jump table' holding the start key for each 'bucket'. Now if I need to deep-page, I can jump to the nearest 'bucket' instead of having to traverse from the start. But this require me to maintain this jump table somehow when the source table is updated. It's an interesting idea. I wonder if there's a library / pattern that can help implementing this. – gerrytan Sep 20 '19 at 12:32
  • It can be done lazily; I do not know which pagination libraries use it Encountering it here (?) was a suprise. A cache of page keys, possibly reduced to pages 100, 200, 300, ... or such is imaginable. I wish I new a term for this technique to search for. – Joop Eggen Sep 20 '19 at 13:01

1 Answers1

4

After looking at how PaginatedList work in the sdk, it seems the most efficient way is to use ITERATION_ONLY pagination loading strategy in the config, and do something like this:

DynamoDBMapperConfig config = new DynamoDBMapperConfig.Builder()
                       .withPaginationLoadingStrategy(ITERATION_ONLY)
                       .build();
PaginatedQueryList<MyUser> users = dynamoDBMapper.query(MyUser.class, myQueryExpression, config);
// This iterator will fetch 1MB worth of data 'on-demand'
Iterator<MyUser> userIterator = users.iterator();
skip(iterator, offset);
List<MyUser> aPageOfUser = collect(iterator, limit);

Of course I'll incur the cost of implementing my own skip and collect method there. Is there a better / more-concise way to do this?

EDIT:

It seems I can utilise IteratorUtils from commons-collections to have the skip and collect for free:

Iterator<MyUser> userIterator = IteratorUtils.boundedIterator(users.iterator(), offset, limit);
List<MyUser> aPageOfUser = IteratorUtils.toList(userIterator);

gerrytan
  • 40,313
  • 9
  • 84
  • 99