0

I have a dynamodb table with the following structure

userId -> hashkey
timestamp -> sortkey
//other fields

I have two primary queries I want to execute on this table.

  1. Get all entries for a user -> This is straight forward, as I can do query on the hashkey
  2. Get recent N entries -> Not sure what is the best way to do this. I will be displaying these in UI so I need to do paging. Should I use SCAN? or Create a GSI on timestamp and use Query?

Any other suggestions? Thank you.

Update 05/14 There are great suggestions posted here. I am still trying to figure how best to implement this and wanted to check the pros/cons of the following approach.

Trips: 
  Id -> this is a GUID generated by the API that is inserting data to dynamo
  timestamp -> sortkey
  //other details

UserTrips:
  userId -> hashkey
  tripId -> id from trips table

RecentTrips:
  timeKey -> YYYYMM format hash key
  tripId -> id from trips table

I am afraid that this might create lot of inconsistencies as if insertion to usertrips or recenttrips fail etc.

user1868744
  • 963
  • 1
  • 13
  • 27
  • The typical solution is a GSI with something like YYMMDD as the partition key, the timestamp as the sort key, and then you can use query, but only within a given day. Example: https://stackoverflow.com/questions/35963243/how-to-query-dynamodb-by-date-range-key-with-no-obvious-hash-key – jarmod May 13 '20 at 22:31
  • I need recently created N records, should not be restricted by date. – user1868744 May 13 '20 at 23:11
  • You can't, generally speaking, do this with a NoSQL database. You could make the partition key YYYYMM and get the latest N records of the current month, but it wouldn't work correctly in the early hours of the first day of a new month (you'd have to make two queries for latest N this month and latest N last month, and aggregate the results - not difficult to do, but mildly painful. Another alternative is to keep some kind of index to the latest records in a secondary table, maintained through a DynamoDB Streams trigger. – jarmod May 13 '20 at 23:17

2 Answers2

0

You didn't specify which aws sdk you are using, but for example in Java you would solve number 2 by doing a query on the hash key, with range key condition that its value is less than value of last item you loaded. You also have to specify descending order, since default is ascending for ddb queries. In order to do the paging you just put a limit on query in order to get maximum of number of items that you specify.

So you basically:

  • Set lastLoadedItemTimestamp to current timestamp
  • Do query for 10 items with hash key A and condition that range key (i.e. timestamp) is less than lastLoadedItemTimestamp
  • Set lastLoadedItemTimestamp to range key of last item that was loaded
  • If query returned 10 items you repeat the two steps above, in a loop until query returns less than 10 items
MarkoR
  • 543
  • 4
  • 12
  • The recent N query almost certainly isn't "give me the last N records for userid X". It's "give me the last N records across all users". – jarmod May 13 '20 at 22:34
  • @jarmod you are correct, I need it across all users – user1868744 May 13 '20 at 23:09
  • @user1868744 Oh, sorry, I somehow assumed you need last N entries for the user, since the question was basically: "I know how to get all entries for the user, but how to get last N entries by the timestamp" – MarkoR May 14 '20 at 23:57
0

I answered a similar question on this topic yesterday, so will link my answer here in case it's useful: https://stackoverflow.com/a/61783992/11986130

In short, I decided to use year as my partition key so that I could use queries (if I'd had a lot of data, maybe month or something else would've been more appropriate of course).

Hope it helps!

EDIT: this approach is not necessarily without issues as you risk lower performance due to "hot partitions" -- may be worth looking at this answer too: https://stackoverflow.com/a/36491475/11986130

miksimal
  • 26
  • 3
  • Are you recommending I should use year or yearmonth as GSI, as I need userId as partition key to address my first use case? – user1868744 May 14 '20 at 16:29
  • That should be doable, yes. In fact, the GSI scenario in the docs isn't too far from what it seems you need (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html). – miksimal May 14 '20 at 21:01