DynamoDB schema design to support lookup by ID and timestamp of item?

Question

I need to design a DynamoDB schema to store items whose attributes are:

tid: a UUID string which is a unique identifier of the item
timestamp: an ISO-8601-formatted string representing a date and time related to the item
Other stuff...

and for the following query (really want to avoid having to do any scans) patterns:

Query by tid
Query by exact timestamp, and by relational ordering expressions (e.g., <=, BETWEEN, etc.) on timestamp. I.e., query all items from a certain date-time range without knowing their tids in advance.

Is this possible to do efficiently in DynamoDB, or is there perhaps another AWS solution that would serve me better?

jarmod · Answer 1 · 2019-04-15T23:24:23.240

Given a DynamoDB table as follows:

partition key: tid, type string
sort key: timestamp, type string

You can query on:

tid = 5
tid = 5, timestamp between 2018-12-21T09:00:00Z and 2018-12-21T15:00:00Z

Try it out using the awscli, for example to query all items with tid=5:

aws dynamodb query \
    --table-name mytable \
    --key-condition-expression "tid = :tid" \
    --expression-attribute-values '{":tid":{"S":"5"}}'

To query all items for tid=5 and timestamp between 09:00 and 15:00 on 2015-12-21:

aws dynamodb query \
    --table-name mytable \
    --key-condition-expression "tid = :tid AND #ts BETWEEN :ts1 AND :ts2" \
    --expression-attribute-values  '{":tid":{"S":"5"}, ":ts1":{"S":"2015-12-21T09:00:00Z"}, ":ts2":{"S":"2015-12-21T15:00:00Z"}}' \
    --expression-attribute-names '{"#ts":"timestamp"}'

Note: because timestamp is a reserved keyword in DynamoDB, you have to escape it using the expression attribute names.

You could also create the timestamp attribute as a number and then store epoch times, if you prefer.

To query all items with timestamp between 09:00 and 15:00 on 2015-12-21, regardless of tid, cannot be done with the same partition/sort key schema. You would need to add a Global Secondary Index something like this:

GSI partition key: yyyymmdd, type string
GSI sort key: timestamp, type string

Now you can query for items with a given timestamp range, as long as they're on the same day (they have the same YYYYMMDD, which might be a reasonable restriction). Or you could go to YYYYMM as the partition key allowing a wider timestamp range. At this point you really need to understand the use cases for queries to decide if YYYYMMDD (restricting queries to a single day) is right. See How to query DynamoDB by date with no obvious hash key for more on this idea.

The second query pattern I was referring to was querying by timestamp only, i.e., not having to specifying a `tid`. And since `tid` is a globally unique identifier, a composite key with `tid` as the partition key and some other key as the sort key is redundant, as it's equivalent to a simple primary key consisting of just the partition key `tid`. What I'm looking for is a structure supporting lookups by `tid`, as well as lookups of all items belonging to a certain date-time range (without knowing their `tid`s in advance). — , Apr 11 '19 at 23:24
Apologies, misread that part of the question. The typical solution here afaik uses a GSI with a partition key that can be inferred/calculated from the timestamp and then a sort key which is the timestamp. The partition key could be something like the YYYYMMDD of the timestamp. Then you can query for items with a given timestamp range, as long as they're on the same day (have the same YYYYMMDD, which might be a reasonable restriction). Or you could go to YYYYMM as the partition key allowing a wider timestamp range, but for high i/o tables that's not ideal for sharding. Does that help? — jarmod, Apr 11 '19 at 23:59

DynamoDB schema design to support lookup by ID and timestamp of item?

1 Answers1