2

I need to design a DynamoDB schema to store items whose attributes are:

  • tid: a UUID string which is a unique identifier of the item
  • timestamp: an ISO-8601-formatted string representing a date and time related to the item
  • Other stuff...

and for the following query (really want to avoid having to do any scans) patterns:

  • Query by tid
  • Query by exact timestamp, and by relational ordering expressions (e.g., <=, BETWEEN, etc.) on timestamp. I.e., query all items from a certain date-time range without knowing their tids in advance.

Is this possible to do efficiently in DynamoDB, or is there perhaps another AWS solution that would serve me better?

1 Answers1

1

Given a DynamoDB table as follows:

  • partition key: tid, type string
  • sort key: timestamp, type string

You can query on:

  • tid = 5
  • tid = 5, timestamp between 2018-12-21T09:00:00Z and 2018-12-21T15:00:00Z

Try it out using the awscli, for example to query all items with tid=5:

aws dynamodb query \
    --table-name mytable \
    --key-condition-expression "tid = :tid" \
    --expression-attribute-values '{":tid":{"S":"5"}}'

To query all items for tid=5 and timestamp between 09:00 and 15:00 on 2015-12-21:

aws dynamodb query \
    --table-name mytable \
    --key-condition-expression "tid = :tid AND #ts BETWEEN :ts1 AND :ts2" \
    --expression-attribute-values  '{":tid":{"S":"5"}, ":ts1":{"S":"2015-12-21T09:00:00Z"}, ":ts2":{"S":"2015-12-21T15:00:00Z"}}' \
    --expression-attribute-names '{"#ts":"timestamp"}'

Note: because timestamp is a reserved keyword in DynamoDB, you have to escape it using the expression attribute names.

You could also create the timestamp attribute as a number and then store epoch times, if you prefer.

To query all items with timestamp between 09:00 and 15:00 on 2015-12-21, regardless of tid, cannot be done with the same partition/sort key schema. You would need to add a Global Secondary Index something like this:

  • GSI partition key: yyyymmdd, type string
  • GSI sort key: timestamp, type string

Now you can query for items with a given timestamp range, as long as they're on the same day (they have the same YYYYMMDD, which might be a reasonable restriction). Or you could go to YYYYMM as the partition key allowing a wider timestamp range. At this point you really need to understand the use cases for queries to decide if YYYYMMDD (restricting queries to a single day) is right. See How to query DynamoDB by date with no obvious hash key for more on this idea.

jarmod
  • 71,565
  • 16
  • 115
  • 122
  • The second query pattern I was referring to was querying by timestamp only, i.e., not having to specifying a `tid`. And since `tid` is a globally unique identifier, a composite key with `tid` as the partition key and some other key as the sort key is redundant, as it's equivalent to a simple primary key consisting of just the partition key `tid`. What I'm looking for is a structure supporting lookups by `tid`, as well as lookups of all items belonging to a certain date-time range (without knowing their `tid`s in advance). –  Apr 11 '19 at 23:24
  • Apologies, misread that part of the question. The typical solution here afaik uses a GSI with a partition key that can be inferred/calculated from the timestamp and then a sort key which is the timestamp. The partition key could be something like the YYYYMMDD of the timestamp. Then you can query for items with a given timestamp range, as long as they're on the same day (have the same YYYYMMDD, which might be a reasonable restriction). Or you could go to YYYYMM as the partition key allowing a wider timestamp range, but for high i/o tables that's not ideal for sharding. Does that help? – jarmod Apr 11 '19 at 23:59