
I'm trying to define a DynamoDB table so that I can easily filter it on two ranges (date and type; neither is assumed to be unique). I ended up with the schema below (taken from DynamoDB queries on secondary index, how to define the indexes), but it doesn't seem particularly efficient for the primary use case (returning data after filtering by both ranges).

Would it be better to make type a range key on the primary table and put date in a local secondary index? Or is there another method that keeps the flexibility associated with GSIs?

Properties:
  TableName: TableName
  AttributeDefinitions:
    - AttributeName: id
      AttributeType: S
    - AttributeName: date
      AttributeType: S
    - AttributeName: type
      AttributeType: S

  KeySchema:
    - AttributeName: id
      KeyType: HASH

  GlobalSecondaryIndexes:
    - IndexName: dateIndex
      KeySchema:
        - AttributeName: date
          KeyType: HASH
      Projection:
        ProjectionType: KEYS_ONLY
      ProvisionedThroughput:
        ReadCapacityUnits: 100
        WriteCapacityUnits: 100

    - IndexName: typeIndex
      KeySchema:
        - AttributeName: type
          KeyType: HASH
      Projection:
        ProjectionType: KEYS_ONLY
      ProvisionedThroughput:
        ReadCapacityUnits: 100
        WriteCapacityUnits: 100

  ProvisionedThroughput:
    ReadCapacityUnits: 100
    WriteCapacityUnits: 100

1 Answer


A full scan might be useful in some cases if you are retrieving a very large "Date" range and many different "Type" values. For the attributes you describe, however, query operations against an appropriate GSI will usually be more efficient than a full scan.

Option 1) GSI with "Type" as hash key, "Date" as range key

This is a common solution for what you are trying to do. You need to know the "Type" you are looking up for the given date range; a key condition on the range key (e.g. BETWEEN) then matches your date range. If you need to query multiple "Type" values within a date range, run one query operation per "Type".
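As a sketch of Option 1, the following builds the Query request parameters for such a GSI. The index name "typeDateIndex" and the helper function are illustrative assumptions, not part of the table definition above; note that both type and date are DynamoDB reserved words, so placeholders are required:

```python
def build_type_date_query(type_value, start_date, end_date,
                          index_name="typeDateIndex"):
    """Build DynamoDB Query parameters for a hypothetical GSI with
    'type' as hash key and 'date' as range key (illustrative names)."""
    return {
        "IndexName": index_name,
        "KeyConditionExpression": "#t = :t AND #d BETWEEN :start AND :end",
        # 'type' and 'date' are DynamoDB reserved words, so they must be
        # referenced via expression attribute name placeholders.
        "ExpressionAttributeNames": {"#t": "type", "#d": "date"},
        "ExpressionAttributeValues": {
            ":t": {"S": type_value},
            ":start": {"S": start_date},
            ":end": {"S": end_date},
        },
    }

params = build_type_date_query("mytype", "2020-07-01", "2020-07-17")
```

The resulting dict could be passed to a `query` call; for multiple "Type" values you would build and issue one such request per type.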

Option 2) GSI on an aggregate column "TypeWithDate"

Create a GSI with a new attribute "TypeWithDate" as the hash key, a concatenation of "Type" and "Date" (e.g. "mytype#2020-07-17"), and query the data directly for the matching keys. This works well if you have a limited number of "Date" values to search for, or if the date ranges you are looking up are fragmented: simply iterate through the keys.
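One way to drive Option 2 is to enumerate the composite keys for the requested range client side and issue one query per key. A minimal sketch (the separator and function name are assumptions):

```python
from datetime import date, timedelta

def composite_keys(type_value, start, end, sep="#"):
    """Enumerate 'TypeWithDate' hash keys, e.g. 'mytype#2020-07-17',
    one per day in [start, end] inclusive -- each would be looked up
    with its own query operation."""
    days = (end - start).days + 1
    return [f"{type_value}{sep}{(start + timedelta(days=i)).isoformat()}"
            for i in range(days)]

keys = composite_keys("mytype", date(2020, 7, 15), date(2020, 7, 17))
# -> ['mytype#2020-07-15', 'mytype#2020-07-16', 'mytype#2020-07-17']
```

The cost of this option grows linearly with the number of days in the range, which is why it suits a limited or fragmented set of dates.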

Option 3) Client side filtering

Probably not as efficient, but it saves you from creating an additional GSI. You would reuse the "dateIndex" GSI you already have to make multiple query calls (one for each day in the "Date" range), then filter client side to match the "Type".
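The client-side step of Option 3 is just a filter over items already fetched from the date index, sketched below with assumed attribute names. (DynamoDB's own FilterExpression would reduce data transfer, but either way read capacity is consumed for every item matched by the key condition.)

```python
def filter_by_type(items, wanted_type):
    """Filter items fetched from the 'dateIndex' GSI down to one type.
    Attribute names follow the table definition in the question."""
    return [item for item in items if item.get("type") == wanted_type]

fetched = [
    {"id": "1", "date": "2020-07-17", "type": "a"},
    {"id": "2", "date": "2020-07-17", "type": "b"},
]
matches = filter_by_type(fetched, "a")
# -> [{'id': '1', 'date': '2020-07-17', 'type': 'a'}]
```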

For all the options above, the query operations can return many records, so multiple calls may be needed to paginate through all results. If your "Date" column also contains a time component, you can still query by "Date" directly if you either create a separate date-only column, or choose a bucket granularity that trades off how many individual keys you query: e.g. "2020-07-17T14:00:00" could be mapped to an hour bucket "2020-07-17.14", but that requires 24 query lookups per day for options 2 and 3.
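The hour-bucket mapping mentioned above can be sketched as follows (the bucket format copies the "2020-07-17.14" example; the function name is an assumption):

```python
from datetime import datetime

def hour_bucket(timestamp):
    """Map an ISO-8601 timestamp to an hour-granularity bucket key,
    e.g. '2020-07-17T14:00:00' -> '2020-07-17.14'."""
    dt = datetime.fromisoformat(timestamp)
    return f"{dt.date().isoformat()}.{dt.hour:02d}"

bucket = hour_bucket("2020-07-17T14:00:00")
# -> '2020-07-17.14'
```

To cover a whole day under this scheme you would query all 24 bucket values for that date, which is the per-day lookup cost noted above.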

puji
  • Thanks, Option 1 makes the most sense for me right now. It seems to contain the same functionality as Option 2 without the need to concatenate as part of the request. – David Upsdale Jul 20 '20 at 08:46