1

I am a beginner in NoSQL DB and Serverless. My app has a table called Trips. The parameters of the tables are {id, route, cost, selling, type, date, LR, asset } and a bunch of other irrelevant document numbers, where id is generated by uuid.

Now I want to query the database for giving me

  1. Return all trips between a date range using the date parameter.
  2. Return all the trips for an asset in given period of time using date and asset parameter.
  3. Return all the trips for a route in a given period of time using date route parameter.

2 and 3 work fine using keyConditionExpression but for 1 I need to use a filterExpression on a scan instead of a query which could make it relatively slower since it is executed once the query is complete. Is there a better way to form the schema?

In Trips table the schema is as such

 tripTable:
  Type: "AWS::DynamoDB::Table"
  Properties:
    AttributeDefinitions:
      [
        { "AttributeName": "id", "AttributeType": "S" },
        { "AttributeName": "date", "AttributeType": "S" },
        { "AttributeName": "Asset", "AttributeType": "S" },
        { "AttributeName": "Route", "AttributeType": "S" },
      ]

    KeySchema:
      [
        { "AttributeName": "date", "KeyType": "HASH" },
        { "AttributeName": "id", "KeyType": "RANGE" },
      ]
    ProvisionedThroughput:
      ReadCapacityUnits: 5
      WriteCapacityUnits: 5
    StreamSpecification:
      StreamViewType: "NEW_AND_OLD_IMAGES"
    TableName: ${self:provider.environment.TRIPS}
    GlobalSecondaryIndexes:
      - IndexName: TripsVSAssets
        KeySchema:
          - AttributeName: asset
            KeyType: HASH
          - AttributeName: date
            KeyType: RANGE
        Projection:
          ProjectionType: ALL
        ProvisionedThroughput:
          ReadCapacityUnits: "5"
          WriteCapacityUnits: "5"
        GlobalSecondaryIndexes:
      - IndexName: RoutesVSAssets
        KeySchema:
          - AttributeName: route
            KeyType: HASH
          - AttributeName: date
            KeyType: RANGE
        Projection:
          ProjectionType: ALL
        ProvisionedThroughput:
          ReadCapacityUnits: "5"
          WriteCapacityUnits: "5"
Divye Shah
  • 747
  • 1
  • 11
  • 24

2 Answers2

1

I had a similar problem recently and opted for using year as partition key and date as sort key. This was right for my amount of data and let me query by date and mostly just run a single query. If you have a lot of data, maybe month or even week would be more suitable (or something else entirely).

With my approach I then just needed to check for whether the date range I want to look at cuts across two years and in those cases (i.e. very rarely) the Lambda makes two queries and combines the results. I've included some draft code below in case it's useful (there are probably better ways, but this worked for me!) and I also recommend this quick read: https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/.

module.exports.getLatest = async event => {

  // some date and formatting code here not included

  var params1 = {
    ExpressionAttributeNames: { "#date": "date", "#year": "year" },
    ExpressionAttributeValues: {
      ':d': isoDate,
      ':y1': y1
     },
   KeyConditionExpression: '#year = :y1 AND #date > :d',
   TableName: process.env.HEADLINES_TABLE
  }

  if (y1 != y2) {
   // define var params2 (the same as params1 except it uses y2)
  }

  try {
    let result;

    // if the date range cuts across partitions (years), fire off two queries and wait for both

    if(y1 != y2) {
        let resultPromise1 = client.query(params1).promise();
        let resultPromise2 = client.query(params2).promise();
        const [result1, result2] = await Promise.all([resultPromise1,resultPromise2]);
        result = [...result1.Items, ...result2.Items];
    } else { 
        result = await client.query(params1).promise();
    }

    return {
      // stringify and return result.Items, statuscode 200 etc.
    }
  }
  // catch {} code here (irrelevant for the answer)
}
miksimal
  • 26
  • 3
0

You need one more index column, where the partition key (Hash type) would be a random number , lets say from 0 to 20. And the sort key (range type), put date again there.

So for querying all the trips between certain time, you will need to query 20 times in parallel with partition Key as each number between 0 to 20 and Sort Key as time range.

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html

Check the above guide, go to table at the end of the page and check 5th entry

enter image description here

If you do a scan here than dynamoDB will charge you for each partition key. In the above mentioned parallel query technique, you will be charged only for N queries (20 in above case).

Sumeet P
  • 144
  • 1
  • 5
  • Making 20 scans in parellel, won't that take a tong time when the database grows? – Divye Shah Apr 24 '20 at 12:46
  • @Summet I tried your approach but one of the objects in the response is not in order. Do I need to do something after promise.all gets resolved for all 20 promises? – Divye Shah Apr 24 '20 at 18:45
  • @DivyeShah As database grows, it won't take a long time when running 20 query (its not scan), because that's what DynamoDB offers. But still if your query response data size increases to 400kb, you may notice difference then Yes, you need to merge the response of all the 20 responses and sort them at application level. – Sumeet P Apr 26 '20 at 06:41