1

First, I'm new to DyanmoDB and would like to use it for a little side project to get familiar with it. I've started reading some online documentation and blogs. However, some of the stuff isn't fully clear to me yet.

Use case: I would like to store only two (for the moment), time series in DynamoDB. One will be an inflation rate and one will be an interest rate. I will have a lambda function which fetches these data from the web on an ongoing basis and stores the new value of each to the DynamoDB. My application is then always using the latest available value of this rates. The frequency they are sourced is rather slow, not more often than on a weekly basis.

For this I wanted to use the following schema (serverless.yml)

 resources: # CloudFormation template syntax from here on.
      Resources:
        usersTable: 
          Type: AWS::DynamoDB::Table
          Properties:
            TableName: RatesTable
            AttributeDefinitions:
              - AttributeName: RateType
                AttributeType: S
              - AttributeName: SourcedOn
                AttributeType: Date
              - AttributeName: Tenor
                AttributeType: S
              - AttributeName: Rate
                AttributeType: N
             KeySchema:
               - AttributeName: RateType
                 KeyType: HASH
               - AttributeName: SourcedOn
                 KeyType: RANGE

Is this a reasonable schema / setup to choose. What I'm not so sure is what to use for the SourcedOn attribute. Is Date type the right, or should I use a integer in the format of 20190101?. This last question is also in hinsight of using python to query and I'm always and only interested in reading the latest sourced value per RateType

Is there a way to retrieve (without knowing which date exactly) the last sourced rate with boto? How would such a query look like?

math
  • 1,868
  • 4
  • 26
  • 60

1 Answers1

1

This is a reasonable schema.

I’d recommend using a ISO 8601 date or timestamp string for the SourcedOn attribute (E.g. 2019-10-22 or 2019-10-22T12:35:00Z) because it’s a standard format, and because if you ever decide to change the frequency to hourly, for example, all you need to do is switch from dates to timestamps, and it won’t mess up any of your queries. Strings are also advantageous for dates because it’s really easy to use a begins_with query to find all of the values for a given year or month. Finally, there are already libraries that can handle converting from a datetime to a string, so you don’t have to write code to serialize the date to an integer or string.

To retrieve the latest date, your query’s KeyConditionExpression should only have the hash key value. In your query request, you should also set ScanIndexForward=false and Limit=1.

Matthew Pope
  • 7,212
  • 1
  • 28
  • 49