3

We are new to DynamoDB and struggling with what seems like it would be a simple task.

It is not actually related to stocks (it's about recording machine results over time) but the stock example is the simplest I can think of that illustrates the goal and problems we're facing.

The two query scenarios are:

  • All historical values of given stock symbol <= We think we have this figured out
  • The latest value of all stock symbols <= We do not have a good solution here!

Assume that updates are not synchronized, e.g. the moment of the last update record for TSLA maybe different than for AMZN.

The 3 attributes are just { Symbol, Moment, Value }. We could make the hash_key Symbol, range_key Moment, and believe we could achieve the first query easily/efficiently.

We also assume could get the latest value for a single, specified Symbol following https://stackoverflow.com/a/12008398

The SQL solution for getting the latest value for each Symbol would look a lot like https://stackoverflow.com/a/6841644

But... we can't come up with anything efficient for DynamoDB.

Is it possible to do this without either retrieving everything or making multiple round trips?

The best idea we have so far is to somehow use update triggers or streams to track the latest record per Symbol and essentially keep that cached. That could be in a separate table or the same table with extra info like a column IsLatestForMachineKey (effectively a bool). With every insert, you'd grab the one where IsLatestForMachineKey=1, compare the Moment and if the insertion is newer, set the new one to 1 and the older one to 0.

This is starting to feel complicated enough that I question whether we're taking the right approach at all, or maybe DynamoDB itself is a bad fit for this, even though the use case seems so simple and common.

Penny
  • 347
  • 4
  • 12

1 Answers1

0

There is a way that is fairly straightforward, in my opinion.

Rather than using a GSI, just use two tables with (almost) the exact same schema. The hash key of both should be symbol. They should both have moment and value. Pick one of the tables to be stocks-current and the other to be stocks-historical. stocks-current has no range key. stocks-historical uses moment as a range key.

Whenever you write an item, write it to both tables. If you need strong consistency between the two tables, use the TransactWriteItems api.

If your data might arrive out of order, you can add a ConditionExpression to prevent newer data in stocks-current from being overwritten by out of order data.

The read operations are pretty straightforward, but I’ll state them anyway. To get the latest value for everything, scan the stocks-current table. To get historical data for a stock, query the stocks-historical table with no range key condition.

Matthew Pope
  • 7,212
  • 1
  • 28
  • 49
  • Thanks; we did end up writing to two tables representing historical and current values. It reminded me of https://martinfowler.com/eaaDev/EventSourcing.html even if it's not exactly the same. Our requirements here were super-basic in all their particulars, so we were ok with having the write-to-two-tables operation in Lambda after picking them up from SQS, but it didn't feel right for something more robust. I should have updated here but wasn't convinced it was a great solution. Anyway, I'll take a look at TransactWriteItems. Thanks again. – Penny Jan 31 '19 at 22:58