0

I have been evaluating migration of our datastore from MongoDB to DynamoDB, since it is a well established AWS service.

However, I am not sure if the DynamoDB data model is robust enough to support our use cases. I understand that DynamoDB added document support in 2014, but whatever examples I have seen, does not look to be addressing queries which work across documents, and which do not specify a value for the partition key.

For instance if I have a document containing employee info, { "name": "John Doe", "department": "sales", "date_of_joining": "2017-01-21" }

and I need to make query like give me all the employees which have joined after 01-01-2016, then I can't make it with this schema. I might be able to make this query after creating a secondary index which has a randomly generated partition key (say 0-99) and create a sort key on "date_of_joining", then query for all the partitions and put condition on "date_of_joining". But this is too complex a way to do a simple query, doing something like this in MongoDB is quite straightforward.

Can someone help with understanding if there is a better way to do such queries in DynamoDB and is DynamoDB really suited for such use cases?

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
Ashish
  • 3,028
  • 5
  • 28
  • 35

2 Answers2

1

Actually, the partition key of the GSI need not be unique. You can have date_of_joining as a partition key of GSI.

However, when you query the partition key, you cannot use greater than for the partition key field. Only equality is supported for partition key. I am not sure that why you wanted to have a random number as partition key of GSI and date_of_joining as sort key. Even if you design like, I don't thing you will be able to use DynamoDB Query API to get the expected result. You may end-up using DynamoDB Scan API which is a costly operation in DynamoDB.

GSI:

date_of_joining - as Partition key

Supported in Query API:-

If you have multiple items for the same DOJ, the result with have multiple items (i.e. when you query using GSI).

KeyConditionExpression : 'date_of_joining = :doj'

Not supported in Query API:-

KeyConditionExpression : 'date_of_joining > :doj'

Conclusion:-

You need to use DynamoDB Scan. If you are going to use Scan, then GSI may not be required. You can directly scan the main table using FilterExpression.

FilterExpression : 'date_of_joining > :doj'

Disadvantage:-

  • Costly

  • Not efficient

notionquest
  • 37,595
  • 6
  • 111
  • 105
  • Thanks for your reply. The reason I am adding a column having numbers from 0-99 is that since querying requires the value of the partition key, I would be able to provide the values (0-99). Would need to fire 100 queries but the processing would get well distributed across the cluster. And when I make 'date_of_joining' as sort key, I will be able to make greater than and less than queries on top of it. Please let me know if that is not case. – Ashish Feb 06 '17 at 03:38
  • Scan is not a good option because it consumes lost of read capacity which directly related to cost. It is not efficient because it has to read all items in the table and then apply the filter criteria to produce the result. Think about the performance and cost if the table has 100K items. – notionquest Feb 06 '17 at 09:23
0

You might decide to support your range queries with an indexing backend. For example, you could stream your table updates in DynamoDB to AWS ElasticSearch with a Lambda function, and then query ES for records matching the range of join dates you choose.

Alexander Patrikalakis
  • 5,054
  • 1
  • 30
  • 48
  • Thanks Alexander. Right now we are already so lost with so many databases, that we don't want to add two, for one workload. – Ashish Feb 06 '17 at 16:26