Support for queries across documents in DynamoDB

Question

I have been evaluating migration of our datastore from MongoDB to DynamoDB, since it is a well established AWS service.

However, I am not sure if the DynamoDB data model is robust enough to support our use cases. I understand that DynamoDB added document support in 2014, but whatever examples I have seen, does not look to be addressing queries which work across documents, and which do not specify a value for the partition key.

For instance if I have a document containing employee info, { "name": "John Doe", "department": "sales", "date_of_joining": "2017-01-21" }

and I need to make query like give me all the employees which have joined after 01-01-2016, then I can't make it with this schema. I might be able to make this query after creating a secondary index which has a randomly generated partition key (say 0-99) and create a sort key on "date_of_joining", then query for all the partitions and put condition on "date_of_joining". But this is too complex a way to do a simple query, doing something like this in MongoDB is quite straightforward.

Can someone help with understanding if there is a better way to do such queries in DynamoDB and is DynamoDB really suited for such use cases?

something similar http://stackoverflow.com/a/34961036/2811189 — Harshal Bulsara, Feb 03 '17 at 11:12

score 1 · Accepted Answer · answered Feb 03 '17 at 14:20

Actually, the partition key of the GSI need not be unique. You can have date_of_joining as a partition key of GSI.

However, when you query the partition key, you cannot use greater than for the partition key field. Only equality is supported for partition key. I am not sure that why you wanted to have a random number as partition key of GSI and date_of_joining as sort key. Even if you design like, I don't thing you will be able to use DynamoDB Query API to get the expected result. You may end-up using DynamoDB Scan API which is a costly operation in DynamoDB.

GSI:

date_of_joining - as Partition key

Supported in Query API:-

If you have multiple items for the same DOJ, the result with have multiple items (i.e. when you query using GSI).

KeyConditionExpression : 'date_of_joining = :doj'

Not supported in Query API:-

KeyConditionExpression : 'date_of_joining > :doj'

Conclusion:-

You need to use DynamoDB Scan. If you are going to use Scan, then GSI may not be required. You can directly scan the main table using FilterExpression.

FilterExpression : 'date_of_joining > :doj'

Disadvantage:-

Costly
Not efficient

Thanks for your reply. The reason I am adding a column having numbers from 0-99 is that since querying requires the value of the partition key, I would be able to provide the values (0-99). Would need to fire 100 queries but the processing would get well distributed across the cluster. And when I make 'date_of_joining' as sort key, I will be able to make greater than and less than queries on top of it. Please let me know if that is not case. — Ashish, Feb 06 '17 at 03:38
Scan is not a good option because it consumes lost of read capacity which directly related to cost. It is not efficient because it has to read all items in the table and then apply the filter criteria to produce the result. Think about the performance and cost if the table has 100K items. — notionquest, Feb 06 '17 at 09:23

score 0 · Answer 2 · answered Feb 04 '17 at 23:47

0

You might decide to support your range queries with an indexing backend. For example, you could stream your table updates in DynamoDB to AWS ElasticSearch with a Lambda function, and then query ES for records matching the range of join dates you choose.

answered Feb 04 '17 at 23:47

Alexander Patrikalakis

5,054
1
30
48

Thanks Alexander. Right now we are already so lost with so many databases, that we don't want to add two, for one workload. – Ashish Feb 06 '17 at 16:26

Support for queries across documents in DynamoDB

2 Answers2