
According to the DynamoDB documentation: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-general-nosql-design.html

"You should maintain as few tables as possible in a DynamoDB application. Most well designed applications require only one table."

But in my experience you always have to do the opposite because of partition key design.

Let's consider the following situation. We have several user roles, for example "admin", "manager", and "worker". The usual workflow of an admin is to CRUD manager data, where the read operation fetches not one manager but the whole manager list. The same goes for the manager, who CRUDs worker data. In both cases there are only two key-usage scenarios:

  • get a list of all items (item key doesn't matter)
  • work with a particular item using its full key.

Naturally, the partition key should be uniformly distributed (as the documentation emphasises), so we can't use the user role for it and should use the user id instead. Since the partition key is already a random id, a sort key adds nothing: the partition key alone already identifies exactly one user. At this point we realize that the user id works like a charm for create, update, and delete operations, but every list read has to scan the whole table and then filter the result by user role, which is inefficient. How can this be improved? Very naturally: give each user type its own table! Then the admin API scans the manager table for the manager list, and the manager API scans the worker table for the worker list.

I have used DynamoDB for almost a year and still can't get it. For me the reality is that in real-life scenarios the sort key is something you can never use. The only real case I had for it was accessing items like "agreements" that belong to two users of different types at the same time. There the primary key was { partition: "managerId", sort: "userId" } and the global secondary index was { partition: "userId", sort: "managerId" }, so I could efficiently query all agreements of a particular manager or all agreements of a particular user by providing only the corresponding manager or user id. The approach is discussed in the documentation here: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-adjacency-graphs.html.
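
To make the two access paths concrete, here is a minimal in-memory sketch in plain Python (no AWS calls). `KeyedView`, `put_agreement`, and the sample ids are hypothetical names made up for illustration; a real table and its index are maintained by DynamoDB itself.

```python
from collections import defaultdict


class KeyedView:
    """Toy stand-in for one key schema: items grouped by partition key, ordered by sort key."""

    def __init__(self, partition_attr, sort_attr):
        self.partition_attr = partition_attr
        self.sort_attr = sort_attr
        self._partitions = defaultdict(dict)

    def put(self, item):
        self._partitions[item[self.partition_attr]][item[self.sort_attr]] = item

    def query(self, partition_value):
        """Return every item in one partition, sorted by sort key; other partitions are never read."""
        partition = self._partitions.get(partition_value, {})
        return [partition[key] for key in sorted(partition)]


# Base table keyed by (managerId, userId); the index inverts the two attributes.
table = KeyedView("managerId", "userId")
inverted_index = KeyedView("userId", "managerId")


def put_agreement(manager_id, user_id, terms):
    item = {"managerId": manager_id, "userId": user_id, "terms": terms}
    table.put(item)
    inverted_index.put(item)  # DynamoDB would propagate this write to the GSI on its own


put_agreement("m-1", "u-1", "weekly")
put_agreement("m-1", "u-2", "monthly")
put_agreement("m-2", "u-1", "daily")

by_manager = [a["userId"] for a in table.query("m-1")]
by_user = [a["managerId"] for a in inverted_index.query("u-1")]
print(by_manager)  # ['u-1', 'u-2']
print(by_user)     # ['m-1', 'm-2']
```

Both lookups read a single partition, which is why neither needs a scan.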

I feel that I don't understand the concept at all. What would be an effective key schema for the example above that uses only one DynamoDB table for both user types?

Cœur
Arsenii Fomin
  • I find the statement *"You should maintain as few tables as possible in a DynamoDB application. Most well designed applications require only one table."* to be extremely suspect. That sounds like an extreme over-generalization of NoSQL to me. I would **NOT** try to make that a goal of your application design. Use DynamoDB however it works best for your application, given the type of queries that you will need to perform. – Mark B Sep 11 '18 at 15:18
  • @MarkB I've found this article showing how they suggest using one table with a number of techniques, but I will need a lot of time to understand what they are doing: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html – Arsenii Fomin Sep 11 '18 at 15:42
  • I second Mark B’s comment. Take that with a huge grain of salt. I think it’s a gross overgeneralization and the reality in the field is far from it. In many cases it becomes a really bad idea to store everything in one table. – Mike Dinescu Sep 11 '18 at 23:48
  • I suggest the statement from AWS that the majority of NoSQL storage designs should have one table is complete and utter nonsense. In answer to your question, you would use a graph-node schema for a single table (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-adjacency-graphs.html). However, adopting this design would impact your application code greatly and would likely lead to poor alignment between your storage and business logic code. – F_SO_K Sep 12 '18 at 13:16
  • @Stu, what the AWS documentation suggests in all those designs is something like "our tables have some very low-level access keys built in; you should not use them directly but build an additional layer of logic and code on top of them, with convenient keys to access your data". I would even have adopted that style if I worked with AWS only via the API. But in reality I do the opposite: I never use the API and use the web UI, where you can view all the tables manually. Having only one table with such a mess of keys and data, knowing that I will have to look into it regularly, is completely unacceptable for me. – Arsenii Fomin Sep 13 '18 at 06:43
  • While the DynamoDB documentation suggests limiting databases to one table, some of the sample projects they ship with certain products, like Amplify and AppSync, that use DynamoDB—which are tiny projects—come with multiple tables. Lolz – trndjc Jan 17 '19 at 04:11
  • This talk from re:Invent 2018 might help in understanding the single-table design philosophy: https://www.youtube.com/watch?v=HaEPXoXVf2k – Deepak Rao Apr 23 '19 at 11:16

2 Answers


It sounds like what you need in this case is a Global Secondary Index (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html) where the partition key is the user role. That way, you can query all users with a particular role through that UserRoleIndex and, with the help of a sort key on the user ID, single out one particular user within that role.

Alternatively, if you are starting from scratch with a new table, you might not even need an index (unless you don't know the role of a user when you delete them). You can use a "composite primary key" (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html#HowItWorks.CoreComponents.PrimaryKey) where the partition key and the sort key would be the same as in the index I am suggesting above.

Using the same notation that you used in your question, I would recommend { partition: "userRole", sort: "userId" }.
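
A rough in-memory sketch of that key schema, in plain Python with no AWS calls, may help show what it buys you. `UserTable` and the sample ids are hypothetical; the point is only that listing a role reads one partition, while every single-item operation needs both the role and the id (the caveat about deletes mentioned above).

```python
class UserTable:
    """Toy model of a table keyed by { partition: userRole, sort: userId }."""

    def __init__(self):
        self._partitions = {}  # userRole -> {userId -> item}

    def put(self, role, user_id, **attrs):
        item = {"userRole": role, "userId": user_id, **attrs}
        self._partitions.setdefault(role, {})[user_id] = item

    def get(self, role, user_id):
        # A single-item read needs the full composite key.
        return self._partitions.get(role, {}).get(user_id)

    def delete(self, role, user_id):
        # Returns True only if an item with exactly this key existed.
        return self._partitions.get(role, {}).pop(user_id, None) is not None

    def query_role(self, role):
        # Reads one partition only; items come back ordered by sort key.
        partition = self._partitions.get(role, {})
        return [partition[uid] for uid in sorted(partition)]


users = UserTable()
users.put("manager", "m-1", name="Ann")
users.put("manager", "m-2", name="Bob")
users.put("worker", "w-1", name="Cid")

manager_names = [u["name"] for u in users.query_role("manager")]
print(manager_names)                       # ['Ann', 'Bob']
print(users.get("worker", "w-1")["name"])  # Cid
print(users.delete("manager", "w-1"))      # False: wrong role means wrong key
print(users.delete("worker", "w-1"))       # True
```

One trade-off to keep in mind: with only a few role values, all users of one role share a single partition key, which is the distribution concern raised in the question.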

DynamoDB can be hard to understand sometimes, and there definitely are cases where a traditional SQL database makes more sense. This video from AWS re:Invent 2018 is great for understanding the difference between the two: https://www.youtube.com/watch?v=HaEPXoXVf2k&feature=youtu.be.

In your case, though, it looks like you have a very clear access pattern, so DDB would work for you.

Yves Gurcan
  • Since it's an old question, it's rather outdated for me. After some additional investigation, I found that in many use cases (hundreds to thousands of items of a few KB each, with no expectation of ever reaching millions) you will never grow beyond a single partition. That automatically means there is no point in worrying about uniform distribution of the partition key. Moreover, to be able to get items by a range, I would use the same constant value for all keys in the system (completely ignoring the partition key), like { partition: 1, sort: timestamp } – Arsenii Fomin Dec 14 '19 at 11:39

You can have a schema like:

user_id, role, <other columns>

where

  • user_id = hash-key
  • role = GSI hash-key

This way, you can get the list of all managers by querying the GSI.

With a GSI, DynamoDB creates and maintains what is effectively another table, so you don't need to maintain multiple tables yourself.
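
As a hedged sketch, this is roughly what the table definition and the list query could look like as request parameters for boto3's `create_table` and `query` client calls. The table name `Users`, the index name `RoleIndex`, and on-demand billing are assumptions for illustration; the dictionaries are only built here, not sent to AWS.

```python
# Parameters for boto3.client("dynamodb").create_table(**create_table_params)
create_table_params = {
    "TableName": "Users",  # assumed name
    "AttributeDefinitions": [
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "role", "AttributeType": "S"},
    ],
    # Base table: user_id is the hash key, so single-user CRUD needs only the id.
    "KeySchema": [{"AttributeName": "user_id", "KeyType": "HASH"}],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "RoleIndex",  # assumed name
            "KeySchema": [{"AttributeName": "role", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",  # assumption; avoids per-index throughput settings
}


def role_query_params(role):
    """Parameters for boto3.client("dynamodb").query(**params): list users with one role."""
    return {
        "TableName": create_table_params["TableName"],
        "IndexName": "RoleIndex",
        # "#r" is a placeholder for the attribute name, in case it clashes with a reserved word.
        "KeyConditionExpression": "#r = :r",
        "ExpressionAttributeNames": {"#r": "role"},
        "ExpressionAttributeValues": {":r": {"S": role}},
    }


manager_query = role_query_params("manager")
print(manager_query["ExpressionAttributeValues"])  # {':r': {'S': 'manager'}}
```

Reads from a GSI are eventually consistent, so a manager created a moment ago may not appear in the list immediately.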

Let me know if you have any questions.

dDarkLORD
  • Could you please have a look at https://stackoverflow.com/questions/57522977/dynamodb-query-all-records-as-per-no-sql-design and answer it? – Bhargava Aug 19 '19 at 07:09