How can I design a DynamoDB table with a GSI when I need to know the unique keys?

Question

I'm trying to design a DynamoDB table. I'm choose the keys I insert, what my partition and sort keys will be, as well as whether or not to including a GSI.

My data, for sake of example, looks like this:

Table name: person

{
    uuid: "649ed782-0d65-402d-968e-56f3b36a98c3",
    name: "Mike",
    state: "Massachusetts",
    city: "Boston",
    bio: "Mike makes DynamoDB tables",
    age: 41,
}

I will need to perform the following operations on the table.

Get the entire record of data about the person looked up by the UUID
Get the list of names of people according to the city and state
Get a list of city/state pairs that make up the data in the table

It seems to me like it's an obvious choice to have the table have a partition key of UUID which gives me a fast get lookup for operation #1 and I could create a GSI with both city and state as the primary key, or a concatenation of them, which allows me to satisfy #2. Operation #3 is a bit tricky. I want to just ask the GSI what its keys are but I have read that this is not possible on other stack overflow posts.

So I'm wondering if the best option is to create a secondary table with a stream and lambda that responds to updates on the main person table and just has one item in the entire table which is a list of city and state combinations. Maybe it would be good to create some sort of other database like a Reddis or Memcached instance that stores this stuff? I think that would potentially be needlessly expensive. Is there a standard way to design a table like this? Would it be expensive to do a full scan of my GSI looking just for partition keys? I think it is. At least it was pretty time consuming when I did a test.

Thanks for the input.

fedonev · Answer 1 · 2022-06-16T10:21:29.530

1

Here are 3 options to model the state-city combinations in DynamoDB:

Each uses a Single table design with overloaded generic primary key fields PK and SK.
Each stores the state-city data as table items. No secondary indexes used.
As you say, Streams + Lambda keep the items current as user records change.
Which variant is "better" depends on the volume of data and read/write velocity
Each variant has low partition key cardinality, which leads to the "hot partition" problem at scale. The write sharding pattern fixes this.

PK                           SK               statecity          state_bird          uuid   
USER#649ed782-0d65-402d...   USER                                                    649ed...

# Variant 1: One record with a Map of States (keys) and City String Sets (values)
STATECITY                    STATECITY        {AL: ["Birmingham", "Mobile"]}

# Variant 2: State and City concatenated in SK
STATECITY                    AL#Birmingham                       Northern Flicker
STATECITY                    AL#Mobile                           Northern Flicker

# Variant 3: State PK, City SK
STATE#AL                     Birmingham                          Northern Flicker
STATE#AL                     Mobile                              Northern Flicker

Variant #1 stuffs all the data into one item. Easist for low volume data with low write velocity.

Variant #2 has a item per city-state combination. All items have the same PK, which makes getting all the records easy. Query for all cities in a state with a begins_with(SK, 'AL#') condition. Optionally add aribtrary denormalized state or city attributes like state_bird to the items.

Variant #3 is similar to #2, but has state-specific PKs. BatchGetItem with 50 parallel queries gets all states at once. Query for all cities in a state with a simple PK= condition. Higher key cardinality than #2, but otherwise #2 is easier to deal with unless you need more granularity within a city (e.g. concatenate a postal code to the SK like Mobile#36525).

edited Jun 16 '22 at 10:21

answered Jun 16 '22 at 10:16

fedonev

20,327
2
25
34

Thank you. I believe one of these will work just fine. Putting this all into one table is a common design pattern (I'm now realizing) of DynamoDB I'm just not familiar with yet. – mmachenry Jun 20 '22 at 20:46
This seems to embed the data for state the states and cities that exist into a different PK/SK space than the users. Which I understand. I also see how it makes the #3 query I need to write very quick in each variant. Also the #1 is obvious since you still have USER#UUID as a primary. But how does query #2 in my needs get written here? It doesn't seem like you can look up users by a state/city pairing without a scan. Would it be best to have a primary key that is AL#Birmingham and a sort key that is USER#UUID for this? – mmachenry Jun 20 '22 at 20:59
@mmachenry Yes, exactly. As you say, for #2 you could add an index called `GSI1` with the `AL#Birmingham` pattern as the `GSI1PK` and `USER#UUID` as the `GSI1SK`. – fedonev Jun 21 '22 at 11:01
Wouldn't it make more sense to have PK=AL#Birmingham and SK=USER:UUID for any user records? I think this would have hot partitions when a city has a lot of users, which isn't great, but the GSI you're talking about you also have the same issue and this would do it in one table. – mmachenry Jun 22 '22 at 00:04
@mmachenry Only if you don't need #1. – fedonev Jun 22 '22 at 07:55
I'm a bit confused on how you suggest to maintain the STATECITY items. I think it's fairly obvious that when you insert a person into the the table, you would also insert a STATECITY item with their state and city as the sort key (if you're using variant #2) but then what about when a person record is updated or removed? Will the system then need to do query for people from that state and city to make sure it's not zero and if so remove that state/city combo in a transaction? – mmachenry Jun 22 '22 at 19:36
@mmachenry "Streams + Lambda" (answer's 3rd bullet point) would be my choice. – fedonev Jun 23 '22 at 11:51

How can I design a DynamoDB table with a GSI when I need to know the unique keys?

1 Answers1