I'm trying to decide whether to use binary, number, or string for my DynamoDB table's partition key. My application is a React.js/Node.js social event-management application where as much as half of the data volume stored in DynamoDB will be used to store relationships between Items and Attributes to other Items and Attributes. For example: friends of a user, attendees at an event, etc.
Because the schema is so key-heavy, and because the maximum DynamoDB Item size is only 400KB, and for perf & cost reasons, I'm concerned about keys taking up too much space. That said, I want to use UUIDs for partition keys. There are well-known reasons to prefer UUIDs (or something with similar levels of entropy and minimal chance of collisions) for distributed, serverless apps where multiple nodes are giving out new keys.
So, I think my choices are:
- Use a hex-encoded UUID (32 bytes stored after dashes are removed)
- Encode the UUID using base64 (22 bytes)
- Encode the UUID using z85 (20 bytes)
- Use a binary-typed attribute for the key (16 bytes)
- Use a number-typed attribute for the key (16-18 bytes?) - the Number type can only accommodate 127 bits, so I'd have to perform some tricks like stripping a version bit, but for my app that's probably OK. See How many bits of integer data can be stored in a DynamoDB attribute of type Number? for more info.
Obviously there's a tradeoff in developer experience. Using a hex string is the clearest but also the largest. Encoded strings are smaller but harder to deal with in logs, while debugging, etc. Binary and Number are harder than strings, but are the smallest.
I'm sure I'm not the first person to think about these tradeoffs. Is there a well-known best practice or heuristic to determine how UUID keys should be stored in DynamoDB?
If not, then I'm leaning towards using the Binary type, because it's the smallest storage and because its native representation (as a base64-encoded string) can be used everywhere humans need to view and reason about keys, including queries, logging, and client code. Other than having to transform it to/from a Buffer
if I use DocumentClient
, am I missing some problem with the Binary type or advantage of one of the other options in the list above?
If it matters, I'm planning for all access to DynamoDB to happen via a Lambda API, so even if there's conversion or marshalling required, that's OK because I can do it inside my API.
BTW, this question is a sequel to a 4-year-old question (UUID data type in DynamoDB) but 4 years is a looooooong time in a fast-evolving space, so I figured it was worth asking again.