
I'm trying to decide whether to use binary, number, or string for my DynamoDB table's partition key. My application is a React.js/Node.js social event-management application where as much as half of the data volume stored in DynamoDB will represent relationships from Items and Attributes to other Items and Attributes. For example: friends of a user, attendees at an event, etc.

Because the schema is so key-heavy, and because the maximum DynamoDB Item size is only 400KB, and for perf & cost reasons, I'm concerned about keys taking up too much space. That said, I want to use UUIDs for partition keys. There are well-known reasons to prefer UUIDs (or something with similar levels of entropy and minimal chance of collisions) for distributed, serverless apps where multiple nodes are giving out new keys.

So, I think my choices are:

  1. Use a hex-encoded UUID (32 bytes stored after dashes are removed)
  2. Encode the UUID using base64 (22 bytes)
  3. Encode the UUID using z85 (20 bytes)
  4. Use a binary-typed attribute for the key (16 bytes)
  5. Use a number-typed attribute for the key (16-18 bytes?) - the Number type can only accommodate 127 bits, so I'd have to perform some tricks like stripping a version bit, but for my app that's probably OK. See "How many bits of integer data can be stored in a DynamoDB attribute of type Number?" for more info.
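The sizes quoted for options 1 to 4 can be checked with a short Node.js sketch. It assumes that DynamoDB counts a String value by its UTF-8 byte length and a Binary value by its raw byte length. The sample UUID and the helper names `z85Encode` and `uuidKeySizes` are illustrative only, and the Z85 encoder is a minimal implementation of the ZeroMQ Z85 alphabet rather than a library call.

```javascript
// Z85 alphabet as defined in the ZeroMQ Z85 specification (85 characters).
const Z85_ALPHABET =
  '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' +
  '.-:+=^!/*?&<>()[]{}@%$#';

// Encode a Buffer as Z85: every 4 input bytes become 5 output characters.
function z85Encode(buf) {
  if (buf.length % 4 !== 0) {
    throw new Error('Z85 input length must be a multiple of 4');
  }
  let out = '';
  for (let i = 0; i < buf.length; i += 4) {
    let value = buf.readUInt32BE(i); // big-endian 32-bit word
    let chunk = '';
    for (let j = 0; j < 5; j++) {
      chunk = Z85_ALPHABET[value % 85] + chunk; // least significant digit first
      value = Math.floor(value / 85);
    }
    out += chunk;
  }
  return out;
}

// Return the value size, in bytes, of each representation of one UUID.
function uuidKeySizes(uuid) {
  const hex = uuid.replace(/-/g, '');                          // option 1
  const bytes = Buffer.from(hex, 'hex');                       // option 4
  const base64 = bytes.toString('base64').replace(/=+$/, '');  // option 2, padding stripped
  const z85 = z85Encode(bytes);                                // option 3
  return {
    hex: Buffer.byteLength(hex, 'utf8'),
    base64: Buffer.byteLength(base64, 'utf8'),
    z85: Buffer.byteLength(z85, 'utf8'),
    binary: bytes.length,
  };
}

// Sample UUID chosen only for illustration.
console.log(uuidKeySizes('123e4567-e89b-12d3-a456-426614174000'));
```

These figures cover only the key's value. The attribute name also counts toward item size, so a short key attribute name saves bytes on every item regardless of which encoding is chosen.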

Obviously there's a tradeoff in developer experience. Using a hex string is the clearest but also the largest. Encoded strings are smaller but harder to deal with in logs, while debugging, etc. Binary and Number are harder than strings, but are the smallest.

I'm sure I'm not the first person to think about these tradeoffs. Is there a well-known best practice or heuristic to determine how UUID keys should be stored in DynamoDB?

If not, then I'm leaning towards using the Binary type, because it's the smallest storage and because its native representation (as a base64-encoded string) can be used everywhere humans need to view and reason about keys, including queries, logging, and client code. Other than having to transform it to/from a Buffer if I use DocumentClient, am I missing some problem with the Binary type or advantage of one of the other options in the list above?
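For reference, the Buffer conversion mentioned above might look like the following minimal sketch. The helper names (`uuidToBuffer`, `bufferToUuid`, `toLogString`) and the `pk` attribute name are hypothetical, and nothing here calls the AWS SDK; it only shows the marshalling that would sit inside the Lambda API.

```javascript
// Convert a canonical UUID string into the 16 raw bytes that would be
// stored in a Binary (B) attribute.
function uuidToBuffer(uuid) {
  const hex = uuid.replace(/-/g, '');
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) {
    throw new TypeError(`Not a UUID: ${uuid}`);
  }
  return Buffer.from(hex, 'hex');
}

// Convert 16 raw bytes (Buffer or Uint8Array) back into the canonical
// lowercase 8-4-4-4-12 UUID string.
function bufferToUuid(buf) {
  const bytes = Buffer.from(buf);
  if (bytes.length !== 16) {
    throw new RangeError(`Expected 16 bytes, got ${bytes.length}`);
  }
  const hex = bytes.toString('hex');
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join('-');
}

// Base64 form of the key, for logs and debugging output.
function toLogString(buf) {
  return Buffer.from(buf).toString('base64');
}

// Example: build a key object with a hypothetical 'pk' attribute.
const id = '123e4567-e89b-12d3-a456-426614174000';
const key = { pk: uuidToBuffer(id) };
console.log(toLogString(key.pk), bufferToUuid(key.pk));
```

Keeping both directions in one small module means callers of the API only ever see UUID strings, while the table stores the 16-byte form.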

If it matters, I'm planning for all access to DynamoDB to happen via a Lambda API, so even if there's conversion or marshalling required, that's OK because I can do it inside my API.

BTW, this question is a sequel to a 4-year-old question ("UUID data type in DynamoDB") but 4 years is a looooooong time in a fast-evolving space, so I figured it was worth asking again.

Justin Grant
  • When using the official DynamoDB SDKs to auto-generate UUIDs they produce 16 byte hex strings. I don't have any comments on whether that's good or bad. – F_SO_K Oct 30 '18 at 09:14
  • @F_SO_K - do you mean 32-byte hex strings? UUIDs are 16 bytes binary, so would be 32 bytes when converted to hex strings because each binary byte is 2 hex digits. If you mean 16-byte hex strings, then please explain which half of the UUID is being encoded in hex. – Justin Grant Nov 14 '19 at 19:23
  • DynamoDBMapper (the official Java SDK) uses Java UUID which is 16 bytes https://docs.oracle.com/javase/6/docs/api/java/util/UUID.html. I don't have an opinion here, other than an observation that it was good enough for the AWS SDK developers. – F_SO_K Nov 14 '19 at 20:53
  • Here is the reference from DynamoDBMapper btw https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBMapper.Annotations.html#DynamoDBMapper.Annotations.DynamoDBAutoGeneratedKey – F_SO_K Nov 14 '19 at 20:54

1 Answer


I had a similar issue and concluded that the size of the key did not matter much, because all of my options were going to be small and lightweight, with only minor tradeoffs. I decided that the most programmer-friendly approach (for me, at least) would be to use the 'sub', the unique identifier that Cognito creates for each user. That way any collision issues, should they arise, would also be taken care of by Cognito. I could then encode it or not. However a user logs in, they end up with the 'sub', which I then match against the hash key of the records in DynamoDB, and that immediately grants them fine-grained access to only their own data. Three years later, I have found that to be a very reliable method.
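As a rough illustration of that approach, the sketch below derives the DynamoDB key from the caller's 'sub' inside a Lambda handler. It assumes an API Gateway REST API with a Cognito user pool authorizer, which exposes token claims under `event.requestContext.authorizer.claims`; other integrations put the claims elsewhere. The function name `keyForCaller` and the `userId` attribute name are hypothetical.

```javascript
// Build the DynamoDB key for the authenticated caller.
// Assumption: the event comes from an API Gateway REST API using a Cognito
// user pool authorizer, so claims live at requestContext.authorizer.claims.
function keyForCaller(event) {
  const sub = event?.requestContext?.authorizer?.claims?.sub;
  if (typeof sub !== 'string' || sub.length === 0) {
    throw new Error('No Cognito sub claim on the request');
  }
  // 'userId' is a hypothetical partition (hash) key attribute name.
  return { userId: sub };
}

// Example with a mocked event; the sub value is made up for illustration.
const mockEvent = {
  requestContext: {
    authorizer: { claims: { sub: '123e4567-e89b-12d3-a456-426614174000' } },
  },
};
console.log(keyForCaller(mockEvent));
```

Because the key comes from the verified token rather than from request parameters, a caller can only address items stored under their own 'sub'.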

David White