
I have a Java-based backend app that issues a getItem request to my DynamoDB table based on a request from my user. Occasionally a user sends a request that results in a getItem call exceeding the maximum key size limit below (quoted from https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html ):

For a simple primary key, the maximum length of the first attribute value (the partition key) is 2048 bytes.

I get this error from the DynamoDB SDK when that happens: One or more parameter values were invalid: Size of hashkey has exceeded the maximum size limit of 2048 bytes

Now I need to implement validation for this situation, so that a request which exceeds the limit is rejected before it causes the error. My question: what is the right way to implement this validation in my app? Judging by the documentation linked above, DynamoDB seems to use UTF-8 internally, so would something like the method below be fine?

boolean isPartitionKeySizeValid(String partitionKey) {
    // DynamoDB allows 1 to 2048 bytes for a partition key value (UTF-8 encoded)
    int size = partitionKey.getBytes(StandardCharsets.UTF_8).length;
    return 1 <= size && size <= 2048;
}

My app uses the com.amazonaws:aws-java-sdk-dynamodb library to interact with DynamoDB.

Kohei Nozaki
  • Why are your clients generating such large partition keys? Perhaps you should address that issue. Secondly, why not simply catch this DynamoDB error when it happens and report a relevant error back to the client? – jarmod Apr 07 '23 at 12:29
  • The key can come from the public internet, so I cannot control what a malicious user sends. Also, it's not easy to tell what exactly the problem was from the DynamoDB error; I don't want to parse the error message string, for example. – Kohei Nozaki Apr 07 '23 at 13:33

2 Answers


Yes, simply counting the UTF-8 byte length as in your method will allow you to avoid hitting the 2048-byte partition key value limit.
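A self-contained sketch of that check, exercised at the boundaries (the class and constant names are mine; note that `あ` encodes to 3 bytes in UTF-8, so multi-byte characters count per encoded byte, not per char):

```java
import java.nio.charset.StandardCharsets;

public class PartitionKeyValidator {

    // DynamoDB limit for a partition key value in a simple primary key.
    static final int MAX_PARTITION_KEY_BYTES = 2048;

    static boolean isPartitionKeySizeValid(String partitionKey) {
        int size = partitionKey.getBytes(StandardCharsets.UTF_8).length;
        return 1 <= size && size <= MAX_PARTITION_KEY_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(isPartitionKeySizeValid("a".repeat(2048))); // true: exactly at the limit
        System.out.println(isPartitionKeySizeValid("a".repeat(2049))); // false: one byte over
        // 683 chars * 3 bytes each = 2049 bytes, so this is over the limit too.
        System.out.println(isPartitionKeySizeValid("あ".repeat(683))); // false
        System.out.println(isPartitionKeySizeValid(""));               // false: below the 1-byte minimum
    }
}
```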

Leeroy Hannigan

The 2048-byte limit applies to the UTF-8-encoded value of the partition key. If your table uses a composite primary key (partition key plus sort key), the sort key value has its own, separate limit of 1024 bytes, so you should validate that as well.
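Extending the question's validation method to a composite key might look like this (a sketch; class and constant names are mine, the limits come from the linked documentation):

```java
import java.nio.charset.StandardCharsets;

public class CompositeKeyValidator {

    // Limits from the DynamoDB naming-rules documentation:
    // partition key value up to 2048 bytes, sort key value up to 1024 bytes.
    static final int MAX_PARTITION_KEY_BYTES = 2048;
    static final int MAX_SORT_KEY_BYTES = 1024;

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static boolean isCompositeKeySizeValid(String partitionKey, String sortKey) {
        int pk = utf8Length(partitionKey);
        int sk = utf8Length(sortKey);
        return 1 <= pk && pk <= MAX_PARTITION_KEY_BYTES
            && 1 <= sk && sk <= MAX_SORT_KEY_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(isCompositeKeySizeValid("user#42", "2023-04-07")); // true
        System.out.println(isCompositeKeySizeValid("user#42", "a".repeat(1025))); // false: sort key over 1024 bytes
    }
}
```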

Alex Chadyuk
  • IMO that kind of change "for performance sake" is unlikely to have any major effect, especially given that `getBytes(StandardCharsets.UTF_8)` will be orders of magnitude more expensive! **If** performance is an issue in this method (and that's a pretty big IF), then optimizing to avoid that call if possible would be better. For example, if the string length * 4 (or 6, if feeling especially pessimistic) is less than 2048, then the UTF-8 encoded form is guaranteed to fit. Or even [calculate the length without doing the encoding](https://stackoverflow.com/questions/8511490). – Joachim Sauer Apr 07 '23 at 10:20
  • @JoachimSauer, I agree with every statement you make, however, given the amount of information provided in the question, the maximum entropy demands that size > 2048 and size < 1 are equally likely. Then, given the fact that the author does not evaluate for size < 1 early, but goes straight for the expensive operation of UTF conversion, the equal likelihood conclusion is tipped towards size > 2048 being more likely. Given the information at hand, evaluating this condition first provides efficiency gain. However, since this is not related to original question, I'll remove this. – Alex Chadyuk Apr 07 '23 at 10:55
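For what it's worth, the fast path Joachim Sauer suggests in the comments could be sketched like this (class name is mine). A multiplier of 3 is the tight bound for standard UTF-8: a BMP char encodes to at most 3 bytes, and a surrogate pair of 2 chars encodes to 4 bytes.

```java
import java.nio.charset.StandardCharsets;

public class FastPathValidator {

    static final int MAX_PARTITION_KEY_BYTES = 2048;

    static boolean isPartitionKeySizeValid(String partitionKey) {
        if (partitionKey.isEmpty()) {
            return false; // below the 1-byte minimum
        }
        // Fast path: standard UTF-8 needs at most 3 bytes per Java char,
        // so a short enough string is guaranteed to fit without encoding it.
        if (partitionKey.length() * 3 <= MAX_PARTITION_KEY_BYTES) {
            return true;
        }
        // Slow path: only strings that might exceed the limit pay for encoding.
        return partitionKey.getBytes(StandardCharsets.UTF_8).length <= MAX_PARTITION_KEY_BYTES;
    }
}
```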