
I have a Java-based backend app that issues a getItem request to my DynamoDB table based on a request from my user. Occasionally a user sends a request that results in a getItem call exceeding the maximum key size limit below (quoted from https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html ):

For a simple primary key, the maximum length of the first attribute value (the partition key) is 2048 bytes.

I get this error from the DynamoDB SDK when that happens: One or more parameter values were invalid: Size of hashkey has exceeded the maximum size limit of 2048 bytes

Now I need to implement validation for this situation, so that a request which exceeds the limit is rejected before it causes the error. My question: what is the right way to implement this validation in my app? Judging by the documentation linked above, DynamoDB seems to use UTF-8 internally, so would something like the method below be fine?

boolean isPartitionKeySizeValid(String partitionKey) {
    // DynamoDB allows 1 to 2048 bytes for a partition key value (UTF-8 encoded)
    int size = partitionKey.getBytes(StandardCharsets.UTF_8).length;
    return 1 <= size && size <= 2048;
}

My app uses the com.amazonaws:aws-java-sdk-dynamodb library to interact with DynamoDB.

Kohei Nozaki
  • Why are your clients generating such large partition keys? Perhaps you should address that issue. Secondly, why not simply catch this DynamoDB error when it happens and report a relevant error back to the client? – jarmod Apr 07 '23 at 12:29
  • The key can come from the public internet, so I cannot control what a malicious user sends. Also, it's not easy to tell what exactly the problem was from the DynamoDB error; I don't want to parse the error message string, for example. – Kohei Nozaki Apr 07 '23 at 13:33

2 Answers


Yes, simply counting the UTF-8 byte length as in your method will allow you to avoid hitting the 2048-byte partition key value limit.
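A self-contained sketch of that check, exercised at the boundaries (the class and constant names are mine; note that `あ` encodes to 3 bytes in UTF-8, so multi-byte characters count per encoded byte, not per char):

```java
import java.nio.charset.StandardCharsets;

public class PartitionKeyValidator {

    // DynamoDB limit for a partition key value in a simple primary key.
    static final int MAX_PARTITION_KEY_BYTES = 2048;

    static boolean isPartitionKeySizeValid(String partitionKey) {
        int size = partitionKey.getBytes(StandardCharsets.UTF_8).length;
        return 1 <= size && size <= MAX_PARTITION_KEY_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(isPartitionKeySizeValid("a".repeat(2048))); // true: exactly at the limit
        System.out.println(isPartitionKeySizeValid("a".repeat(2049))); // false: one byte over
        // 683 chars * 3 bytes each = 2049 bytes, so this is over the limit too.
        System.out.println(isPartitionKeySizeValid("あ".repeat(683))); // false
        System.out.println(isPartitionKeySizeValid(""));               // false: below the 1-byte minimum
    }
}
```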

Leeroy Hannigan

The 2048-byte limit applies to the UTF-8-encoded value of the partition key. If your table uses a composite primary key (partition key plus sort key), the sort key value has its own, separate limit of 1024 bytes, so you should validate that as well.
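Extending the question's validation method to a composite key might look like this (a sketch; class and constant names are mine, the limits come from the linked documentation):

```java
import java.nio.charset.StandardCharsets;

public class CompositeKeyValidator {

    // Limits from the DynamoDB naming-rules documentation:
    // partition key value up to 2048 bytes, sort key value up to 1024 bytes.
    static final int MAX_PARTITION_KEY_BYTES = 2048;
    static final int MAX_SORT_KEY_BYTES = 1024;

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static boolean isCompositeKeySizeValid(String partitionKey, String sortKey) {
        int pk = utf8Length(partitionKey);
        int sk = utf8Length(sortKey);
        return 1 <= pk && pk <= MAX_PARTITION_KEY_BYTES
            && 1 <= sk && sk <= MAX_SORT_KEY_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(isCompositeKeySizeValid("user#42", "2023-04-07")); // true
        System.out.println(isCompositeKeySizeValid("user#42", "a".repeat(1025))); // false: sort key over 1024 bytes
    }
}
```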

Alex Chadyuk
  • IMO that kind of change "for performance sake" is unlikely to have any major effect, especially given that `getBytes(StandardCharsets.UTF_8)` will be orders of magnitude more expensive! **If** performance is an issue in this method (and that's a pretty big IF), then optimizing to avoid that call if possible would be better. For example, if the string length * 4 (or 6, if feeling especially pessimistic) is less than 2048, then the UTF-8 encoded form is guaranteed to fit. Or even [calculate the length without doing the encoding](https://stackoverflow.com/questions/8511490). – Joachim Sauer Apr 07 '23 at 10:20
  • @JoachimSauer, I agree with every statement you make, however, given the amount of information provided in the question, the maximum entropy demands that size > 2048 and size < 1 are equally likely. Then, given the fact that the author does not evaluate for size < 1 early, but goes straight for the expensive operation of UTF conversion, the equal likelihood conclusion is tipped towards size > 2048 being more likely. Given the information at hand, evaluating this condition first provides efficiency gain. However, since this is not related to original question, I'll remove this. – Alex Chadyuk Apr 07 '23 at 10:55
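For what it's worth, the fast path Joachim Sauer suggests in the comments could be sketched like this (class name is mine). A multiplier of 3 is the tight bound for standard UTF-8: a BMP char encodes to at most 3 bytes, and a surrogate pair of 2 chars encodes to 4 bytes.

```java
import java.nio.charset.StandardCharsets;

public class FastPathValidator {

    static final int MAX_PARTITION_KEY_BYTES = 2048;

    static boolean isPartitionKeySizeValid(String partitionKey) {
        if (partitionKey.isEmpty()) {
            return false; // below the 1-byte minimum
        }
        // Fast path: standard UTF-8 needs at most 3 bytes per Java char,
        // so a short enough string is guaranteed to fit without encoding it.
        if (partitionKey.length() * 3 <= MAX_PARTITION_KEY_BYTES) {
            return true;
        }
        // Slow path: only strings that might exceed the limit pay for encoding.
        return partitionKey.getBytes(StandardCharsets.UTF_8).length <= MAX_PARTITION_KEY_BYTES;
    }
}
```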