I want to determine the data size in bytes of a JSON Java String. The calculation should be platform-independent, as the software runs on different systems with (possibly) different default character encodings (Windows, Linux, z/OS, ...). The JSON is supposed to contain only characters that can be encoded in UTF-8. So far, all use cases involve only characters that fit in a single UTF-8 byte; in the future, however, Chinese characters, e.g. U+20F2E, will be used as well.
Is there a best-practice way of calculating the data size robustly here?
From what I understand, json.getBytes("UTF-8").length seems to be a valid solution.
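A minimal sketch of that approach (class and method names are mine, purely for illustration), using StandardCharsets.UTF_8 instead of the charset name so that no checked UnsupportedEncodingException has to be handled and no charset lookup can fail at runtime:

import java.nio.charset.StandardCharsets;

public class JsonByteSize {

    // Number of bytes the string occupies when encoded as UTF-8,
    // independent of the platform's default charset.
    static int utf8ByteLength(String json) {
        return json.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8ByteLength("{\"key\":\"value\"}")); // prints 15
    }
}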
Test outputs on Windows:
This is a 1-byte UTF-8 character:
@
"@".length() -> 1
"@".getBytes().length -> 1
"@".getBytes("UTF-8").length -> 1
new String("@".getBytes("UTF-8")) -> @
"@".getBytes("UTF-16").length -> 4
new String("@".getBytes("UTF-16")) -> ��
This is a 2-byte UTF-8 character:
µ
"µ".length() -> 1
"µ".getBytes().length -> 2
"µ".getBytes("UTF-8").length -> 2
new String("µ".getBytes("UTF-8")) -> µ
"µ".getBytes("UTF-16").length -> 4
new String("µ".getBytes("UTF-16")) -> ��
This is a 4-byte UTF-8 character:
U+20F2E (written as the escape "\uD843\uDF2E" in Java source; the glyph may not render here)
"\uD843\uDF2E".length() -> 2
"\uD843\uDF2E".getBytes().length -> 4
"\uD843\uDF2E".getBytes("UTF-8").length -> 4
new String("\uD843\uDF2E".getBytes("UTF-8")) -> U+20F2E (round-trips correctly)
"\uD843\uDF2E".getBytes("UTF-16").length -> 6
new String("\uD843\uDF2E".getBytes("UTF-16")) -> ���c�� (garbled by the default charset)
EDIT: The length of the "compressed" JSON should be calculated, i.e. without any unnecessary whitespace left over from pretty-printing.
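A sketch of one way to combine both requirements, assuming Jackson is on the classpath (any JSON library that can re-serialize without pretty-printing would work the same way): parse the JSON, serialize it back (Jackson's default writer emits compact output), and count the UTF-8 bytes of the result. Class and method names are placeholders:

import java.nio.charset.StandardCharsets;
import com.fasterxml.jackson.databind.ObjectMapper;

public class CompactJsonSize {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Re-serializes the JSON without insignificant whitespace,
    // then counts the UTF-8 bytes of the compact form.
    static int compactUtf8ByteLength(String json) throws Exception {
        String compact = MAPPER.writeValueAsString(MAPPER.readTree(json));
        return compact.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(compactUtf8ByteLength("{ \"key\" : \"value\" }")); // prints 15
    }
}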