What is the difference in bytes of a number as a string and as an integer?

Question

Let's say we have my_string = "123456"

I do

my_string.getBytes()

and

BigInteger.valueOf(123456).toByteArray()

The resulting byte arrays are different in these two cases. Why is that? Isn't "123456" the same as 123456, other than the difference in data type?

As an `int`: 4 bytes, by definition. As a `String`: `floor(log10(x))+1` characters, because the radix is different and the length is variable. – user207421, Apr 12 '19 at 03:30

score 3 · Answer 1 · answered Apr 12 '19 at 03:23

They are different because the String type is made up of Unicode characters. The character '2' is not at all the same as the numeric value 2.

Josh

Upvoted for saying "Unicode" instead of "ASCII". It's not the 1980s any more :-) – Apr 14 '19 at 13:18

score 2 · Accepted Answer · answered Apr 12 '19 at 03:54

No. Why would they be? "123456" is a sequence of characters: the ASCII character 1 (which is not represented as the number 1, but as the number 49), followed by the character 2 (50), and so on. 123456 as an int isn't represented as a sequence of digits from 0-9 at all; it is stored as a number in binary.
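To make the difference concrete, here is a minimal sketch that prints both byte arrays. The class and method names (`BytesDemo`, `textBytes`, `binaryBytes`) are made up for illustration, and the charset is pinned to US-ASCII so the result does not depend on the platform default.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BytesDemo {
    // Bytes of the text: one byte per character, each holding the character's code.
    static byte[] textBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Bytes of the number: big-endian two's-complement binary.
    static byte[] binaryBytes(long n) {
        return BigInteger.valueOf(n).toByteArray();
    }

    public static void main(String[] args) {
        // '1' is 49, '2' is 50, and so on.
        System.out.println(Arrays.toString(textBytes("123456")));  // [49, 50, 51, 52, 53, 54]
        // 123456 is 0x01E240; 0xE2 prints as -30 because Java bytes are signed.
        System.out.println(Arrays.toString(binaryBytes(123456)));  // [1, -30, 64]
    }
}
```

Six bytes of character codes versus three bytes of binary: same number, two unrelated encodings.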

Stephen C · Answer 3 · 2019-04-14T08:03:29.943

I assume that you are asking about the total memory used to represent a number as a String versus a byte[].

The String size will depend on the actual string representation used, which in turn depends on the JVM version; see "What is the Java's internal represention for String? Modified UTF-8? UTF-16?"

For Java 8 and earlier (with some caveats), a String consists of a String object with 1 int field and 1 reference field. Assuming 64-bit references, that adds up to 8 bytes of header + 1 x 4 bytes + 1 x 8 bytes + 4 bytes of padding. Then add the char[] used to represent the characters: 12 bytes of header + 2 bytes per character, rounded up to a multiple of 8.

For Java 9 and later, the main object has the same size. (There is an extra field ... but that fits into the "padding".) The char[] is replaced by a byte[], and since you are just storing ASCII decimal digits¹, they will be encoded one character per byte.

In short, the asymptotic space usage is 1 byte per decimal digit for Java 9 or later and 2 bytes per decimal digit in Java 8 or earlier.

For the byte[] representation produced from a BigInteger, the representation consists of 12 bytes of header + 1 byte per byte ... rounded up to a multiple of 8. The asymptotic size is 1 byte per byte.
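As a rough illustration of how the two payloads grow, the sketch below compares the number of decimal digits with the length of `toByteArray()` for a large value. The class and helper names are hypothetical; only the `BigInteger` calls are standard.

```java
import java.math.BigInteger;

class DigitsVsBytes {
    // Number of decimal digits (for a non-negative value).
    static int decimalDigits(BigInteger n) {
        return n.toString().length();
    }

    // Number of bytes in the two's-complement binary form.
    static int binaryBytes(BigInteger n) {
        return n.toByteArray().length;
    }

    public static void main(String[] args) {
        BigInteger n = BigInteger.TEN.pow(100);  // 1 followed by 100 zeros
        System.out.println(decimalDigits(n));    // 101
        System.out.println(binaryBytes(n));      // 42
    }
}
```

Each byte carries 8 bits while a decimal digit carries only about 3.3, so the binary form needs roughly 0.42 bytes per digit.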

In both cases there is also the size of the reference to the representation; i.e. another 8 bytes.

If you do the sums, the byte[] representation is more compact than the String representation in all cases. But int or long are significantly more compact than either of these representations in all cases.

¹ If you are not ... or if you are curious why I added this caveat ... read the Q&A at the link above!
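The sums can be sketched as follows. The header, field, and padding sizes are taken from this answer as-is and should be treated as assumptions: real object layouts vary with the JVM and its settings (for example, compressed references). The class and method names are hypothetical.

```java
class SizeEstimate {
    // String object per the figures above: 8 header + 4 (int) + 8 (reference) + 4 padding.
    static final long STRING_OBJECT = 8 + 4 + 8 + 4;
    // Array header size assumed above.
    static final long ARRAY_HEADER = 12;

    // Round up to the next multiple of 8.
    static long roundUp8(long n) {
        return (n + 7) / 8 * 8;
    }

    // Java 8 and earlier: char[] backing array, 2 bytes per digit.
    static long java8String(int digits) {
        return STRING_OBJECT + roundUp8(ARRAY_HEADER + 2L * digits);
    }

    // Java 9 and later: byte[] backing array, 1 byte per ASCII digit.
    static long java9String(int digits) {
        return STRING_OBJECT + roundUp8(ARRAY_HEADER + digits);
    }

    // byte[] obtained from BigInteger.toByteArray().
    static long byteArray(int bytes) {
        return roundUp8(ARRAY_HEADER + bytes);
    }

    public static void main(String[] args) {
        // "123456": 6 digits as text, 3 bytes in binary.
        System.out.println(java8String(6));   // 48
        System.out.println(java9String(6));   // 48
        System.out.println(byteArray(3));     // 16
        // A 100-digit number needs 42 bytes in binary.
        System.out.println(java8String(100)); // 240
        System.out.println(java9String(100)); // 136
        System.out.println(byteArray(42));    // 56
    }
}
```

The 8-byte reference to each representation is the same in both cases and is left out of these totals.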

Just to clarify: the optimization for Java 9+ is called "Compact Strings" and applies to strings which contain only Latin-1 (ISO-8859-1) characters, which includes all of ASCII. If you have the string "123č" then each char is represented as two bytes. – Juraj Martinka, Apr 13 '19 at 05:49
...but can help to clarify the actual behavior and avoid drawing wrong conclusions in a different context. — Juraj Martinka, Apr 13 '19 at 19:09
This is already addressed: *"... and if you are just storing decimal digits, they will be encoded one character per byte."* This question is about something very specific ... whether it saves space to store big numbers as strings versus byte arrays. I disagree that my answer needs an edit to deal with Java 9+ string optimization. Feel free to write your own answer. — Stephen C, Apr 14 '19 at 00:00
