Let's say we have my_string = "123456"

I do

my_string.getBytes()

and

BigInteger.valueOf(123456).toByteArray()

The resulting byte arrays are different in these two cases. Why is that? Isn't "123456" the same as 123456, apart from the difference in data type?

Ashwin
  • As an `int`: 4 bytes, by definition. As a `String`: `log10(x)+1`, because the radix is different and the length is variable. – user207421 Apr 12 '19 at 03:30

3 Answers

3

They are different because the String type is made up of Unicode characters. The character '2' is not at all the same as the numeric value 2.
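
As a small illustration of that distinction (a sketch only; the class and method names here are made up for the example), the character '2' can be compared with the digit value it stands for:

```java
public class CharVsNumber {
    // The Unicode code unit that represents the character, as an int.
    static int codeOf(char c) {
        return c;
    }

    // The numeric value the character stands for in base 10 (-1 if it is not a digit).
    static int digitValueOf(char c) {
        return Character.digit(c, 10);
    }

    public static void main(String[] args) {
        System.out.println(codeOf('2'));       // prints 50
        System.out.println(digitValueOf('2')); // prints 2
    }
}
```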

Josh
  • Upvoted for saying "Unicode" instead of "ASCII". It's not the 1980s any more :-) –  Apr 14 '19 at 13:18
2

No. Why would they be? "123456" is a sequence of ASCII characters: the character 1 (which is represented not as the number 1, but as the number 49), followed by the character 2 (50), and so on. 123456 as an int isn't represented as a sequence of decimal digits at all; it's stored as a number in binary.
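
To make this concrete, here is a minimal sketch that prints both byte arrays. It assumes `BigInteger.valueOf` is what the question intended (there is no public `BigInteger` constructor that takes an int), and it passes an explicit charset to `getBytes` so the result does not depend on the platform default. The class and method names are just for illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BytesDemo {
    // Bytes of the text: one character code per decimal digit.
    static byte[] textBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Bytes of the number: minimal big-endian two's-complement form.
    static byte[] numberBytes(long n) {
        return BigInteger.valueOf(n).toByteArray();
    }

    public static void main(String[] args) {
        // '1' is 49, '2' is 50, and so on.
        System.out.println(Arrays.toString(textBytes("123456")));  // [49, 50, 51, 52, 53, 54]
        // 123456 is 0x01E240, i.e. the bytes 0x01, 0xE2, 0x40 (0xE2 prints as -30 because byte is signed).
        System.out.println(Arrays.toString(numberBytes(123456)));  // [1, -30, 64]
    }
}
```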

Louis Wasserman
1

I assume that you are asking about the total memory used to represent a number as a String versus a byte[].

The String size will depend on the actual string representation used. This depends on the JVM version; see the Q&A "What is the Java's internal represention for String? Modified UTF-8? UTF-16?"

For Java 8 and earlier (with some caveats), the String consists of a String object with 1 int field and 1 reference field. Assuming 64-bit references, that adds up to 8 bytes of header + 1 x 4 bytes + 1 x 8 bytes + 4 bytes of padding. Then add the char[] used to represent the characters: 12 bytes of header + 2 bytes per character. This needs to be rounded up to a multiple of 8.

For Java 9 and later, the main object has the same size. (There is an extra field ... but that fits into the "padding".) The char[] is replaced by a byte[], and since you are just storing ASCII decimal digits [1], they will be encoded one character per byte.

In short, the asymptotic space usage is 1 byte per decimal digit for Java 9 or later and 2 bytes per decimal digit in Java 8 or earlier.

For the byte[] representation produced from a BigInteger, the representation consists of 12 bytes of header + 1 byte per byte of content ... rounded up to a multiple of 8. The asymptotic size is 1 byte per byte.

In both cases there is also the size of the reference to the representation; i.e. another 8 bytes.

If you do the sums, the byte[] representation is more compact than the String representation in all cases. But int or long are significantly more compact than either of these representations in all cases.
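
The payload part of that comparison can be checked with a rough sketch like the one below. It only counts decimal digits versus bytes in the array and does not measure object headers or padding, which depend on the JVM. The 100-digit test value and the class and method names are assumptions chosen for illustration.

```java
import java.math.BigInteger;

public class PayloadSizeDemo {
    // Number of decimal digits, i.e. characters in the String form of a non-negative value.
    static int decimalDigits(BigInteger v) {
        return v.toString().length();
    }

    // Number of bytes in the two's-complement byte[] form.
    static int binaryBytes(BigInteger v) {
        return v.toByteArray().length;
    }

    public static void main(String[] args) {
        BigInteger big = BigInteger.TEN.pow(99); // a 100-digit number
        System.out.println("decimal digits: " + decimalDigits(big));
        System.out.println("byte[] length:  " + binaryBytes(big));
        // An int always occupies Integer.BYTES bytes, a long Long.BYTES bytes.
        System.out.println("int bytes: " + Integer.BYTES + ", long bytes: " + Long.BYTES);
    }
}
```

Each decimal digit carries about 3.32 bits of information, so for large values the byte[] payload comes out at well under half the number of digits.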


1 - If you are not ... or if you are curious why I added this caveat ... read the Q&A at the link above!

Stephen C
  • Just to clarify: the optimization for Java 9+ is called "Compact Strings" and applies to strings which contain only Latin-1 characters (which includes ASCII). If you have the string "123č" then each char is represented as two bytes. – Juraj Martinka Apr 13 '19 at 05:49
  • ... which does not apply to this question. – Stephen C Apr 13 '19 at 11:10
  • ...but can help to clarify the actual behavior and avoid drawing wrong conclusions in a different context. – Juraj Martinka Apr 13 '19 at 19:09
  • This is already addressed: *"... and if you are just storing decimal digits, they will be encoded one character per byte."* This question is about something very specific ... whether it saves space to store big numbers as strings versus byte arrays. I disagree that my answer needs an edit to deal with Java 9+ string optimization. Feel free to write your own answer. – Stephen C Apr 14 '19 at 00:00