15

This blog post says that the minimum memory usage of a String is:

8 * (int) ((((no chars) * 2) + 45) / 8) bytes.

So for the String "Apple Computers", the minimum memory usage would be 72 bytes.
Even if I have 10,000 String objects of twice that length, the memory usage would be less than 2 MB, which isn't much at all. So does that mean I'm underestimating the number of Strings present in an enterprise application, or is that formula wrong?
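The arithmetic above can be checked with a few lines of Java. This is only a sketch of the quoted formula; the class and method names (`StringFormula`, `minBytes`) are made up for illustration, and the formula itself is taken as given from the blog post rather than verified against any particular JVM.

```java
class StringFormula {
    // Formula as quoted above: 8 * (int) (((chars * 2) + 45) / 8).
    // Integer division performs the truncation that the (int) cast implies.
    static int minBytes(int chars) {
        return 8 * ((chars * 2 + 45) / 8);
    }

    public static void main(String[] args) {
        int chars = "Apple Computers".length();                 // 15 chars
        System.out.println(minBytes(chars));                    // 72
        System.out.println(10_000L * minBytes(2 * chars));      // 1040000 bytes, roughly 1 MB
    }
}
```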

Thanks

rtheunissen

3 Answers

19

String storage in Java depends on how the string was obtained. The backing char array can be shared between multiple instances. If it isn't, you have the usual object overhead plus storage for one pointer and three ints, which usually comes to 16 bytes of overhead. The backing array then requires 2 bytes per char, since chars are UTF-16 code units.

For "Apple Computers" where the backing array is not shared, the minimum cost is going to be

  1. backing array for 15 chars - 30B, padded to 32B so that it aligns on a word boundary.
  2. pointer to array - 4 or 8B depending on the platform
  3. three ints for the offset, length, and memoized hashcode - 12B
  4. 2 x object overhead (one for the String, one for the array) - depends on the VM, but 8B each is a good rule of thumb.
  5. one int for the array length - 4B.

So roughly 72B with 8B pointers, of which the 32B of array data constitutes 44.4%. The payload share grows for longer strings.
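The breakdown above can be written out as a small estimator. This is a rough sketch under the same assumptions as the list (8B object headers, array data padded to 8 bytes, an unshared backing array); the names `StringFootprint`, `align8` and `estimate` are invented for illustration, and a real JVM may lay objects out differently.

```java
class StringFootprint {
    // Round up to a multiple of 8 bytes (assumed alignment).
    static int align8(int bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Assumed layout: a String holding offset, length and hash ints,
    // plus a pointer to an unshared char[] backing array.
    static int estimate(int chars, int pointerBytes, int headerBytes) {
        int arrayData = align8(chars * 2);   // 2 bytes per UTF-16 code unit
        int arrayLengthField = 4;            // one int for the array length
        int stringInts = 3 * 4;              // offset, length, memoized hash
        int headers = 2 * headerBytes;       // String header + array header
        return arrayData + pointerBytes + stringInts + headers + arrayLengthField;
    }

    public static void main(String[] args) {
        int chars = "Apple Computers".length();
        System.out.println(estimate(chars, 8, 8)); // 72 with 8B pointers
        System.out.println(estimate(chars, 4, 8)); // 68 with 4B pointers
    }
}
```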


In Java 7, some JDK implementations are doing away with backing array sharing to avoid pinning large char[]s in memory. That allows them to drop two of the three ints (the offset and length).

That changes the calculation to 64B for a string of length 16 of which the actual payload constitutes 50%.
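To see the effect of dropping the two ints, the same back-of-the-envelope sum can be parameterized by the number of int fields in the String. Again a sketch only: `Java7StringFootprint` and its methods are hypothetical names, and the 8B pointer, 8B headers and 8-byte padding are assumptions carried over from the estimate above.

```java
class Java7StringFootprint {
    // Array data padded to a multiple of 8 bytes (assumed alignment).
    static int arrayData(int chars) {
        return (chars * 2 + 7) / 8 * 8;
    }

    // intFields = 3 with array sharing (offset, length, hash), 1 without (hash only).
    static int estimate(int chars, int intFields) {
        int pointer = 8;          // assumed 8B reference to the char[]
        int headers = 2 * 8;      // assumed 8B header for String and for the array
        int arrayLengthField = 4;
        return arrayData(chars) + pointer + intFields * 4 + headers + arrayLengthField;
    }

    static double payloadShare(int chars, int intFields) {
        return (double) arrayData(chars) / estimate(chars, intFields);
    }

    public static void main(String[] args) {
        System.out.println(estimate(16, 3));       // 72
        System.out.println(estimate(16, 1));       // 64
        System.out.println(payloadShare(16, 1));   // 0.5
        System.out.println(payloadShare(1000, 1)); // payload share for a longer string
    }
}
```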

Mike Samuel
3

Is it possible to save character data using less memory than a Java String? Yes.

Does it matter for "enterprise" applications (or even Android or J2ME applications, which have to get by on a lot less memory)? Almost never.

Premature optimization is the root...

Thilo
1

Compared to the other data types that you have, it is definitely high. The primitives use 32 bits, 64 bits, etc.

And given that String is immutable, every time you perform any operation on it, you end up creating a new String object, consuming even more memory.

Kazekage Gaara
  • Because String is immutable, the operations you perform on it can actually *save* space, because the Strings can share memory. – Thilo Jun 21 '12 at 03:23
  • But every time you create a new `String` object, wouldn't it occupy more memory? – Kazekage Gaara Jun 21 '12 at 03:24
  • Primitives use 32 bytes? I think you meant bits. :) – Makoto Jun 21 '12 at 03:25
  • @Makoto oops. Silly Silly mistake. Corrected. Thanks. – Kazekage Gaara Jun 21 '12 at 03:26
  • "Apple Computers".substring(3) will not occupy a lot of extra memory (just the overhead for the object instance). – Thilo Jun 21 '12 at 03:26
  • @Thilo compared to any primitive, a little extra memory is still a little extra, isn't it? – Kazekage Gaara Jun 21 '12 at 03:32
  • Why are you comparing a 16-character sequence to a primitive? The most compact way you could store it is in a 16-byte array. – Thilo Jun 21 '12 at 03:34
  • but even that memory consumption is still memory consumption, isn't it? I might be wrong, still on the learning curve. :-) – Kazekage Gaara Jun 21 '12 at 03:40
  • @Kazekage whether creating a new String occupies more memory depends on how it is [created](http://docs.oracle.com/javase/specs/jls/se5.0/html/lexical.html#3.10.5). For constant expressions, the actual string values are shared in a global pool among the references you have. Otherwise a new String object is always created. You can force sharing and reuse of a String object by calling the intern() method of the String class. – Viktor Stolbin Jun 21 '12 at 04:50
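
The pooling behaviour described in the last comment can be observed by comparing references. A minimal sketch follows; `InternDemo` and `sameObject` are illustrative names, and `==` is used deliberately here to compare object identity rather than contents.

```java
class InternDemo {
    // True only when both references point at the same String object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "Apple Computers";
        String sameLiteral = "Apple Computers";
        String constant = "Apple " + "Computers";        // compile-time constant expression
        String fresh = new String("Apple Computers");    // always a new object

        System.out.println(sameObject(literal, sameLiteral));    // true: shared from the pool
        System.out.println(sameObject(literal, constant));       // true: constant expressions are pooled
        System.out.println(sameObject(literal, fresh));          // false: distinct object
        System.out.println(sameObject(literal, fresh.intern())); // true: intern() returns the pooled instance
        System.out.println(literal.equals(fresh));               // true: same contents
    }
}
```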