I have the below declaration in my code:

String[] array1 = new String[1000000];

If array1 has 1,000,000 elements (all strings of 80 characters), how heavy is it? I mean in terms of RAM.

Sam DeHaan
Marco Micheli
  • About seven ounces per hundred elements? – Kevin Apr 05 '12 at 12:59
  • Count the number of characters in the array and multiply the sum by the size of `char`, which is 2 bytes. – Eng.Fouad Apr 05 '12 at 13:01
  • Undefined by the JLS. For example, @Eng.Fouad's solution would be wrong by about a factor of 2 on some HotSpot implementations with the right options and the right data :) – Voo Apr 05 '12 at 13:03
  • Related: http://stackoverflow.com/questions/10009856/memory-size-of-a-string-array-storing-binary-codes – assylias Apr 05 '12 at 13:05
  • Why would you want to keep all 1 million in memory? Also, some characters in a string are represented by more than one `char`, i.e. it depends on the encoding. – TechTrip Apr 05 '12 at 13:06
  • Possible duplicate of [Is there any sizeof-like method in Java?](http://stackoverflow.com/questions/2370288/is-there-any-sizeof-like-method-in-java) – Brian Roach Apr 05 '12 at 13:19
  • Not a real answer to your question, but an interesting article about Java memory allocation (including arrays) can be found [here](http://www.ibm.com/developerworks/java/library/j-codetoheap/index.html?ca=drs-) – Robin Apr 05 '12 at 13:30
  • The simplest thing to do is measure it for your JVM. ;) – Peter Lawrey Apr 05 '12 at 13:39

2 Answers

7

The answer is that it depends on many factors:

  • the JVM you are using, i.e. the provider and the version,
  • whether you are using a 32-bit or 64-bit JVM,
  • whether or not you are using "compressed oops" (on a 64-bit HotSpot JVM: -XX:+UseCompressedOops),
  • whether you are using compressed strings, which store ASCII-only text in one byte per character instead of UTF-16 (some HotSpot JVMs support this: -XX:+UseCompressedStrings),
  • whether the elements of the String array are null or not,
  • whether the elements of the String array are the same reference,
  • whether the Strings are interned, and whether the interning is effective,
  • whether the Strings share the same backing array,
  • and so on.
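
Since the result depends on all of these factors, one way to get a number for a particular setup is to measure it on the JVM in question. The following is only a rough sketch: the class and method names (`StringArrayFootprint`, `usedHeap`, `build`) are made up for illustration, `System.gc()` is merely a hint to the JVM, and the printed figure is an estimate that will vary between runs and JVM options.

```java
import java.util.Arrays;

public class StringArrayFootprint {

    // Approximate heap in use right now. System.gc() is only a request,
    // so treat the returned value as an estimate, not an exact measurement.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Builds `count` distinct, dynamically created strings of `length` chars.
    // Assumes `length` is large enough to hold the decimal digits of count - 1.
    static String[] build(int count, int length) {
        String[] array = new String[count];
        char[] buf = new char[length];
        Arrays.fill(buf, 'x');
        for (int i = 0; i < count; i++) {
            // Overwrite the start of the buffer with the index so each string differs.
            String digits = Integer.toString(i);
            digits.getChars(0, digits.length(), buf, 0);
            array[i] = new String(buf); // copies the buffer into a new String
        }
        return array;
    }

    public static void main(String[] args) {
        long before = usedHeap();
        String[] array1 = build(1_000_000, 80);
        long after = usedHeap();
        System.out.println("Approximate bytes used: " + (after - before));
        // Use the array after measuring so it stays reachable until here.
        System.out.println("Elements: " + array1.length);
    }
}
```

Running it with different options (for example with and without -XX:+UseCompressedOops on a 64-bit HotSpot JVM) should show how much the factors above matter.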

Dynamically created Strings are not interned by default. If you intern them, you may save space if there are many "equal" Strings in your dataset. But the flip side is that the string pool has storage overheads (it is a big hash table), so if the ratio of equal to non-equal Strings is too small, you waste space rather than saving it.
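
As a small illustration of the interning point (the class name `InternDemo` and the helper `fresh` are hypothetical, used here only for the sketch): two strings built at runtime with the same contents are separate objects until they are interned.

```java
public class InternDemo {

    // Creates a new String object at runtime with the same characters as `s`.
    static String fresh(String s) {
        return new String(s.toCharArray());
    }

    public static void main(String[] args) {
        String a = fresh("some 80-character record");
        String b = fresh("some 80-character record");

        System.out.println(a.equals(b));              // true: same contents
        System.out.println(a == b);                   // false: two separate objects
        System.out.println(a.intern() == b.intern()); // true: one pooled instance
    }
}
```

If the array held the interned references, equal elements would share one String object, at the cost of the pool overhead described above.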

The point about backing arrays is complicated too. The background is that the split methods (for example) create String objects that share the original String's character array. If you create lots of substrings of the same original string, this can save space. But the flip side is that if you create a small substring of a large string, the small substring can cause the original String's entire backing array to remain reachable.
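
A commonly used idiom on JVMs that share backing arrays was to copy the small substring explicitly so that the large original can be collected. The sketch below uses hypothetical names (`SubstringCopy`, `detachedPrefix`). Whether the copy actually saves memory depends on the JVM: HotSpot releases from Java 7u6 onward already copy the characters in `substring`, so there the extra `new String(...)` makes no difference.

```java
public class SubstringCopy {

    // Returns the first `n` characters of `source` as a separate String.
    // On JVMs where substring() shares the parent's char array, wrapping the
    // result in new String(...) was the usual way to request an independent copy.
    static String detachedPrefix(String source, int n) {
        return new String(source.substring(0, n));
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append("0123456789");
        }
        String large = sb.toString();          // 1,000,000 characters
        String small = detachedPrefix(large, 5);
        System.out.println(small);             // 01234
    }
}
```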

Stephen C
  • +1 good answer. BTW, what do you mean by `whether or not you are using OOPs (on a 64 bit JVM).`? – Eng.Fouad Apr 05 '12 at 13:28
  • I think he means whether you are using compressed oops, i.e. 32-bit references on a 64-bit JVM. This is the default on later versions of Java when the heap is less than 32 GB. – Peter Lawrey Apr 05 '12 at 13:39
  • That's what I meant. (Updated with the JVM option names) – Stephen C Apr 05 '12 at 13:45
  • Thanks, Stephen. At last, one sensible answer that contains only factual information! – Voo Apr 05 '12 at 15:56
4

It's implementation-dependent. Assuming a typical JVM which uses UTF-16 encoding internally, it might be something like this.

1 million elements * 80 characters * 2 bytes = 160 million bytes for the text data.

Add on some overhead for each String's internal data structures (say 16 bytes or so), a reference to each String (say 8 bytes), and a little bit for the array itself (say 12 bytes) and you have:

160,000,000 + 16,000,000 + 8,000,000 + 12 = 184,000,012 bytes
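
The same back-of-the-envelope arithmetic, written out as code. The class and method names (`RoughEstimate`, `estimateBytes`) are illustrative, and the overhead figures are the guesses from above (16 bytes per String, 8 bytes per reference, 12 bytes for the array), not values guaranteed by any JVM.

```java
public class RoughEstimate {

    // Text data plus per-element overhead plus a fixed array header.
    // All size parameters are assumptions supplied by the caller.
    static long estimateBytes(long elements, int charsPerString, int bytesPerChar,
                              int perStringOverhead, int referenceSize, int arrayOverhead) {
        long textBytes = elements * charsPerString * bytesPerChar;
        long perElementBytes = elements * (perStringOverhead + referenceSize);
        return textBytes + perElementBytes + arrayOverhead;
    }

    public static void main(String[] args) {
        // 1 million strings, 80 chars each, 2 bytes per char (UTF-16).
        System.out.println(estimateBytes(1_000_000L, 80, 2, 16, 8, 12)); // 184000012
    }
}
```

Changing the parameters (for example 1 byte per character, or a 4-byte reference) gives a quick feel for how far the estimate moves under different JVM settings.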

Graham Borland
  • But that's just the data, without the bookkeeping and all other internal data structures that the VM keeps about the array. – Blagovest Buyukliev Apr 05 '12 at 13:02
  • You forgot the overhead of the string object itself. I think that is around 24 bytes per string. So you should add 24 million bytes to the RAM usage. – MTilsted Apr 05 '12 at 13:02
  • And you forgot that nothing forces the JVM to store the data internally as UTF-16 **and in fact there are implementations that don't**. That's the problem with all these undefined things. – Voo Apr 05 '12 at 13:04
  • There are not really any large internal data structures for an array. So the total overhead for your array is around 32 bytes. So small that you don't even need to think about it when you are already using more than 160 MB of RAM :} – MTilsted Apr 05 '12 at 13:08
  • You forget that Strings are immutable. If there are duplicates in that array, only one instance of that string is taking up room, not XXX number of duplicates. – Churk Apr 05 '12 at 13:13
  • @MTilsted Actually the overhead per string is about 52 bytes on modern HotSpot implementations (assuming non-shared char arrays), whatever that knowledge's good for :) – Voo Apr 05 '12 at 13:13
  • @Churk only if the strings are interned. If they're 80 characters long, you can be pretty sure they're **not** interned. – Graham Borland Apr 05 '12 at 13:15
  • And one million references to string objects stored in the array. – josefx Apr 05 '12 at 13:16
  • @Graham Interning has not much to do with size. String constants have to be interned, and dynamic strings aren't interned by any JVM I'm aware of unless you explicitly ask for it. Also, no, we can share char arrays between strings just fine even if they aren't interned. – Voo Apr 05 '12 at 13:17
  • @GrahamBorland, at 80 characters it is still interned. Test it: ` String a = "nchaskdhfgnchaskdhfgnchaskdhfgnchaskdhfgnchaskdhfgnchaskdhfgnchaskdhfgnchaskdhfg"; String b = "nchaskdhfgnchaskdhfgnchaskdhfgnchaskdhfgnchaskdhfgnchaskdhfgnchaskdhfgnchaskdhfg"; System.out.println("String a.equals(b) : " + (a.equals(b) ? "Yes" : "No")); System.out.println("String a == b : " + (a == b ? "Yes" : "No"));` – Churk Apr 05 '12 at 13:27
  • @Churk that only applies to String literals by default. I believe GrahamBorland assumes that the one million strings are created at runtime. How often do you deal with one million 80-char Strings in your source code? – josefx Apr 05 '12 at 13:37