
If I take an XML file that is around 2kB on disk and load the contents as a String into memory in Java and then measure the object size it's around 33kB.

Why the huge increase in size?
If I do the same thing in C++ the resulting string object in memory is much closer to the 2kB.

To measure the memory in Java I'm using Instrumentation. For C++, I take the length of the serialized object (e.g. a string).

Joachim Sauer
imrichardcole
  • How are you measuring the in-memory size? – Ren May 24 '13 at 06:52
  • How are you storing it in memory in Java? Also, Java has an overhead of around 16 bytes per object, so if you have lots of small String objects the overhead will be very high. – Bruce Martin May 24 '13 at 06:59
  • I expect an overhead, but not ~30kB – imrichardcole May 24 '13 at 07:04
  • @imrichardcole Can you please post the Java/C++ code you used to measure the memory size? No one can answer this question without first knowing whether you are measuring correctly. – Krishnabhadra May 24 '13 at 07:06
  • `Runtime.getRuntime().totalMemory()` will tell you how much memory the JVM currently has reserved (subtract `freeMemory()` to get what is in use), but do you need to force garbage collection before calling it? – Bull May 24 '13 at 07:07
  • Can you describe how you came up with 33KB? I believe the size you found is probably not the size of the string itself. – Adrian Shum May 24 '13 at 07:16
  • Yes, you should run the GC and give it time to finish. You are also better off creating a million copies of the string in an array, measuring the array size before and after filling it with strings, to be sure that you are measuring the size of the strings and not other service objects that may be present in your program. A String alone cannot take 32 kB, but a hierarchy of XML objects can. – Val May 24 '13 at 07:52

6 Answers


I think there are multiple factors involved. First of all, as Bruce Martin said, objects in Java have an overhead of around 16 bytes per object, which C++ objects do not. Second, Strings in Java might use 2 bytes per character instead of 1. Third, it could be that Java reserves more memory for its Strings than the C++ std::string does.

Please note that these are just ideas where the big difference might come from.

Marius
  • I believe we are all aware of these overheads. However, they should amount to around double the string length (or x3/x4 if lots of characters require surrogates). They cannot explain the >15x difference. Something else is wrong – Adrian Shum May 24 '13 at 07:30
  • I believe that these overheads amplify themselves if the Java implementation uses many individual String objects to store its data. – Marius May 24 '13 at 07:59

Assuming that your XML file contains mainly ASCII characters and uses an encoding that represents them as single bytes, you can expect the in-memory size to be at least double, since Java uses UTF-16 internally (I've heard of some JVMs that try to optimize this, though). Added to that is the overhead for 2 objects (the String instance and an internal char array) with some fields, IIRC about 40 bytes overall.

So your "object size" of 33kB is definitely not correct, unless you're using a weird JVM. There must be some problem with the method you use to measure it.
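
The estimate above can be sketched as a small calculation. This is only a back-of-the-envelope model: the 40-byte figure is the approximate overhead quoted in this answer, not a value measured on any particular JVM, and the class and method names are made up for illustration.

```java
// Rough model: UTF-16 payload plus a fixed, assumed overhead.
class StringSizeEstimate {
    static final int BYTES_PER_CHAR = 2;          // one UTF-16 code unit per char
    static final int ASSUMED_OVERHEAD_BYTES = 40; // String + char[] headers and fields (approximate)

    static long estimateBytes(int charCount) {
        return (long) charCount * BYTES_PER_CHAR + ASSUMED_OVERHEAD_BYTES;
    }

    public static void main(String[] args) {
        int chars = 2000; // hypothetical: a ~2 kB file of single-byte characters
        System.out.println(estimateBytes(chars) + " bytes");
    }
}
```

Under these assumptions a 2000-character string comes out at roughly 4 kB, nowhere near 33 kB.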

Michael Borgwardt

In Java, a String object carries some extra data that increases its size.
This includes the object header, the array data and some other fields, such as the array reference, offset, length etc.

Visit http://www.javamex.com/tutorials/memory/string_memory_usage.shtml for details.

Chechulin
  • However, such extra data will not cost almost 30KB for a 2KB (in ASCII)/4KB (in UTF-16) string – Adrian Shum May 24 '13 at 07:16
  • Adrian, you are right. It is a mistake to say so. You can easily have huge data structures that store no useful data. – Val May 24 '13 at 07:21
  • java.lang.String contains a reference to the array, plus offset, length and hash code as integers, and 2 more references. That yields 24 bytes on an x86 JVM and 36 bytes on x64. There is also some memory overhead for the char array. – Chechulin May 24 '13 at 07:24
  • @Chechulin We are all aware of that. However, that overhead is only tens of bytes. Even if we include the ASCII vs UTF-16 difference, that only doubles the size, which means it should cost around 4KB. 33KB is at a level that cannot be explained by this kind of overhead. – Adrian Shum May 24 '13 at 07:29
  • Adrian, are you talking about Java or C++? I think there is a little misunderstanding between us. `java.lang.String` should have at least 12 bytes of memory overhead, 24 to 36 bytes of variables and references, and an overhead for the char array, plus 2 to 4 bytes for chars. It should take over 40 bytes in memory. **upd**: if we exclude the array's memory usage from the String's size and count only the reference to the array, it gives exactly 40 bytes (for the String object) in memory. – Chechulin May 24 '13 at 07:35
  • External references such as http://karussell.wordpress.com/2012/04/18/memory-efficient-java-mission-impossible/ talk about `Using 4 * 4 + 7 * 4 = 44 (!) bytes for an empty 32-bit String object`. It can be up to 2x larger for 64-bit strings. The overhead is not constant either. Do not forget that there is also alignment to 8-byte (or 16-byte?) words. Which 36-byte 64-bit overhead are you talking about? – Val May 24 '13 at 07:42
  • @Chechulin I was talking about Java :) All the "normal" overhead we are talking about costs ~40 bytes. If we treat the difference between ASCII and UTF-16 as "overhead", it roughly doubles the size. However, the OP is asking about a ~2000-char string costing 33KB in memory, which cannot be explained by the normal overhead of a Java String – Adrian Shum May 24 '13 at 07:43
  • Damn it, I misread this! I was thinking about 2B, not 2KB. Sorry, I was wrong. – Chechulin May 24 '13 at 07:51

String: a String's memory growth tracks its internal char array's growth. However, the String class adds another 24 bytes of overhead. For a nonempty String of size 10 characters or less, the added overhead cost relative to useful payload (2 bytes for each char plus 4 bytes for the length), ranges from 100 to 400 percent.
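
The 100 to 400 percent range follows directly from the numbers in the paragraph above. A minimal sketch of that arithmetic, taking the 24-byte overhead and the "2 bytes per char plus 4 bytes for the length" payload as given (the class and method names are hypothetical):

```java
// Overhead relative to payload, using only the figures quoted above.
class StringOverheadRatio {
    static final int STRING_OVERHEAD_BYTES = 24; // figure quoted in the answer

    static double overheadPercent(int charCount) {
        int payloadBytes = 2 * charCount + 4; // 2 bytes per char + 4 bytes for the length
        return 100.0 * STRING_OVERHEAD_BYTES / payloadBytes;
    }

    public static void main(String[] args) {
        System.out.println(overheadPercent(1));  // 1 char:   24 / 6  -> 400.0
        System.out.println(overheadPercent(10)); // 10 chars: 24 / 24 -> 100.0
    }
}
```

For a 2000-character string the same ratio drops well below 1 percent, so this fixed overhead matters only for very short strings.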

More: What is the memory consumption of an object in Java?

Community
bengro

Yes, you should run the GC and give it time to finish: just call System.gc() and print Runtime.getRuntime().totalMemory() in a loop. You are also better off creating a million copies of the string in an array (measure the size with the empty array first and then with the array filled with strings), to be sure that you are measuring the size of the strings and not other service objects that may be present in your program. A String alone cannot take 32 kB, but a hierarchy of XML objects can.
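
The measurement idea described above might look roughly like the following sketch. It is an approximation, not a precise tool: `System.gc()` is only a hint to the JVM, the 100 ms pause is an arbitrary assumption, and the class and method names are invented for illustration. Used heap is taken as `totalMemory() - freeMemory()`.

```java
import java.util.Arrays;

// Estimates the average heap cost of one String by allocating many copies.
class HeapDeltaSketch {
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // a request only; the JVM may ignore it
        try {
            Thread.sleep(100); // arbitrary pause to let a collection finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rt.totalMemory() - rt.freeMemory();
    }

    static long averageBytesPerString(int copies, int length) {
        char[] template = new char[length];
        Arrays.fill(template, 'x');
        String[] holder = new String[copies]; // allocated before the baseline, so the array is not counted
        long before = usedHeap();
        for (int i = 0; i < copies; i++) {
            holder[i] = new String(template); // each call creates a distinct String
        }
        long after = usedHeap();
        // Touch the array after measuring so the strings stay reachable until here.
        if (holder[copies - 1].length() != length) {
            throw new IllegalStateException("unexpected string length");
        }
        return (after - before) / copies;
    }

    public static void main(String[] args) {
        System.out.println(averageBytesPerString(5000, 2000) + " bytes per string (approx.)");
    }
}
```

Averaging over thousands of copies makes unrelated allocations in the program negligible per string, which is the point of the "million copies" suggestion.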

That said, I cannot resist the irony that nobody cares about memory (and cache hits) in the Java world. We all know that the JIT keeps improving and can outperform native C++ code in some cases, so there is no need to bother about memory optimization. Premature optimization is the root of all evil.

Val
  • JIT may be a valid point. Just curious, in what case might the JIT think creating a big array is beneficial to the application? – Adrian Shum May 24 '13 at 07:32
  • Who said anything about a big array? I know that the JVM prefers lots of small objects, each consuming a lot of memory. – Val May 24 '13 at 07:33
  • Arrr... I was just thinking that the only piece of a String object the JIT could "optimize" to make it grow so big is its char array. Maybe I should ask, "Just curious, in what case might the JIT think allocating a much bigger piece of memory for a small object is beneficial to the application?" – Adrian Shum May 24 '13 at 07:39
  • I do not know, but hackers say that `a bare Object takes up 8 bytes; an instance of a class with a single boolean field takes up 16 bytes: 8 bytes of header, 1 byte for the boolean and 7 bytes of "padding" to make the size up to a multiple of 8;` Things should be worse on a 64-bit JVM. – Val May 24 '13 at 07:47

As stated in other answers, Java's String adds overhead. If you need to keep a large number of strings in memory, I suggest storing them as byte[] instead. That way the size in memory should be about the same as the size on disk (plus a small array header), provided the bytes use the same encoding as the file.

String -> byte[] :

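// Needs: import java.nio.charset.StandardCharsets;
// An explicit charset avoids depending on the platform default.
String a = "hello";
byte[] aBytes = a.getBytes(StandardCharsets.UTF_8);

byte[] -> String :

String b = new String(aBytes, StandardCharsets.UTF_8);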
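
A self-contained version of the round trip, as a sketch. UTF-8 is an assumption here (use whatever encoding the file actually has), and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Stores text as encoded bytes and decodes it back on demand.
class ByteArrayRoundTrip {
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] ascii = encode("hello");
        System.out.println(ascii.length);          // one byte per ASCII character
        byte[] accented = encode("h\u00e9llo");    // U+00E9 takes two bytes in UTF-8
        System.out.println(accented.length);
        System.out.println(decode(accented).equals("h\u00e9llo"));
    }
}
```

The trade-off is that every use of the text as a String pays for a decode and a temporary String allocation, so this mainly helps for data that is stored far more often than it is read.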
Tim Autin