
I'm writing lots of stuff to a log in bursts, and optimizing the data path. I build the log text with StringBuilder. What would be the most efficient initial capacity, memory-management-wise, so that it works well regardless of JVM? The goal is to avoid reallocation almost always, which should be covered by an initial capacity of around 80-100. But I also want to waste as few bytes as possible, since the StringBuilder instance may hang around in a buffer and the wasted bytes add up.

I realize this depends on the JVM, but there should be some value that wastes the fewest bytes no matter the JVM, a sort of "least common denominator". I am currently using 128-16, where 128 is a nice round number and the subtraction is for allocation overhead. Also, this might be considered a case of "premature optimization", but since the answer I am after is a "rule-of-thumb" number, knowing it would be useful in the future too.

I'm not expecting "my best guess" answers (my own answer above is already that), I hope someone has researched this already and can share a knowledge-based answer.

Alexandre Lavoie
hyde
  • The answer to this question depends on a lot of things, for example how long the text is that you store in a `StringBuilder` etc. The only way to find out is to measure using a memory and/or CPU profiler. There's no reason to worry about a few bytes unless you are creating hundreds of thousands of `StringBuilder` objects. – Jesper Nov 13 '12 at 11:58
  • By far the biggest overhead is the cost of IO. Unless you don't intend to write this data to IO, I wouldn't worry about it. – Peter Lawrey Nov 13 '12 at 12:00

2 Answers


Don't try to be smart in this case.

> I am currently using 128-16, where the 128 is a nice round number, and subtraction is for allocation overhead.

In Java, this is based on totally arbitrary assumptions about the inner workings of a JVM. Java is not C. Byte-alignment and the like are absolutely not an issue the programmer can or should try to exploit.

If you know the (probable) maximum length of your strings you may use that for the initial size. Apart from that, any optimization attempts are simply in vain.

If you really know that vast numbers of your `StringBuilder` instances will be around for very long periods (which does not quite fit the concept of logging), and you really feel the need to try to persuade the JVM to save some bytes of heap space, you may try using `trimToSize()` after the string is completely built. But, again, as long as your strings don't waste megabytes each, you really should go and focus on other problems in your application.
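As a minimal sketch of the `trimToSize()` suggestion: the class name, the helper name, and the initial capacity of 112 are assumptions for illustration only. `trimToSize()` is documented as an attempt to reduce storage, so the printed capacity may vary by JVM.

```java
public class TrimExample {
    // Hypothetical helper: builds a log line in a pre-sized builder,
    // then asks the builder to shrink its buffer to the current length.
    static StringBuilder buildTrimmed(String message) {
        StringBuilder sb = new StringBuilder(112); // 112 is an assumed capacity
        sb.append(message);
        sb.trimToSize(); // attempts to release the unused part of the buffer
        return sb;
    }

    public static void main(String[] args) {
        StringBuilder sb = buildTrimmed("burst log entry");
        System.out.println(sb.length() + " chars, capacity " + sb.capacity());
    }
}
```

Note that trimming copies the characters into a new, smaller array, so it only pays off for builders that really do stay alive for a long time.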

JimmyB

Well, I ended up testing this briefly myself, and then testing some more after comments, to get this edited answer.

Using JDK 1.7.0_07, with the test app reporting the VM name "Java HotSpot(TM) 64-Bit Server VM", the granularity of StringBuilder memory usage is 4 chars: heap use steps up at every multiple of 4 chars of capacity.

Answer: any multiple of 4 is an equally good capacity for StringBuilder from a memory allocation point of view, at least on this 64-bit JVM.
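If one wanted to apply this observation, a desired capacity could be rounded up to the next multiple of 4. The sketch below is only an illustration: the class and method names are hypothetical, and the step of 4 is the value measured on this one JVM, not a general guarantee.

```java
public class CapacityRounding {
    // Rounds a non-negative capacity up to the next multiple of 4.
    // The step of 4 chars is an assumption taken from the measurement above.
    static int roundUpToMultipleOf4(int capacity) {
        return (capacity + 3) & ~3;
    }

    public static void main(String[] args) {
        int wanted = 97; // hypothetical expected maximum line length
        StringBuilder sb = new StringBuilder(roundUpToMultipleOf4(wanted));
        System.out.println(sb.capacity()); // prints 100
    }
}
```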

Tested by creating 1000000 StringBuilder objects with different initial capacities, each in a separate test program execution (to have the same initial heap state), and printing out `ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed()` before and after.
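The original test code was not posted, so the following is a reconstruction of the procedure described above, not the actual program. The class name, method names, and the default capacity of 112 are assumptions. The result is only a rough estimate, since garbage collection between the two readings can distort it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class BuilderHeapTest {
    // Static field keeps the builders reachable so they are not collected.
    static StringBuilder[] keep;

    static long usedHeap() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        return mem.getHeapMemoryUsage().getUsed();
    }

    // Returns the change in used heap after allocating `count` builders.
    static long measure(int capacity, int count) {
        keep = new StringBuilder[count]; // allocated before the first reading
        long before = usedHeap();
        for (int i = 0; i < count; i++) {
            keep[i] = new StringBuilder(capacity);
        }
        long after = usedHeap();
        return after - before;
    }

    public static void main(String[] args) {
        // Run once per capacity, in a fresh JVM each time, as in the answer.
        int capacity = args.length > 0 ? Integer.parseInt(args[0]) : 112;
        long delta = measure(capacity, 1_000_000);
        System.out.println("capacity " + capacity + ": " + delta + " bytes");
    }
}
```

Running it with arguments such as 112, 115, 116 and 120 and comparing the printed deltas would mirror the comparison described in the comments.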

Printing out heap sizes also confirmed that the amount actually allocated from the heap for each StringBuilder's buffer is an even multiple of 8 bytes, as expected since a Java char is 2 bytes long. In other words, allocating 1000000 instances with initial capacity 1..4 takes about 8 megabytes less memory (8 bytes per instance) than allocating the same number of instances with initial capacity 5..8.

hyde
  • Do you mind sharing your testing procedures? - How do you manage to determine the heap usage with such a granularity? – JimmyB Nov 13 '12 at 20:49
  • I don't have the code handy, but the heap usage went a step up at every increase of 4 units in StringBuilder initial capacity, then was about the same for the next 3 sizes, before jumping up at the next multiple of 4 again. **But** that's 4 chars, meaning 8 bytes, right? Thanks for asking, I'll definitely test again tomorrow to verify this. – hyde Nov 13 '12 at 21:07
  • So you observed an increase in heap usage in steps of 1000000 x 4 bytes? -- I don't dare to think of estimating how many bytes of Java heap space a [data structure] will occupy, not for a `char` and not for any other value/type in any Java program. - Besides, irrespective of the *allocation* granularity of the heap, the granularity at which the GC decides to *release* the memory back to the heap is unknown and will influence any measurement. - If you are doing your testing out of curiosity and/or to measure some characteristics of a given JVM, go ahead. - Otherwise, ... see my answer above :) – JimmyB Nov 13 '12 at 21:33
  • I observed that 1000000 x `new StringBuilder(112)` took about the same amount of heap as 1000000 x `new StringBuilder(115)`. Increasing capacity to 116 increased heap use noticeably, 120 increased it again etc. I was rather surprised to think it was 4 bytes, but 4 chars = 8 bytes makes much more sense (on 64 bit JVM). – hyde Nov 13 '12 at 21:37