6

Good afternoon all, I'm using a java.lang.StringBuilder to store some characters. I have no idea how many characters I'm going to store in advance, except that:

  1. 60% of the time, it is only (exactly) 7 characters
  2. 39% of the time, it is (roughly) 3500 characters
  3. 1% of the time, it is roughly 20k characters

How do we go about calculating the optimal initial buffer length that should be used?

Currently I'm using new java.lang.StringBuilder(4000) but that's just because I was too lazy to think previously.

mavis
  • 3,100
  • 3
  • 24
  • 32
Pacerier
  • 86,231
  • 106
  • 366
  • 634
  • It sounds like default could be optimal most of the time. Can you re-cycle your StringBuilders? – Peter Lawrey Jan 21 '12 at 13:46
  • 1
    "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil" – Natix Jan 21 '12 at 13:51
  • @PeterLawrey No they aren't recyclable (I mean they *could* be, and I believe that would be much more optimal however it would require a change to source code). – Pacerier Jan 21 '12 at 13:53
  • @natix Fear of premature optimization is part of the reason I inserted 4000 and be-done-with-it in the first place. However now that it is *post-mature*, it isn't too hard to simply change the argument of a constructor is it? – Pacerier Jan 21 '12 at 13:55
  • Any change would require a change to source code. You can make the objects reuseable without an object pool style recycling. – Peter Lawrey Jan 21 '12 at 13:55
  • @PeterLawrey Lol you got me =) I actually meant "it would require a change to the logic of the already-written class if I make it reuseable". I was actually thinking of just getting a bit of optimization by tweaking a single number. – Pacerier Jan 21 '12 at 13:59
  • I would stick with the initial capacity. If the performance problems become observable, only then I would use a profiler and **measure** the performance with various capacity settings. Blind guessing doesn't help here, moreover you're probably needlessly allocating quite amounts of memory. – Natix Jan 21 '12 at 14:07
  • Similar Question, [Most efficient initial capacity size for StringBuilder?](http://stackoverflow.com/q/13360229/642706). – Basil Bourque Jul 17 '15 at 05:33

1 Answers1

12

There are two factors here: time and memory consumption. The time is mostly influenced by the number of times java.lang.AbstractStringBuilder.expandCapacity() is called. Of course the cost of each call is linear to the current size of the buffer, but I am simplifying here and just counting them:

Number of expandCapacity() (time)

Default configuration (16 characters capacity)

  • In 60% of the cases the StringBuilder will expand 0 times
  • In 39% of the cases the StringBuilder will expand 8 times
  • In 1% of the cases the StringBuilder will expand 11 times

The expected number of expandCapacity is 3,23.

Initial capacity of 4096 characters

  • In 99% of the cases the StringBuilder will expand 0 times
  • In 1% of the cases the StringBuilder will expand 3 times

The expected number of expandCapacity is 0,03.

As you can see the second scenario seems much faster, as it very rarely has to expand the StringBuilder (three time per every 100 inputs). Note however that first expands are less significant (copying small amount of memory); also if you add strings to the builder in huge chunks, it will expand more eagerly in less iterations.

On the other hand the memory consumption grows:

Memory consumption

Default configuration (16 characters capacity)

  • In 60% of the cases the StringBuilder will occupy 16 characters
  • In 39% of the cases the StringBuilder will occupy 4K characters
  • In 1% of the cases the StringBuilder will occupy 32K characters

The expected average memory consumption is: 1935 characters.

Initial capacity of 4096 characters

  • In 99% of the cases the StringBuilder will occupy 4K characters
  • In 1% of the cases the StringBuilder will occupy 32K characters

The expected average memory consumption is: 4383 characters.


TL;DR

This makes me believe that enlarging the initial buffer to 4K will increase the memory consumption by more than two times while speeding up the program by two orders of magnitude.

The bottom line is: try! It is not that hard to write a benchmark that will process million strings of various length with different initial capacity. But I believe a bigger buffer might be a good choice.

Tomasz Nurkiewicz
  • 334,321
  • 69
  • 703
  • 674