26

I have searched about this, but I couldn't find why StringBuilder's ensureCapacity() method won't lengthen the old capacity by just doubling but also adding two.

So, when default capacity of 16 is full, next lengthened value will be 34 unless whole string length doesn't exceed 34. Why shouldn't it be 32?

My best guess is considering for a null character, '\u0000', but I'm not sure. Can anyone tell me why?

mx0
  • 6,445
  • 12
  • 49
  • 54
Philip Paek
  • 271
  • 2
  • 6
  • @soorapadman `int newCapacity = (value.length + 1) * 2;` Won't it be sufficient with just `(value.length) * 2;` ? That's what I'm curious about. – Philip Paek Jul 14 '17 at 05:05
  • i understood . I am expecting someone explain better rather than i am giving it answer . This is the reason i kept quit . :) – soorapadman Jul 14 '17 at 05:07
  • Thank you @soorapadman for helping me out! :D I appreciate it. – Philip Paek Jul 14 '17 at 05:09
  • where did you find `(value.length + 1) * 2`. I mean is it used as it is in somewhere ? – prime Jul 14 '17 at 06:55
  • @prime You can check [here](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/AbstractStringBuilder.java#AbstractStringBuilder.ensureCapacity%28int%29), though it is the open-jdk version. – Philip Paek Jul 14 '17 at 07:43
  • @PhilipPaek it's bit different now I think. https://searchcode.com/codesearch/view/17990403/ here it is `int newCapacity = value.length * 2 + 2;` – prime Jul 14 '17 at 08:45
  • what does the source code say? that will tell you what you need to know. –  Jul 17 '17 at 19:26
  • 1
    Though the documentation states; *Twice the old capacity, plus 2*, the code `int newCapacity = (value.length + 1) * 2` actually meant; *Add an additional space, and double it down.* So that makes sense, since the capacity is filled up, just add **ONE** additional space, and then double the new length. – Den Isahac Jul 17 '17 at 20:53
  • For sure the null character '\u0000` has nothing to do with it, it has **no special meaning** in Java. Java **does handle** the null character exactly the same way as any other character. Java handles the length of a String *not* by the means of the null character. – Honza Zidek Jul 24 '17 at 13:26

5 Answers5

8

I believe it has to do with a simple, if somewhat dumb, way to ensure the corner case of very small strings.

For example, if I have the string

""

and I double it only, I will not have a sufficient size to store anything else in it. If I double it and add a small constant number of spaces, I can assure that my new value is larger than my old one.

Why increment it by two then? Probably a small performance improvement. By adding two instead of 1, I can avoid an intermediate expansion for small expansions (0 to 10 chars detailed below)

"" => expand => "1" => expand => "123" expand => "1234567" expand => "123456789012345"

which is 4 expands compared to

"" => expand => "12" => expand => "123456" => expand => "123456789012"

which is 3 expands. This also works nicely for one char strings (expanding to 10 chars)

"1" => expand => "1234" => expand => "1234567890"

while the 1 char expansion routine looks like

"1" => expand => "123" => expand => "1234567" => expand => "123456789012345"

Finally, an added increment of two tends to word align about 50% of the time, while added increments of one or three would do so about 25% of the time. While this might not seem like a big deal, some architectures cannot accommodate non-aligned reads without expensive interrupt calls to rewrite the read in the CPU, leading to all sorts of performance issues.

Edwin Buck
  • 69,361
  • 7
  • 100
  • 138
  • Why not add space only when length = 0? – shmosel Jul 17 '17 at 20:38
  • They include an if-else that handles the 0 case even if they didn't add 2. – matt Jul 17 '17 at 21:03
  • @matt Well, that's bizarre, but it wouldn't be the first time I've seen multiple ways of handling the same "worrisome" case. – Edwin Buck Jul 17 '17 at 21:05
  • *That's the same as adding one to the length*. I'm not talking about how much to add, just *when* to add. – shmosel Jul 17 '17 at 21:07
  • @shmosel The comment in the code block indicates that reading the correct amount to add is problematic. "This implements the expansion semantics of ensureCapacity with no size check or synchronization" I imagine that reading the size of something not yet stored might be problematic, or that locking the item to be read might create an issue. In other words, they expand the capacity, without attempting to know the right capacity due to "design reasons". All expansions will be "a guess" and if the guess is too small, they'll try again. – Edwin Buck Jul 17 '17 at 21:11
  • 1
    @shmosel The really bizarre thing about this guessing technique, is that they are given the minimumCapacity to expand to. I'm guessing that they added the `minimumCapacity` variable after they wrote the `expandCapacity(...)` method as an attempt to not stack trash as much, but the original design was "expand to what's needed, when needed" and the odd (double + 2) semantics made more sense when it was the only line in the function. – Edwin Buck Jul 17 '17 at 21:14
  • In mono they use a similar technique for StringBuilder, but they only doubles the capacity on the first try. – matt Jul 17 '17 at 21:16
0

I think the reason is a combination of

  • some ancient ;-) heuristic strategy how to extend the capacity, especially for short buffers,

  • documenting this strategy in earliest java API docs,

  • Sun/Oracle being very careful to stick to once-documented behaviour.

StringBuilder shares this method with its predecessor StringBuffer, which reads (probably from the earliest beginnings, at least in j2sdk1.4_02, which happened to still exist in some archive folder on my machine):

/**
 * Ensures that the capacity of the buffer is at least equal to the
 * specified minimum.
 * If the current capacity of this string buffer is less than the 
 * argument, then a new internal buffer is allocated with greater 
 * capacity. The new capacity is the larger of: 
 * <ul>
 * <li>The <code>minimumCapacity</code> argument. 
 * <li>Twice the old capacity, plus <code>2</code>. 
 * </ul>
 * If the <code>minimumCapacity</code> argument is nonpositive, this
 * method takes no action and simply returns.
 *
 * @param   minimumCapacity   the minimum desired capacity.
 */
public synchronized void ensureCapacity(int minimumCapacity) {
    if (minimumCapacity > value.length) {
        expandCapacity(minimumCapacity);
    }
}

And it documents exactly the times-two plus-two behaviour, so even if in the meantime some JRE developer had found a better strategy, there's no chance to implement it here because it wouldn't conform to the documentation.

Ralf Kleberhoff
  • 6,990
  • 1
  • 13
  • 7
  • *Sun/Oracle being very careful to stick to once-documented behaviour.* If it can't be observed, I wouldn't call it documented behavior. It's just an implementation note. – shmosel Jul 24 '17 at 19:22
  • It can be observed through the capacity() method. – Daniel Lemire Aug 10 '17 at 19:07
0

I suppose that object alignment is a key, because length * 2 + 2-strategy is memory-effective (see explanation below).

Let's consider HotSpot JVM.

First of all, java objects are 8-bytes aligned and char array is not an exception.

Secondly, sizeof(object header) equals 8 bytes on 32-bit JVM and 16 bytes on 64-bit JVM with -XX:-UseCompressedOops.

Thus, object body should be aligned by 8 bytes:
objectBodySize(charArray) == sizeOf(arrayLength) + sizeOf(arrayValues) == (4 bytes) + (arrayLength * 2 bytes).

If old array length is even, then new array length will always give zero-size alignment.

Examples:

  1. oldCharArrayLength == 6 then newCharArrayLength == 14 and objectBodySize(newCharArray) == 4 + 14 * 2 == 32

  2. oldCharArrayLength == 4 then newCharArrayLength == 10 and objectBodySize(newCharArray) == 4 + 10 * 2 == 24

It's important to note that -XX:+UseCompressedOops flag is available since 1.6 whereas StringBuilder and AbstractStringBuilder are available since 1.5. It means that the strategy above with two additional chars has zero-cost of memory on 64-bit JVM before 1.6, whereas sizeof(object header) == 12 bytes when run on 64-bit JVM with -XX:+UseCompressedOops.

Gregory.K
  • 1,327
  • 6
  • 12
0

I downloaded the original source code of Java 1.5 from the Oracle web and it contains the following lines:

/**
 * This implements the expansion semantics of ensureCapacity with no
 * size check or synchronization.
 */
void expandCapacity(int minimumCapacity) {
    int newCapacity = (value.length + 1) * 2;
    if (newCapacity < 0) {
        newCapacity = Integer.MAX_VALUE;
    } else if (minimumCapacity > newCapacity) {
        newCapacity = minimumCapacity;
    }   
    char newValue[] = new char[newCapacity];
    System.arraycopy(value, 0, newValue, 0, count);
    value = newValue;
} 

So at least two things are clear:

  • the theory that other corrections were additionally added is false (e.g. "the odd (double + 2) semantics made more sense when it was the only line in the function" is not true)
  • most probably it was originally meant as "let's make room for at least one more character and let's multiply it by two"
Honza Zidek
  • 9,204
  • 4
  • 72
  • 118
0
 public void ensureCapacity(int minimumCapacity) {
     if (minimumCapacity > value.length) {
         expandCapacity(minimumCapacity);
     }
 }



 void expandCapacity(int minimumCapacity) {
     int newCapacity = (value.length + 1) * 2;
     if (newCapacity < 0) {
         newCapacity = Integer.MAX_VALUE;
     } else if (minimumCapacity > newCapacity) {
         newCapacity = minimumCapacity;
     }
    value = Arrays.copyOf(value, newCapacity);
}

NOTE: value.length is the capacity of the StringBuffer, not the length.

It has nothing to do with a null string because minimum capacity is 16.

What I think is, the memory allocations cost so much time, and if we are calling ensureCapacity() frequently with increasing minimumCapacity , (capacity +1)*2 will allocate a bit more memory and may reduce further allocations and save some time.

lets consider initial capacity as 16,

only with doubling 16 , 32 , 64 , 128 , 256 , 512 , 1024 , 2048 , so on...

with double +2 16 , 34 , 70 , 142 , 286 , 574 , 1150 , 2302 , so on...

Thus the memory will gradually keeping increasing every time and may decrease the no of allocations of memory.

Abhishek
  • 87
  • 5