15

I had a Java program that was using a StringBuilder to build a string from an input stream and eventually it caused an out of memory error when the string got too long. I tried breaking it up into shorter strings and storing them in an ArrayList and this avoided the OOM even though I was trying to store the same amount of data. Why is this?

My suspicion is that with one very long string, the computer has to find one contiguous place in memory for it, but with an ArrayList it could use multiple smaller places in memory. I know memory can be tricky in Java so this question may not have a straight-forward answer but hopefully someone can put me on the right track. Thanks!

user1803551
  • 12,965
  • 5
  • 47
  • 74
Rexana
  • 163
  • 1
  • 8
  • 3
    I think you've nailed the answer already. – Andrew Williamson Jul 31 '17 at 00:30
  • 2
    @AndrewWilliamson, Intuition is great, but measurement is better. This is an interesting question without an answer until someone can point at source code or show more detailed measurements. – merlin2011 Jul 31 '17 at 00:39
  • 3
    @Rexana, Please provide the source code for your experiment and the Java version as well as the platform. This will enable others to replicate your results. – merlin2011 Jul 31 '17 at 00:39
  • Depending on how the string builder was designed, it may use simple concatenation. In that case, each time you append a new string of 10 chars to an existing string of 100 chars, the system allocates a new string of 110 chars and copies first the existing and then the new strings before getting rid of the old one. At some point, you are using 210 chars in two chunks. If the existing string is more than half the available memory, an Out Of Memory error will be thrown. FYI, speed is also an issue with concatenation-based string building. – jotaelesalinas Jul 31 '17 at 00:46
  • It is worth noting that, with virtual and physical memory separated, you could easily find 1TB of contiguous free address space and map something there, even with way less physical memory. Question is whether Java's own memory management prevents use of that functionality. – Siguza Jul 31 '17 at 00:48
  • While Strings are typically implemented by a single backing array, this may actually be an implementation detail. [JLS](https://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.3.3) and [JavaDocs](https://docs.oracle.com/javase/8/docs/api/?java/lang/String.html) don't seem to provide any guarantees about the internal representation. – Hulk Jul 31 '17 at 07:42

2 Answers2

9

Essentially, you are correct.

A StringBuilder (more precisely, AbstractStringBuilder) uses a char[] to store the string representation (though generally a String is not a char[]). While Java does not guarantee that an array is indeed stored in contiguous memory, it most probably is. Thus, whenever appending strings to the underlying array, a new array is allocated and if it is too large, an OutOfMemoryError is thrown.

Indeed, executing the code

StringBuilder b = new StringBuilder();
for (int i = 0; i < 7 * Math.pow(10, 8); i++)
    b.append("a"); // line 11

throws the exception:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at test1.Main.main(Main.java:11)

When line 3332 char[] copy = new char[newLength]; is reached inside Arrays.copyOf, the exception is thrown because there is not enough memory for an array of size newLength.

Note also the message given with the error: "Java heap space". This means that an object (an array, in this case) could not be allocated in the Java heap. (Edit: there is another possible cause for this error, see Marco13's answer).

2.5.3. Heap

The Java Virtual Machine has a heap that is shared among all Java Virtual Machine threads. The heap is the run-time data area from which memory for all class instances and arrays is allocated.

... The memory for the heap does not need to be contiguous.

A Java Virtual Machine implementation may provide the programmer or the user control over the initial size of the heap, as well as, if the heap can be dynamically expanded or contracted, control over the maximum and minimum heap size.

The following exceptional condition is associated with the heap:

  • If a computation requires more heap than can be made available by the automatic storage management system, the Java Virtual Machine throws an OutOfMemoryError.

Breaking the array into smaller arrays of the same total size avoids the OOME because each array can be stored separately in a smaller contiguous area. Of course, you "pay" for this by having to point from each array to the next one.

Compare the above code with this one:

static StringBuilder b1 = new StringBuilder();
static StringBuilder b2 = new StringBuilder();
...
static StringBuilder b10 = new StringBuilder();

public static void main(String[] args) {
    for (int i = 0; i < Math.pow(10, 8); i++)
        b1.append("a");
    System.out.println(b1.length());
    // ...
    for (int i = 0; i < Math.pow(10, 8); i++)
        b10.append("a");
    System.out.println(b10.length());
}

The output is

100000000
100000000
100000000
100000000
100000000
100000000
100000000
100000000

and then an OOME is thrown.

While the first program could not allocate more than 7 * Math.pow(10, 8) array cells, this one sums up to at least 8 * Math.pow(10, 8).

Note that the size of the heap can be changed with VM initialization parameters, so the size which will throw the OOME is not constant between systems.

user1803551
  • 12,965
  • 5
  • 47
  • 74
  • Indeed, it would have been helpful to know whether the exception actually *said* "Java heap space". If so, one could just start the program with `java -Xmx3000m TheProgram` and be fine. If the error came from the actual string implementation, then the problem would be that the array is plainly too long... – Marco13 Jul 31 '17 at 11:39
  • @Marco13 True about the error message. The last sentence in my answer agrees that you can increase the heap size (but then I'll increase the numbers in my example :) ). I don't think that string implementation is relevant here since it's represented as an array here, or do you mean something else? – user1803551 Jul 31 '17 at 11:44
  • 1
    This point referred to the fact that `StringBuilder` doubles the size of its `values` array, and even if there is enough memory, then for an array size greater than `Integer.MAX_VALUE` (detected by checking for overflow), the `StringBuilder` will throw an `OutOfMemoryError` - **without** the "Java heap space" part, that is. – Marco13 Jul 31 '17 at 11:46
3

It could have been helpful if you had posted a stack trace, if available. But there is one very likely cause of the OutOfMemoryError that you observed.

(Although until now, this answer may only be an "educated guess". Nobody can pinpoint the reason without examining the conditions under which the error occured on your system)

When concatenating strings using a StringBuilder, then the StringBuilder will internally maintain a char[] array containing the characters of the string to be constructed.

When appending a sequence of strings, then the size of this char[] array may have to be increased after a while. This is eventually done in the AbstractStringBuilder base class:

/**
 * This method has the same contract as ensureCapacity, but is
 * never synchronized.
 */
private void ensureCapacityInternal(int minimumCapacity) {
    // overflow-conscious code
    if (minimumCapacity - value.length > 0)
        expandCapacity(minimumCapacity);
}

/**
 * This implements the expansion semantics of ensureCapacity with no
 * size check or synchronization.
 */
void expandCapacity(int minimumCapacity) {
    int newCapacity = value.length * 2 + 2;
    if (newCapacity - minimumCapacity < 0)
        newCapacity = minimumCapacity;
    if (newCapacity < 0) {
        if (minimumCapacity < 0) // overflow
            throw new OutOfMemoryError();
        newCapacity = Integer.MAX_VALUE;
    }
    value = Arrays.copyOf(value, newCapacity);
}

It is called whenever the string builder notices that the new data does not fit into the currently allocated array.

This is obviously one place where an OutOfMemoryError may be thrown. (Strictly speaking, it does not necessarily have to be really "out of memory" there. It is just checking for an overflow in view of the maximum size that an array can have...).

(Edit: Also have a look at the answer by user1803551 : This does not necessarily have to be the place where your error came from! Yours might indeed come from the Arrays class, or rather from inside the JVM)

When examining the code closely, you will notice that the size of the array is doubled each time when its capacity is expanded. This is crucial: If it would only ensure that the new data block can be appended, then appending n characters (or other strings with fixed length) to the StringBuilder would have a running time of O(n²). When the size is increased with a constant factor (here, 2), then the running time is only O(n).

However, this doubling of the size may lead to an OutOfMemoryError even though the actual size of the resulting string is still far smaller than the limit.

Marco13
  • 53,703
  • 9
  • 80
  • 159
  • I also looked through `ensureCapacityInternal` for some time, but I could not imagine that by a sequence of appends `minimumCapacity` would get numerically overflown before the heap size would get too large. probably a concrete computation is in order. – user1803551 Jul 31 '17 at 11:47
  • 1
    @user1803551 It's hard to sensibly do the math here, considering the repeated allocations and the fact that the strings that are *appended* are likely contained in the string pool (causing further memory pressure). But naively, a `char[]` array with 1 billion elements will take roughly 2 GB. The heap size may be made larger, e.g. with `-Xmx16g`. In contrast to that, the technical limit of arrays *never* being able to contain more than 2 billion elements is built into the Java language and VM. So more general: Even when the heap is large enough, the OOME may still be thrown here. – Marco13 Jul 31 '17 at 11:54
  • I've done a similar calculation. Overflow occurs when the array length is 2147483647. Since each `char` is 2 bytes, filling that array will take ~4.3GB. If you can afford that size for the heap then indeed an overflow can occur. I was too optimistic with my evaluation that heap size was almost surely the limiting factor. The repeated allocations are GC'd before the OOME is thrown so I wasn't bothered by that. I also used a single string in my example to keep the string pool small so that it doesn't take heap space, yet that was the limiting factor anyway. – user1803551 Jul 31 '17 at 12:05
  • Out of curiosity, which version of Java is that source code from? Java 8 and Java 9 are pretty different - they use `newCapacity(int)`, `hugeCapacity(int)` and `MAX_ARRAY_SIZE = Integer.MAX_VALUE-8` – user85421 Jul 31 '17 at 12:18
  • @CarlosHeuberger That was from Java8, and I think that's a point that occasionally changed between 7/8/9 (although I didn't yet dive deeper into 9) – Marco13 Jul 31 '17 at 15:12
  • well, I could not find it in Java 8_141... but found an old 7_80 that has that code – user85421 Jul 31 '17 at 15:23
  • @CarlosHeuberger Sorry, I should have looked it up exactly (obviously, there haven't only been changes in 7/8/9, but also in **8** itself). The version was 1.8.0_31. – Marco13 Jul 31 '17 at 15:39