26

I have a condition that a StringBuilder keeps storing lines matching a pattern from a large flat file (100's of MB). However after reaching a condition I write the content of the StringBuilder varialble to a textfile.

Now I wonder if I should use the same variable by resetting the object ->

stringBuilder.delete(0,stringBuilder.length())

OR

stringBuilder=new StringBuilder();

Please suggest which would do you think is better as far as both performance and OOM issues are concerned.

Sotirios Delimanolis
  • 274,122
  • 60
  • 696
  • 724
Chandru
  • 336
  • 1
  • 5
  • 14

8 Answers8

38

I think StringBuilder#delete(start, end) is still expensive call, you should do:

stringBuilder.setLength(0);

to reset it.


UPDATE: After looking at source code of StringBuilder It seems setLength(int) leaves old buffer intact and it is better to call: StringBuilder#trimToSize() after above call which attempts to reduce storage used for the character sequence.

So something like this would be more efficient:

stringBuilder.setLength(0); // set length of buffer to 0
stringBuilder.trimToSize(); // trim the underlying buffer
anubhava
  • 761,203
  • 64
  • 569
  • 643
  • 5
    You might leak memory using this technique. If you create bigger and bigger Strings, then smaller ones, the underlying `char[]` will get bigger even though you aren't using it all. – Sotirios Delimanolis Sep 12 '13 at 14:27
  • @SotiriosDelimanolis might be right, take a look at the source for `setLength`: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/AbstractStringBuilder.java#AbstractStringBuilder.setLength%28int%29 Basically, `setLength` just nulls out old values and doesn't actually shrink the underlying array. – DigitalZebra Sep 12 '13 at 14:33
  • 3
    @anubhava I'm not talking about performance. The `setLength()` method just sets the `count` field to the new length. So if your `char[]` is of size 10_000_000, you're using all of that space for nothing. – Sotirios Delimanolis Sep 12 '13 at 14:37
  • @SotiriosDelimanolis: That's the main point, how can you say that we're still using that `10000000 char array` after `setLength(0)`? btw looking at source code it is actually setting all the previous `char[]` buffer to `\0` individually. Also see this answer for more details and please go through some answers/discussions there: http://stackoverflow.com/a/5193094/548225 – anubhava Sep 12 '13 at 14:42
  • 1
    @anubhava Say you had a `count` of 100 for a previous use, possibly a `char[100]` to accompany that. Calling `setLength(0)`, where `100 > 0`, ie. there is no `\0`, that only happens when the capacity **increases**. Instead the `else` is executed and `count` is set to `0`. So your `char[]` reference doesn't get reassigned and you are left with a `char[100]` which you might not fully use. Even if it did set all the char indices to `\0` you would still might be wasting them. – Sotirios Delimanolis Sep 12 '13 at 14:47
  • @SotiriosDelimanolis: Yes I misread `if/else` condition and its clear that else part will be execute. I think `setLength(0)` followed by `trimToSize()` would be better for clearing old buffer. – anubhava Sep 12 '13 at 15:05
  • 2
    @anubhava `trimToSize()` is dangerous too because it will create a `char[0]` which will have to be expanded at the first `append()`. – Sotirios Delimanolis Sep 12 '13 at 15:12
  • Yes it needs to be expanded but not sure why it is dangerous. Rather than keeping a huge buffer intact IMO it is better to expand it on demand. – anubhava Sep 12 '13 at 15:20
  • 2
    @SotiriosDelimanolis I know this is ancient, but is a larger unused, underlying array where the length has been set to zero the same thing as a memory leak? I mean, it will use up memory, but won't it be garbage collected eventually when the stringBuilder goes out of scope? Alternatively reallocating the variable to new stringBuilder could create a lot of deallocated garbage to be collected. Which is worse? I guess it depends on context of the application. – Davos Aug 09 '17 at 07:03
6

Imho, I would suggest to use new:

stringBuilder = new StringBuilder();

I've never heard about a memory leak in StringBuilder, but when you really push the limits you never know. I would hedge my bets against it by using a new instance every time.

In the worst case maybe you lose some efficiency and the gc gets a workout, but you rule out the OOM possibility.

Because of that, and also for reasons of clarity I would personally take the new approach.

vikingsteve
  • 38,481
  • 23
  • 112
  • 156
5

One fundamental difference is sb.delete keeps the reference, while the constructor loses it.

If your SB is a method argument, and is supposed to be used to pass content back to the caller, you have to use sb.delete. The caller holds the original reference.

user3241961
  • 51
  • 1
  • 2
3

Well there's a greater difference between the two. The first retains whatever capacity it had before you deleted the characters (i.e. stringBuilder.capacity()) whereas the second creates a new StringBuilder with the default capacity, 16. Of course you could just pass stringBuilder.capacity() as an argument to the constructor, but it's important to understand the distinction here, nonetheless.

In any case, I highly doubt that you will see a substantial performance difference between these two variants, so choose whichever is more readable and easier to manage. Only when you've conclusively determined that this is causing some sort of bottleneck should you change your approach.

arshajii
  • 127,459
  • 24
  • 238
  • 287
2

It is cheaper reuse the created object than allocate a new one, always. It also avoids extra work for the Garbage Collector, as you are handling only an object.

The faster way is:

stringBuilder.setLength(0);
jmlotero
  • 51
  • 4
1

I would use:

 stringBuilder = new StringBuilder();

because if you fill it with a large amount of data, calling stringBuilder.setLength(0); won't unallocate the backing array, so you could see memory usage stay high unnecessarily.

Also, it's just easier to read and understand.

Bohemian
  • 412,405
  • 93
  • 575
  • 722
1

Ideally we should use new StringBuilder() Digging a bit in StringBuilder class from grepcode I get to know the following.

Creating new object :

/**
     * Creates an AbstractStringBuilder of the specified capacity.
     */
    AbstractStringBuilder(int capacity) {
        value = new char[capacity];
    }

new StringBuilder() creates a new object with initial capacity char array. Overhead here : GC will be called for clearing older object.

Using delete :

public AbstractStringBuilder delete(int start, int end) {
        if (start < 0)
            throw new StringIndexOutOfBoundsException(start);
        if (end > count)
            end = count;
        if (start > end)
            throw new StringIndexOutOfBoundsException();
        int len = end - start;
        if (len > 0) {
            System.arraycopy(value, start+len, value, start, count-end);
            count -= len;
        }
        return this;
    }

Using Length and TrimToSize :

public void trimToSize() {
        if (count < value.length) {
            value = Arrays.copyOf(value, count);
        }
    }

Will call copyOf From array class

public static char[] copyOf(char[] original, int newLength) { char[] copy = new char[newLength]; System.arraycopy(original, 0, copy, 0, Math.min(original.length, newLength)); return copy; }

Now it will also call System.arrayCopy which is a native method. Now if you see in copyOf we are creating a new charArray again of 0 length, and when we try to add again data to it, it will call expand because the current length would be 0. So I think its better to call new StringBuilder()

You can see the above code on grepcode

PS : @user3241961 is write in case you are using reference of this object then new would require to set it again

Akhil Dad
  • 1,804
  • 22
  • 35
0

If you are in a tight loop and you will continue back in that loop after you write the data to the file, you should definitely re-use the StringBuilder. There's no reason not to and it's better than churning the GC. If you were writing this in C or C++ you would re-use the buffer.

Also, while true that the delete(...) method calls System.arraycopy, the number of bytes copied is 0 so it's insignificant.

Ah - somebody else mentioned me that there is a setLength(...) method which is the fastest way to re-use the buffer.

jpayne
  • 111
  • 1
  • 8