193

I used a variable with a lot of data in it, say String data. I wanted to use a small part of this string in the following way:

this.smallpart = data.substring(12,18);

After some hours of debugging (with a memory visualizer) I found out that the object's field smallpart retained all the data from data, although it only contained the substring.

When I changed the code into:

this.smallpart = data.substring(12,18)+""; 

...the problem was solved! My application now uses very little memory.

How is that possible? Can anyone explain this? I think this.smallpart kept a reference to data, but why?

UPDATE: How can I clear the big String then? Will data = new String(data.substring(0,100)) do the thing?

hsmit
  • Reading more about your ultimate intent below: Where does the large string come from in the first place? If read from a file or database CLOB or something then only reading what you need while parsing will be optimal all the way around. – PSpeed Jan 27 '10 at 23:07
  • Amazing...I am working in java more than 4 to 5 years, still this is new for me :). thanks for the info bro. – Parth May 21 '10 at 06:23
  • There is a subtlety to using `new String(String)`; see http://stackoverflow.com/a/390854/8946. – Lawrence Dol Jan 11 '13 at 00:52

9 Answers

160

Doing the following:

data.substring(x, y) + ""

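creates a new (smaller) String object and throws away the reference to the String created by substring(). That intermediate String, and the large character array it shares with the original, can then be garbage collected once nothing else references them.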

The important thing to realise is that substring() gives a window onto an existing String, or rather onto the character array underlying the original String. Hence it keeps the whole of that array reachable, however short the substring is. This can be advantageous in some circumstances, but problematic if you want to get a substring and dispose of the original String (as you've found out).

Take a look at the substring() method in the JDK String source for more info.

EDIT: To answer your supplementary question, constructing a new String from the substring will reduce your memory consumption, provided you bin any references to the original String.

NOTE (Jan 2013). The above behaviour changed in Java 7u6. The flyweight pattern is no longer used: substring() now copies the characters it needs into a new array, so it works as you would expect.
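
As a rough sketch of the advice above (copy the small part, then drop every reference to the big String), something like the following could be used. The class name SmallPartHolder and the size of the throwaway string are made up for illustration; on Java 7u6 and later the extra copy is not needed, because substring() already copies.

```java
// Hypothetical holder class; the name and field are illustrative only.
class SmallPartHolder {
    private final String smallPart;

    SmallPartHolder(String data) {
        // new String(String) copies the characters into a right-sized array,
        // so this field does not depend on the large string's backing array
        // (this matters on JDKs older than 7u6, where substring() shared it).
        this.smallPart = new String(data.substring(12, 18));
    }

    String smallPart() {
        return smallPart;
    }

    public static void main(String[] args) {
        // Build a large throwaway string (the size is arbitrary for the demo).
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            sb.append("0123456789");
        }
        String data = sb.toString();

        SmallPartHolder holder = new SmallPartHolder(data);
        data = null; // drop the local reference to the large string

        System.out.println(holder.smallPart()); // prints 234567
    }
}
```

Whether the large string is actually reclaimed still depends on no other reference to it surviving anywhere else in the program.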

Brian Agnew
  • That's one of the very few cases where the `String(String)` constructor (i.e. the String constructor taking a String as input) is useful: `new String(data.substring(x, y))` does effectively the same thing as appending `""`, but it makes the intent somewhat clearer. – Joachim Sauer Jan 27 '10 at 14:57
  • Just to be precise, substring uses the `value` attribute of the original string. I think that's why the reference is kept. – Valentin Rocher Jan 27 '10 at 15:02
  • @Bishiboosh - yes, that's right. I didn't want to expose the particularities of the implementation, but that's precisely what's happening. – Brian Agnew Jan 27 '10 at 15:04
  • Technically it's an implementation detail. But it's frustrating nonetheless, and catches out a lot of people. – Brian Agnew Jan 27 '10 at 15:30
  • If I want to parse the big string step by step, how can I gradually decrease the memory use? please see my UPDATE! – hsmit Jan 27 '10 at 15:49
  • @hsmit, you would have to copy the remainder substring to a new string also. Note that by doing that you are copying large amounts of data repeatedly. Your memory performance will improve at the cost of time performance. – PSpeed Jan 27 '10 at 23:06
  • This is one of the cases where C++ is easier. – Zippo Jan 13 '11 at 17:12
  • I wonder if it's possible to optimize this in the JDK using weak references or such. If I'm the last person that needs this char [], and I only need a bit of it, make a new array for me to use internally. – WW. Jun 22 '11 at 02:24
  • How can I get a scenario to check it? I've tried some test but I got just opposite results – NIVESH SENGAR Jun 26 '12 at 11:25
  • @NiveshSengar possibly because this is not true any longer since java 7 update 6. – assylias Dec 31 '12 at 23:44
  • There is a subtlety to using `new String(String)`; see http://stackoverflow.com/a/390854/8946. – Lawrence Dol Jan 11 '13 at 00:50
28

If you look at the source of substring(int, int), you'll see that it returns:

new String(offset + beginIndex, endIndex - beginIndex, value);

where value is the original char[]. So you get a new String but with the same underlying char[].

When you do data.substring(...) + "", you get a new String with a new underlying char[].
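
To make the sharing concrete, here is a simplified model of that layout. It is not the JDK source: the class SharedString and its compact() method are hypothetical, and only the field names (value, offset, count) and the constructor argument order follow the snippet quoted above.

```java
import java.util.Arrays;

// Simplified, hypothetical model of the pre-7u6 String layout.
// It is NOT the JDK source; it only mirrors the fields mentioned above.
final class SharedString {
    final char[] value; // backing array, possibly shared with other instances
    final int offset;   // index of the first character in value
    final int count;    // number of characters in this string

    SharedString(char[] chars) {
        // Defensive copy, so the backing array is never modified from outside.
        this(0, chars.length, Arrays.copyOf(chars, chars.length));
    }

    // Same argument order as the constructor call quoted above.
    private SharedString(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Shares the backing array: cheap, but keeps the whole array reachable.
    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(offset + beginIndex, endIndex - beginIndex, value);
    }

    // Copies only the visible range into a right-sized array, which is
    // the effect that + "" or new String(String) had on the old JDKs.
    SharedString compact() {
        char[] trimmed = Arrays.copyOfRange(value, offset, offset + count);
        return new SharedString(0, count, trimmed);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

In this model, huge.substring(12, 18).value is the very same array object as huge.value, while huge.substring(12, 18).compact().value is a fresh six-element array.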

Actually, your use case is the only situation where you should use the String(String) constructor:

String tiny = new String(huge.substring(12,18));
Pascal Thivent
17

When you use substring, it doesn't actually copy the characters into a new string. The result still refers to your original string's data, with an offset and size constraint.

So, to allow your original string to be collected, you need to create a new string (using new String, or what you've got).

C. K. Young
5

In Java, strings are immutable objects, and once a string is created it remains in memory until it is cleaned up by the garbage collector (and you cannot take for granted when that cleanup happens).

When you call the substring method, Java does not create a truly new string, but just records a range of characters within the original string's data.

So, when you created a new string with this code:

this.smallpart = data.substring(12, 18) + ""; 

you actually created a new string when you concatenated the result with the empty string. That's why.

Reno
Kico Lobo
5

I think this.smallpart kept a reference to data, but why?

Because Java strings consist of a char array, a start offset and a length (and a cached hashCode). Some String operations like substring() create a new String object that shares the original's char array and simply has different offset and/or length fields. This works because the char array of a String is never modified once it has been created.

This can save memory when many substrings refer to the same basic string without replicating overlapping parts. As you have noticed, in some situations, it can keep data that's not needed anymore from being garbage collected.

The "correct" way to fix this is the new String(String) constructor, i.e.

this.smallpart = new String(data.substring(12,18));

BTW, the overall best solution would be to avoid having very large Strings in the first place and to process any input in smaller chunks, a few KB at a time.
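
A minimal sketch of what chunked processing might look like, assuming the input is available as a java.io.Reader. The helper name ChunkedScan, the 4096-character buffer and the character-counting task are all made up for illustration; the real parsing logic would go where the inner loop is.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper: counts occurrences of a character without ever
// holding the whole input in one String.
final class ChunkedScan {
    static long countChar(Reader in, char target) throws IOException {
        char[] buffer = new char[4096]; // chunk size is an arbitrary choice
        long hits = 0;
        int read;
        // read() fills at most buffer.length chars and returns -1 at end of input.
        while ((read = in.read(buffer, 0, buffer.length)) != -1) {
            for (int i = 0; i < read; i++) {
                if (buffer[i] == target) {
                    hits++;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a file or network stream here.
        Reader in = new StringReader("a,b,c,d");
        System.out.println(countChar(in, ',')); // prints 3
    }
}
```

Only one buffer's worth of characters is live at any time, so memory use stays flat regardless of the input size.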

Michael Borgwardt
3

As documented by jwz in 1997:

If you have a huge string, pull out a substring() of it, hold on to the substring and allow the longer string to become garbage (in other words, the substring has a longer lifetime) the underlying bytes of the huge string never go away.

Ken
2

First, calling java.lang.String.substring creates a new window onto the original String, using an offset and length, instead of copying the relevant part of the underlying array.

If we take a closer look at the substring method, we notice a call to the String(int, int, char[]) constructor that passes it the whole char[] representing the string. That means the substring keeps as much memory reachable as the original string.

OK, but why does + "" result in less memory being needed than without it?

A + on strings is implemented via a StringBuilder.append call. The implementation of this method in the AbstractStringBuilder class shows that it ends up doing an arraycopy of only the part we really need (the substring).
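
Written out by hand, the concatenation is roughly equivalent to the builder chain below. This is an illustration, not the exact bytecode: the class ConcatCopyDemo and its method names are hypothetical, and newer compilers (Java 9 and later) may compile + differently.

```java
// Illustrative only: roughly what `s + ""` amounted to on the JDKs
// discussed here. Newer compilers may use a different strategy.
final class ConcatCopyDemo {
    static String viaConcat(String data) {
        return data.substring(12, 18) + "";
    }

    static String viaBuilder(String data) {
        return new StringBuilder()
                .append(data.substring(12, 18)) // copies only these six chars
                .append("")
                .toString(); // toString() allocates a fresh String
    }
}
```

Both methods return equal strings; the point is that the builder's internal array only ever receives the six characters of the substring, never the large original.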

Any other workarounds?

this.smallpart = new String(data.substring(12,18));
this.smallpart = data.substring(12,18).intern();
laika
2

Just to sum up: if you create lots of substrings from a small number of big strings, then use

   String substring = string.substring(5,23);

since you only use the space needed to store the big strings. But if you are extracting just a handful of small strings from lots of big strings, then

   String substring = new String(string.substring(5,23));

will keep your memory use down, since the big strings can be reclaimed when no longer needed.

That you call new String is a helpful reminder that you really are getting a new string, rather than a reference to the original one.

mdma
0

Appending "" to a string will sometimes save memory.

Let's say I have a huge string containing a whole book, one million characters.

Then I create 20 strings containing the chapters of the book as substrings.

Then I create 1000 strings containing all paragraphs.

Then I create 10,000 strings containing all sentences.

Then I create 100,000 strings containing all the words.

I still only use 1,000,000 characters. If you add "" to each chapter, paragraph, sentence and word, each of those four levels copies roughly the whole book again, so you use about 5,000,000 characters.

Of course it's entirely different if you only extract one single word from the whole book, and the whole book could be garbage collected but isn't because that one word holds a reference to it.

And it's different again if you have a one million character string and remove tabs and spaces at both ends, making say 10 calls to create a substring. The way Java works or worked avoids copying a million characters each time. There is a trade-off here, and it's good to know what the trade-offs are.

gnasher729