3

If we look at the implementation of the String#substring method:

new String(offset + beginIndex, endIndex - beginIndex, value);

We see that the new String is created on top of the same original content (the char[] value parameter) rather than a copy of it.

So the workaround is to use new String(toto.substring(...)) to drop the reference to the original char[] value and make it eligible for GC (if no more references exist).

I would like to know if there is a special reason that explains this implementation. Why doesn't the method create the new, shorter String itself, and why does it keep the full original value instead?

The other related question is: should we always use new String(...) when dealing with substring?

alain.janinm

3 Answers

2

Because String is an immutable class, so a substring can safely share the backing array of the original.

Also See

Community
jmj
  • 1
    A useful complementary link on **why** `String` is immutable: [Why String is immutable in Java?](http://javarevisited.blogspot.it/2010/10/why-string-is-immutable-in-java.html) – Luca Geretti Jun 20 '12 at 09:16
  • @LucaGeretti you should propose that as an edit to the answer. – Kazekage Gaara Jun 20 '12 at 09:18
  • @Kazekage: I didn't simply because it is an external link. I'd rather integrate (some of) the content into the answer. – Luca Geretti Jun 20 '12 at 09:19
  • I know (maybe not well enough) that Strings are immutable, but I don't understand why that is an explanation... I mean, they could have called the String constructor with a shorter char[] instead of the original one. – alain.janinm Jun 20 '12 at 09:23
2

I would like to know if there is a special reason that explains this implementation. Why doesn't the method create the new, shorter String itself, and why does it keep the full original value instead?

Because in most use-cases it is faster for substring() to work this way. At least, that's what Sun / Oracle's empirical measurements would have shown. By doing this, the implementation avoids allocating a backing array and copying characters to the array.

It only stops being an optimization if you then have to copy the String to avoid a memory leak. In the vast majority of cases, the substrings become garbage in a relatively short period of time, and there is no long-term leakage of memory.


Hypothetically, the Java designers could have provided two versions of substring, one which behaved as currently, and the other that created a String with its own backing array. But that would encourage the developer to waste brain-cycles thinking about which version to use. And then there's the problem of utility methods that build on substrings ... like the Pattern / Matcher classes for instance. So I think it is a good thing that they didn't.

Stephen C
  • Ok, thanks for the answer! So you mean that always using `new String()` improves memory management, but will not make the app faster? – alain.janinm Jun 20 '12 at 09:26
  • 1
    @alain.janinm - it is more complicated than that. On the one hand, `new String()` often doesn't improve memory management at all. On the other hand, when it does improve memory management it will make the app go faster. Basically, you have to understand your application's behaviour in order to know whether this will improve things. – Stephen C Jun 20 '12 at 09:34
  • Ok I was wondering the same but I was not sure! Thanks for the enlightenment! – alain.janinm Jun 20 '12 at 09:49
1

The reason for this implementation is efficiency. By pointing to the same char[] as the original string, no data needs to be copied.
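To make the sharing concrete, here is a minimal sketch of a string type that works the way the quoted constructor call suggests. The class SharedString and its compact() method are hypothetical names invented for illustration; this is not the JDK source, only a simplified model of a String that stores a backing array plus an offset and a count.

```java
import java.util.Arrays;

/** Hypothetical, simplified model of a String that shares its backing array. */
final class SharedString {
    final char[] value;  // backing array, possibly shared with other instances
    final int offset;    // index in value of this string's first character
    final int count;     // number of characters that belong to this string

    SharedString(char[] value) {
        this(0, value.length, value);
    }

    // Same parameter order as the constructor call quoted in the question; no copy is made.
    private SharedString(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** Constant time: the result points into the same backing array. */
    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(offset + beginIndex, endIndex - beginIndex, value);
    }

    /** Linear time: copies only the characters in use into a fresh array. */
    SharedString compact() {
        return new SharedString(Arrays.copyOfRange(value, offset, offset + count));
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

In this model, substring() costs the same no matter how long the text is, while compact() pays for a copy but leaves the result independent of the original array.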

This does have a downside though, as you've already hinted at yourself. If the original string is long and you just want to get a small part of it, and you don't need the original string anymore after that, then the complete original array is still referenced and can't be garbage collected. You already know how to avoid that - do new String(original.substring(...)).

should we always use new String(...) when dealing with substring?

No, not always. Only when you know it might cause a problem. In many cases, referring to the original char[] instead of copying the data is more efficient.
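As an illustration of a case where the copy is probably worth it, the sketch below keeps only a short key from each of many potentially long lines in a list that may outlive them. The class KeyExtractor, the method keysOf and the "key:rest" line format are assumptions made up for this example. It also assumes a JDK where substring shares the backing array, as described in the question; on JDKs from 7u6 onward substring already copies, so the extra new String(...) is redundant there.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: extracts short keys from long "key:rest" lines. */
final class KeyExtractor {

    /** Returns the text before the first ':' of each line, skipping lines without one. */
    static List<String> keysOf(List<String> lines) {
        List<String> keys = new ArrayList<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // no key on this line
            }
            // The keys may outlive the lines, so detach each key from the line's
            // backing array; the caller can then let the long lines be collected.
            keys.add(new String(line.substring(0, colon)));
        }
        return keys;
    }
}
```

If the substrings were only used briefly and then discarded along with the lines, the plain line.substring(0, colon) would be the cheaper choice.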

Jesper