16

I was going through the String class API and looks like there is a potential memory leak caused by substring method as it shares same character array as original String.

If original string is huge then small string returned by substring can prevent original string(backed up by large array) from garbage collection in Java.

Any thoughts or did I read the API wrong.

Brian Agnew
  • 268,207
  • 37
  • 334
  • 440
Srujan Kumar Gulla
  • 5,721
  • 9
  • 48
  • 78
  • 2
    This is not technically a memory leak at all, since the character array is still referenced and could be collected later when all strings that reference it are collected. Part of the character array might not be used anymore, but that does not make it a leak. – cdhowie Jan 04 '13 at 16:36
  • 1
    if you have a 100 large string each 100MB and you have a substring(0,1) you are technically holding that value[] used in String class and never in your application huge strings are eligible for garbage collection – Srujan Kumar Gulla Jan 04 '13 at 16:39
  • 1
    best link http://javarevisited.blogspot.com/2012/03/why-character-array-is-better-than.html – Premraj Jul 23 '15 at 10:34

3 Answers3

20

There is a potential for a memory leak, if you take a substring of a sizable string and not make a copy (usually via the String(String) constructor).

Note that this has changed since Java 7u6. See https://bugs.openjdk.java.net/browse/JDK-7197183.

The original assumptions around the String object implementing a flyweight pattern are no longer regarded as valid.

See this answer for more info.

Brian Agnew
  • 268,207
  • 37
  • 334
  • 440
  • both strategies are valid. the original impl was fine; the new impl is fine. but it is not fine to change the impl.... it is an astonishing break of compatibility. they couldn't have done it for the apparent reason. there's something else going on. – irreputable Jan 04 '13 at 16:47
  • @irreputable This was admittedly an implementation detail that was not part of the specification => it should not have been relied on. The change has been made for performance reasons. – assylias Jan 04 '13 at 16:49
  • 2
    @assylias that's more of a cop-out. people have established belief that substring()/trim() is O(1). silently changing it to O(n) is not very nice. And how do you write code the behave consistently on different Java versions? – irreputable Jan 04 '13 at 16:53
  • they did it for performance reason, but not the apparent ones, which are very weak. – irreputable Jan 04 '13 at 16:55
  • @irreputable they did it to improve the performance of Strings in general, with the drawback that the performance of substring has been deteriorated... – assylias Jan 04 '13 at 16:57
  • they did it for something more conspiratory. the "bug" has been filed for more than a decade (4513622 etc); if the bug is valid for the apparent reasons, they should have fixed it long time ago. – irreputable Jan 04 '13 at 17:03
4
  1. It was the case until Java 7u6 - you would generally deal with the issue by doing:

    String sub = new String(s.substring(...)); // create a new string
    

    That effectively removes the dependency and the original string is now available for GC. This is by the way one of the only scenarios where using the string constructor makes sense.

  2. Since Java 7u6, a new String is created and there is no memory issue any longer.

assylias
  • 321,522
  • 82
  • 660
  • 783
  • yeah but it creates a new problem. if you `trim()` off one white space, which is a very common case, you end up copying `N-1` chars. – irreputable Jan 04 '13 at 16:48
  • 1
    @irreputable You can always find corner cases where it will perform worse. The goal of a general purpose ilbrary is to perform well *on average* and it seems that many different use cases have been taken into account before making that change. – assylias Jan 04 '13 at 16:52
  • Also copying chars is *very* fast (not saying it has not cost). – assylias Jan 04 '13 at 16:54
  • 1
    `trim()` is not a corner case. actually `str.substring().trim()` is a very common case, which involves copying twice. `O(1)->O(n)` is a big deal. and no, oracle does not know how this will impact the existing applications in the wild. – irreputable Jan 04 '13 at 16:58
  • @irreputable I understand your point. There are many variables to take into account (in particular in this case better use of CPU's caches) which could balance this even if you use substring often. You can dig in [the discussions](http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-June/010381.html) to get more info. – assylias Jan 04 '13 at 17:04
1

In Java 7, String's subString is modified to :

/**
     * Returns a new string that is a substring of this string. The
     * substring begins with the character at the specified index and
     * extends to the end of this string. <p>
     * Examples:
     * <blockquote><pre>
     * "unhappy".substring(2) returns "happy"
     * "Harbison".substring(3) returns "bison"
     * "emptiness".substring(9) returns "" (an empty string)
     * </pre></blockquote>
     *
     * @param      beginIndex   the beginning index, inclusive.
     * @return     the specified substring.
     * @exception  IndexOutOfBoundsException  if
     *             <code>beginIndex</code> is negative or larger than the
     *             length of this <code>String</code> object.
     */
    public String substring(int beginIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        int subLen = value.length - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
    }

Hence, everytime you do subString with beginIndex NOT equal to 0, we have new String Object.

Ankush soni
  • 1,439
  • 1
  • 15
  • 30