11

I always thought that if I write String s = "Hello World".substring(0, 5), I simply get a new string s = "Hello". The Java API documentation says the same: "Returns a new string that is a substring of this string".

But when I saw the following two links, I began to doubt.

What is the purpose of the expression "new String(...)" in Java?

String constructor considered useless turns out to be useful after all

Basically, they say that if I use String s = "Hello World".substring(0, 5), I still get a String that holds the char array of "Hello World".

Why? Does Java really implement substring this way? Why this way? Why not just return a brand new, shorter string?

Peter Mortensen
Jackson Tale

6 Answers

5

Turning it around, why allocate a new char[] when it is not necessary? This is a valid implementation since String is immutable. It saves allocations and memory in the aggregate.
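To make the trade-off concrete, here is a minimal sketch of the sharing idea. SharedSlice is a hypothetical class written for illustration, not JDK code: its substring only adjusts an offset and a count over a shared char[], while copy pays the linear cost of a fresh array.

```java
import java.util.Arrays;

/** Hypothetical illustration class; not part of the JDK. */
final class SharedSlice {
    private final char[] value; // backing array, possibly shared with other slices
    private final int offset;   // index of the first character of this slice
    private final int count;    // number of characters in this slice

    private SharedSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    static SharedSlice of(String s) {
        return new SharedSlice(s.toCharArray(), 0, s.length());
    }

    /** Constant time and memory: only the offset and count change. */
    SharedSlice substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedSlice(value, offset + beginIndex, endIndex - beginIndex);
    }

    /** Linear time and memory: copies just the visible characters. */
    SharedSlice copy() {
        return new SharedSlice(Arrays.copyOfRange(value, offset, offset + count), 0, count);
    }

    boolean sharesArrayWith(SharedSlice other) {
        return value == other.value;
    }

    int backingLength() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedSlice whole = SharedSlice.of("Hello World");
        SharedSlice view = whole.substring(0, 5);
        SharedSlice trimmed = view.copy();
        System.out.println(view);                           // Hello
        System.out.println(view.sharesArrayWith(whole));    // true
        System.out.println(trimmed.sharesArrayWith(whole)); // false
        System.out.println(trimmed.backingLength());        // 5
    }
}
```

Sharing is only safe because nothing can modify the array after construction, which is the immutability argument made above.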

Sean Owen
  • 3
    Put another way, returning a brand new shorter substring requires linear extra memory and time. Returning a view onto the same `char[]` requires only constant memory and constant time. (Additionally, you can get a true copy from a substring view, but you can't go the other way -- so this option is more flexible in general.) – Louis Wasserman May 31 '12 at 08:56
4

It's supposed to be an efficiency measure: when you take a substring, no new char array is created; you merely get a window onto the existing char array.

Is this worthwhile? Maybe. The downside is that it causes some confusion (e.g. see this SO question), and each String object has to carry the offset into the array, even when it is not used.

EDIT: This behaviour changed in Java 7. See the linked answer for more info.

Community
Brian Agnew
  • But the new sub string anyway should be considered as a new string, right? "Hello" is part of "Hello World", but it itself is a brand new string. If in your philosophy, Java should do all things such as if I create `"hello"`, then I create `"hello world"`, then the first `"hello"` should be removed from the memory and utilise the memory of `"hello world"` instead. – Jackson Tale May 31 '12 at 08:37
  • 1
    Not really. This particular implementation doesn't imply that every single string containing the same sequence of characters in the same JVM should point at the same char array. What it merely does is optimises strings derived with substring, not other way around. – maksimov May 31 '12 at 08:50
  • 4
    In Java 7 it is no longer true: it copies the slice of the char array for the specific string. – Mark Rotteveel May 13 '13 at 18:33
  • Could you update this answer regarding to the newer versions of Java? – Tim Oct 08 '15 at 20:30
1

Does Java really implement substring in this way?

Looking at the code (JDK 7 before Update 6, simplified here), yes:

public String substring(int beginIndex, int endIndex) {
    .......
    return new String(offset + beginIndex, endIndex - beginIndex, value);
}

// Package private constructor which shares value array for speed.
String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}

Why in this way? Why not just return a brand new shorter substring?

The comment on the constructor seems to imply that speed was the reason.

assylias
  • You need to look at the String constructor, since Java 7 it no longer takes the same char-array: the constructor copies only the slice of the array between beginIndex and endIndex – Mark Rotteveel May 13 '13 at 18:26
  • @MarkRotteveel Actually since Java 7u6 IIRC. These implementation-specific answers tend to become stale quite quickly! – assylias May 13 '13 at 18:43
  • 1
    Yeah, I noticed after answering and commenting that this question was a year old... – Mark Rotteveel May 13 '13 at 18:45
  • @MarkRotteveel I see: this answer pushed it to the top of the queue: http://stackoverflow.com/a/16528206/829571 ;-) – assylias May 13 '13 at 18:46
  • @MarkRotteveel To be exact, there is the case (in jdk7u6), when the constructor does not copy, but takes the same char-array: if ((beginIndex == 0) && (endIndex == value.length)). – valeryan Feb 20 '20 at 09:18
1

Although it used to be true that a String created with substring() shared the backing char[] of the original (presumably to save the space and time of copying), that is no longer the case since Java 7 Update 6, because sharing the char[] had its own memory overhead. The overhead showed up especially when (large) Strings are loaded, a small substring is taken, and the large string is discarded. If the small string is kept for a long time, the whole large array stays reachable, which can lead to significant unneeded memory use.
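The retention scenario can be sketched as follows. SubstringRetention and firstWord are hypothetical names used only for illustration. On JVMs older than Java 7 Update 6, wrapping the result in new String(...) was the usual way to get a trimmed copy; on newer JVMs the extra copy is redundant but harmless.

```java
/** Hypothetical illustration class; not part of the JDK. */
final class SubstringRetention {

    /** Returns the text before the first space as an independent String. */
    static String firstWord(String line) {
        int space = line.indexOf(' ');
        String word = (space < 0) ? line : line.substring(0, space);
        // Before Java 7 Update 6, 'word' could keep the whole backing array of
        // 'line' reachable. new String(word) asks for a separate String object
        // so that a long-lived result need not pin the large original.
        return new String(word);
    }

    public static void main(String[] args) {
        String line = "Hello World, followed by a very long tail of text";
        String kept = firstWord(line);
        System.out.println(kept); // Hello
    }
}
```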

In any case, in the current version (Java 7 Update 21), substring() calls the constructor String(char value[], int offset, int count) with the char[] of the original string; the constructor then copies the specified range of the char array:

public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}
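The effect of that last line can be checked in isolation. This is a small sketch, with CopyOfRangeDemo and copySlice as hypothetical names, showing that Arrays.copyOfRange yields an array of exactly count elements that is independent of the source array.

```java
import java.util.Arrays;

/** Hypothetical illustration class; not part of the JDK. */
final class CopyOfRangeDemo {

    /** Mirrors the copying step of the constructor shown above. */
    static char[] copySlice(char[] value, int offset, int count) {
        return Arrays.copyOfRange(value, offset, offset + count);
    }

    public static void main(String[] args) {
        char[] original = "Hello World".toCharArray();
        char[] slice = copySlice(original, 0, 5);

        original[0] = 'J'; // mutate the source array after copying

        System.out.println(new String(slice)); // Hello
        System.out.println(slice.length);      // 5
    }
}
```

Because the copy holds only count characters, the larger source array can be garbage collected once nothing else refers to it.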
Mark Rotteveel
0

Because String is immutable anyway, copying the characters into a completely new array does not make much sense.

Saurabh
  • But the new sub string anyway should be considered as a new string, right? "Hello" is part of "Hello World", but it itself is a brand new string. If in your philosophy, Java should do all things such as if I create `"hello"`, then I create `"hello world"`, then the first `"hello"` should be removed from the memory and utilise the memory of `"hello world"` instead. – Jackson Tale May 31 '12 at 08:37
  • how will you create "hello world" from first "hello" ? – Saurabh May 31 '12 at 08:39
0

Keeping in mind that strings are immutable and that they take up memory, imagine performing several substring operations on a string if each one copied its characters into a new array. Instead, substring just creates a new String object that points to the same immutable character data but has different offset and count fields. Now, no matter how many substrings you take of the original string, or of substrings of that string, there is only one copy of the characters in memory. Much more efficient.

Also, when doing String s = "Hello, World".substring(0,5); think about the order of operations. First, the string "Hello, World" is created on the heap and a brand new String object points at it. Then the substring method is called on that String object, and another new String object is created and assigned to the variable s. So s points at the "Hello, World" character data on the heap and has an offset of 0 and a count of 5.

Jesse
mlucas67