This behavior is due to interning. It's described in the docs for `String#intern` (including why it shows up in your code even though you never call `String#intern`):
> A pool of strings, initially empty, is maintained privately by the class `String`.
>
> When the `intern` method is invoked, if the pool already contains a string equal to this `String` object as determined by the `equals(Object)` method, then the string from the pool is returned. Otherwise, this `String` object is added to the pool and a reference to this `String` object is returned.
>
> It follows that for any two strings `s` and `t`, `s.intern() == t.intern()` is `true` if and only if `s.equals(t)` is `true`.
>
> All literal strings and string-valued constant expressions are interned. String literals are defined in §3.10.5 of the Java Language Specification.
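Both guarantees in that quote can be checked directly. Here is a minimal sketch (the class and method names are made up for illustration): a literal used in two unrelated classes resolves to the same instance, a `new String` copy is a distinct object until it's interned, and identity after interning lines up with `equals`.

```java
public class InternPropertyDemo {
    // Two unrelated (hypothetical) classes that each happen to use the literal "Hi".
    static class Sender {
        static String greeting() { return "Hi"; }
    }

    static class Receiver {
        static String greeting() { return "Hi"; }
    }

    // The Javadoc guarantee: s.intern() == t.intern() exactly when s.equals(t).
    static boolean internMatchesEquals(String s, String t) {
        return (s.intern() == t.intern()) == s.equals(t);
    }

    public static void main(String[] args) {
        // Literals are interned, so both classes see the same instance.
        System.out.println(Sender.greeting() == Receiver.greeting()); // true

        // new String always creates a distinct object...
        String copy = new String("Hi");
        System.out.println(copy == "Hi");          // false
        // ...but interning it yields the pooled instance.
        System.out.println(copy.intern() == "Hi"); // true

        System.out.println(internMatchesEquals(copy, "Hi")); // true
        System.out.println(internMatchesEquals(copy, "Ho")); // true
    }
}
```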
So for example:
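```java
public class Test {
    // Initialized from a literal, so s1 refers to the interned "Hi"
    private String s1 = "Hi";

    public static void main(String[] args) {
        new Test().test();
    }

    public void test() {
        String s2 = "Hi"; // also a literal: the same interned instance as s1
        String s3;

        System.out.println("[statics] s2 == s1? " + (s2 == s1));

        // Built at runtime via a method call, so not interned automatically
        s3 = "H" + part2();
        System.out.println("[before interning] s3 == s1? " + (s3 == s1));

        // intern() returns the pooled instance equal to s3, i.e. the one s1 refers to
        s3 = s3.intern();
        System.out.println("[after interning] s3 == s1? " + (s3 == s1));
    }

    protected String part2() {
        return "i";
    }
}
```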
Output:

```
[statics] s2 == s1? true
[before interning] s3 == s1? false
[after interning] s3 == s1? true
```
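What keeps `s3` out of the pool is that `"H" + part2()` isn't a constant expression. A small sketch with hypothetical names, reusing the same `"Hi"` value: concatenating two literals, or a literal and a `final` constant variable, is folded at compile time and therefore interned, while concatenating with a method result happens at runtime and is not.

```java
public class ConstantExpressionDemo {
    // A constant variable: final, of type String, initialized with a constant expression.
    static final String SUFFIX = "i";

    // Mirrors part2() from the example: a method result is never a compile-time constant.
    static String part2() {
        return "i";
    }

    static boolean literalPlusLiteral() {
        return ("H" + "i") == "Hi";     // folded to "Hi" at compile time, so interned
    }

    static boolean literalPlusConstant() {
        return ("H" + SUFFIX) == "Hi";  // SUFFIX is a constant variable, so this is folded too
    }

    static boolean literalPlusMethodCall() {
        return ("H" + part2()) == "Hi"; // concatenated at runtime: a new, non-interned String
    }

    public static void main(String[] args) {
        System.out.println(literalPlusLiteral());    // true
        System.out.println(literalPlusConstant());   // true
        System.out.println(literalPlusMethodCall()); // false
    }
}
```

Note that a non-`final` local variable holding `"i"` would behave like the method call: only expressions that qualify as constant expressions under the JLS are folded.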
Walking through that:
- The literal assigned to `s1` is automatically interned, so `s1` ends up referring to a string in the pool.
- The literal assigned to `s2` is also automatically interned, so `s2` ends up pointing to the same instance `s1` points to. That's fine even though the two bits of code may know nothing about each other, because Java's `String` instances are immutable: you can't change them. Methods like `toLowerCase` give you back a new string with the changes, but the original you called `toLowerCase` (etc.) on remains unchanged. So interned strings can safely be shared among unrelated code.
- We create a new `String` instance via a runtime operation: `"H" + part2()` involves a method call, so it isn't a constant expression and the concatenation happens at runtime. Even though the new instance has the same sequence of characters as the interned one, it's a separate instance. The runtime doesn't intern dynamically created strings automatically, because there's a cost involved: the work of looking the string up in the pool. (Literals, by contrast, are recorded in the class file's constant pool at compile time and interned by the JVM once, when each is first resolved, rather than on every string operation.) So now we have two instances: the one `s1` and `s2` point to, and the one `s3` points to. That's why the code shows `s3 != s1`.
- Then we explicitly intern `s3`. Perhaps it's a large string we plan to hold onto for a long time, and we think it's likely to be duplicated elsewhere, so we accept the work of interning it in return for the potential memory savings. Since `intern` may return a different reference from the one it was called on (the pooled instance), we assign the result back to `s3`.
- And we can see that indeed, `s3` now points to the same instance `s1` and `s2` point to.
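The immutability point is what makes sharing one pooled instance safe. A minimal sketch (the `lowered` helper is hypothetical): "changing" the shared `"Hi"` really produces a new string, and every other holder of the literal still sees the original.

```java
import java.util.Locale;

public class SharedLiteralDemo {
    // Hypothetical helper: "changes" a string the only way Java allows, by returning a new one.
    static String lowered(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String shared = "Hi";           // the interned instance other code may also be using
        String lower = lowered(shared); // a different String holding "hi"

        System.out.println(lower);          // hi
        System.out.println(shared);         // Hi: the pooled instance is untouched
        System.out.println("Hi" == shared); // true: other "Hi" literals still share it
    }
}
```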
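To make the "potential memory savings" trade-off concrete, here is a sketch under assumed conditions: many equal strings are built at runtime (the `"status:active"` value and the helper names are hypothetical stand-ins for, say, parsed input). Without interning each one is a separate object; with interning they all collapse to the single pooled instance. Identity is counted with an `IdentityHashMap`-backed set so that equal-but-distinct objects are not merged.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class InternDedupDemo {
    // Hypothetical stand-in for strings produced at runtime (parsing, I/O, etc.).
    static List<String> buildRuntimeCopies(int count, boolean intern) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // StringBuilder.toString() creates a new String instance on every call
            String s = new StringBuilder("status:").append("active").toString();
            result.add(intern ? s.intern() : s);
        }
        return result;
    }

    // Counts distinct object identities (not distinct values) in the list.
    static int distinctInstances(List<String> strings) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        identities.addAll(strings);
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(buildRuntimeCopies(1000, false))); // 1000
        System.out.println(distinctInstances(buildRuntimeCopies(1000, true)));  // 1
    }
}
```

Each `intern()` call pays for a pool lookup, which is exactly the cost the runtime avoids by not interning dynamically created strings on its own.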