
Consider the following snippet.

CASE #1

public class HelloWorld {
    public static void main(String[] args) {
        String str1 = "abc";
        String str2 = "ab";

        str2 = str2 + "c";

        System.out.println("str1 :" + str1+ ", str2 :" + str2);

        System.out.println(str1 == str2);
    }
}

The result is

 sh-4.3$ java -Xmx128M -Xms16M HelloWorld
 str1 :abc, str2 :abc
 false

Here, str1 == str2 evaluates to false. However, if you use the "+" operator to concatenate two literals, you get a reference to the string literal "abc" from the string constant pool. Consider the following snippet:

CASE #2

public class HelloWorld {
    public static void main(String[] args) {
        String str1 = "abc";
        //String str2 = "ab";

        String str2 = "ab" + "c";

        System.out.println("str1 :" + str1 + ", str2 :" + str2);

        System.out.println(str1 == str2);
    }
}

The result is

  sh-4.3$ java -Xmx128M -Xms16M HelloWorld
  str1 :abc, str2 :abc
  true

Can someone please explain why string interning is done in CASE #2 and not in CASE #1? Why do we get 'str1==str2' as false in CASE #1 and true in CASE #2?

pushkin
Vinit Gaikwad
  • Examine the byte code with `javap -v`; you'll see why. – Elliott Frisch Jan 22 '16 at 00:38
  • There is a cost to interning. You probably don't want it for non-constant strings by default (and in the first case, there is a part that looks variable at first glance -- i.e. without potentially involved static code analysis). – Thilo Jan 22 '16 at 00:54

4 Answers

Because the JLS #3.10.5 specifies compile-time interning of string literals or constant string expressions, and doesn't specify any interning in the case of non-constant string expressions.

Also specified in JLS #15.28.
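A minimal sketch of the rule these sections describe, contrasting a constant expression with the same concatenation done through a non-final variable. The class and method names are made up for illustration; `intern()` is the standard `String.intern()` method, included only to show that the pooled instance can still be obtained at run time.

```java
// Compares identity (==) of strings produced by a constant expression
// (interned at compile time) and by run-time concatenation (newly created).
class ConstantVsRuntime {

    // Both operands are literals, so "ab" + "c" is a constant expression.
    static boolean constantConcat() {
        String literal = "abc";
        String folded = "ab" + "c";
        return literal == folded;
    }

    // str2 is an ordinary (non-final) variable, so str2 + "c" is evaluated
    // at run time and yields a newly created String.
    static boolean runtimeConcat() {
        String literal = "abc";
        String str2 = "ab";
        String concatenated = str2 + "c";
        return literal == concatenated;
    }

    // intern() returns the canonical pooled instance, restoring identity.
    static boolean runtimeConcatInterned() {
        String literal = "abc";
        String str2 = "ab";
        return literal == (str2 + "c").intern();
    }

    public static void main(String[] args) {
        System.out.println(constantConcat());        // true
        System.out.println(runtimeConcat());         // false
        System.out.println(runtimeConcatInterned()); // true
    }
}
```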

user207421

Note that because `str2` is declared `final`, the compiler treats `str2 + "c"` as a constant expression and interns the result in this case as well.

public class HelloWorld {
  public static void main(String []args) {
    String str1 = "abc";
    final String str2 = "ab";

    String str3 = str2 + "c";

    System.out.println("str1 :" +str1+", str3 :"+str3);

    System.out.println(str1 == str3);
  }
}
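One caveat worth illustrating: `final` on its own is not sufficient. Per JLS §4.12.4, a `final` variable is a constant variable only if its initializer is itself a constant expression. The sketch below (hypothetical class name) contrasts a `final` local initialized from a literal with one initialized from a method call.

```java
class FinalConstantVariable {

    // final + constant initializer => constant variable,
    // so prefix + "c" is a constant expression and is interned.
    static boolean finalWithLiteral() {
        final String prefix = "ab";
        return "abc" == prefix + "c";
    }

    // Still final, but the initializer is a method call, not a constant
    // expression, so prefix is NOT a constant variable and the
    // concatenation happens at run time.
    static boolean finalWithMethodCall() {
        final String prefix = "ab".toString();
        return "abc" == prefix + "c";
    }

    public static void main(String[] args) {
        System.out.println(finalWithLiteral());    // true
        System.out.println(finalWithMethodCall()); // false
    }
}
```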
Will Hartung

The crucial factor isn't that it's a string literal, it's that it's a constant expression. This is defined in JLS 15.28, which lists all of the ways you can have a constant expression. That list includes "literals of type String" and concatenations of two String constants*, but not of non-final variables, even if those variables happen to be set and never changed.

JLS 15.28 is also what specifies that "Constant expressions of type String are always 'interned'", so if something is not a constant expression -- for instance, if it includes non-final variables -- then it won't be interned.


* This is expressed slightly awkwardly, but basically 15.28 says that an expression is constant if it consists only of a certain set of things, and one of those things is the additive operator +, which for Strings performs concatenation. There's not actually a separate "concatenation operator."
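A few more forms that, going by the 15.28 list, should also count as constant expressions and therefore be interned: a `static final` field with a literal initializer, a primitive literal combined with a String via `+`, and parenthesized sub-expressions. The class and field names here are hypothetical.

```java
class ConstantExpressionForms {

    // static final + literal initializer: a constant variable.
    static final String AB = "ab";

    static boolean staticFinalField() {
        return "abc" == AB + "c";
    }

    // A primitive literal combined with a String via + is still constant,
    // so "ab" + 1 is folded to "ab1" at compile time.
    static boolean intLiteral() {
        return "ab1" == "ab" + 1;
    }

    // Parentheses don't change anything: ("a" + "b") + "c" is constant too.
    static boolean parenthesized() {
        return "abc" == ("a" + "b") + "c";
    }

    public static void main(String[] args) {
        System.out.println(staticFinalField()); // true
        System.out.println(intLiteral());       // true
        System.out.println(parenthesized());    // true
    }
}
```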

yshavit
  • The crucial factor is not the constant expression but the rule in [JLS #3.10.5](https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.10.5) that says what to do with it. – user207421 Jan 22 '16 at 01:21
  • @EJP if you do `final String s1 = "foo" ; String s2 = s1 + "bar"; s2 == "foobar"` then you'll get `true`, even though s2 consisted of a non-literal. – yshavit Jan 22 '16 at 01:23
  • I agree. I haven't stated otherwise. – user207421 Jan 22 '16 at 01:26
  • @EJP And JLS 15.28 also says what to do with it. In fact, JLS 3.10.5's mention of string literals explicitly says that it's a special case of 15.28. – yshavit Jan 22 '16 at 01:27
  • I agree with the first part. Again, I haven't stated otherwise. But there is nothing in 3.10.5 about special cases for interning. – user207421 Jan 22 '16 at 01:27
  • @EJP Ok, then I guess the disagreement is on whether the crucial aspect is the general rule or its specialization :) – yshavit Jan 22 '16 at 01:28
  • Now that you've provided the rule about interning, there is little disagreement here, apart from this non-existent business about specialization. – user207421 Jan 22 '16 at 01:29

Several of the existing answers here already cover, in a strict sense, why the interning doesn't happen: it isn't required by the JLS.

The JLS specifies both that interning must happen in the "literal" + "literal" case (per 15.18.1) and that non-constant expressions must be newly created (also in 15.18.1). The relevant section on newly created objects (12.5) seems to leave open the possibility of the compiler or runtime interning the non-literal case as well (quotation marks added):

Execution of a string concatenation operator (§15.18.1) that is not part
of a constant expression (§15.28) "sometimes" creates a new String object
to represent the result. String concatenation operators may also create
temporary wrapper objects for a value of a primitive type. 

Here, I believe the "sometimes" refers to the fact that the compiler (or runtime) can optimize away a series of concatenation operations within a single expression, such as foo + 1 + anotherFoo + bar(), so that only a single string is newly created for the final result. It does not, however, allow a compiler or runtime to intern that final result.
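To make the "single new string" idea concrete, here is a rough hand-written equivalent of such an expression using `StringBuilder`. `foo`, `anotherFoo` and `bar()` are hypothetical stand-ins for the names in the text, and this is only an approximation: javac up to JDK 8 generated code along these lines, while JDK 9 and later compile concatenation to an `invokedynamic` call instead (JEP 280).

```java
class ConcatChainSketch {

    // Hypothetical helper standing in for bar() in the text.
    static String bar() {
        return "!";
    }

    // The source form: one expression with several + operators.
    static String concatenated(String foo, String anotherFoo) {
        return foo + 1 + anotherFoo + bar();
    }

    // Rough equivalent: one builder accumulates every operand and a single
    // String is created at the end, rather than one String per + operator.
    static String handBuilt(String foo, String anotherFoo) {
        return new StringBuilder()
                .append(foo)
                .append(1)
                .append(anotherFoo)
                .append(bar())
                .toString();
    }

    public static void main(String[] args) {
        String a = concatenated("x", "y");
        String b = handBuilt("x", "y");
        System.out.println(a);           // x1y!
        System.out.println(a.equals(b)); // true
        System.out.println(a == "x1y!"); // false: built at run time, not interned
    }
}
```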

One practical reason it would not be a good idea to intern all strings is that interning is a relatively expensive, global operation. Interned strings must be recorded and checked for uniqueness, and all of that must be thread-safe, which may involve a lock or some faster, but still costly, lock-free logic. In fact, in large applications, String.intern() can be a significant contention point if the size of the default intern table is not increased (it seems locking occurs per bucket in the table).

Once the requirement was written this way, it became essentially impossible for future specifications to loosen it, since existing code may easily depend on objects being newly created: the distinction is readily observable not only via operator == but also when locking on an object, using identity hash codes, etc.
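The identity distinction shows up in more than `==`. As a small sketch (class and method names are hypothetical), `java.util.IdentityHashMap`, which compares keys with `==` rather than `equals()`, counts two run-time concatenations of the same value as two keys, but their `intern()`ed forms as one.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class IdentitySensitivity {

    // prefix is a parameter, not a constant, so each + creates a new String.
    static String[] concatTwice(String prefix) {
        return new String[] { prefix + "c", prefix + "c" };
    }

    // Counts distinct objects (by identity, not equals) among the arguments.
    static int distinctIdentities(String... values) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String v : values) {
            seen.put(v, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        String[] pair = concatTwice("ab");
        System.out.println(pair[0].equals(pair[1]));              // true
        System.out.println(pair[0] == pair[1]);                   // false
        System.out.println(distinctIdentities(pair[0], pair[1])); // 2
        System.out.println(
                distinctIdentities(pair[0].intern(), pair[1].intern())); // 1
    }
}
```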

For a counterpoint, consider the behavior of autoboxed integers, as well as the related static methods such as Integer.valueOf():

This method will always cache values in the range -128 to 127,
inclusive, and may cache other values outside of this range.

The caching mentioned here is essentially equivalent to interning. In this case, the JLS left open the possibility of interning integers outside the range -128 to 127, i.e., it left that up to the implementation. This does make the use of valueOf less predictable for the developer, but it gives more flexibility to the implementation, and this flexibility was in fact used in JDK 7 to increase the size of the cached integer range, and even to make it configurable by the end user who launches the JVM.
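For comparison, a short sketch of what the quoted `Integer.valueOf` contract does and does not guarantee (the class name is hypothetical). Only identity within -128..127 is relied on; for larger values the result of `==` depends on the JVM's cache configuration, so only `equals()` is safe there.

```java
class IntegerCacheDemo {

    // Guaranteed by the Integer.valueOf contract: values in -128..127
    // always come from the cache, so the same object is returned.
    static boolean smallValuesShared() {
        return Integer.valueOf(127) == Integer.valueOf(127)
                && Integer.valueOf(-128) == Integer.valueOf(-128);
    }

    // Outside that range identity is implementation-dependent,
    // but equals() always compares the wrapped values.
    static boolean largeValuesEqual() {
        return Integer.valueOf(1000).equals(Integer.valueOf(1000));
    }

    public static void main(String[] args) {
        System.out.println(smallValuesShared()); // true
        System.out.println(largeValuesEqual());  // true
        // May print true or false depending on how the cache is configured.
        System.out.println(Integer.valueOf(1000) == Integer.valueOf(1000));
    }
}
```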

Community
BeeOnRope