
I understand that inside a method:

String myStr1 = "good";
String myStr2 = "good";
System.out.println(myStr1==myStr2);

Prints true. For the same reason:

String myStr1 = "good";
String myStr2 = ""+'g'+'o'+'o'+'d';
System.out.println(myStr1==myStr2);

Also prints true.

Then why:

String myStr1 = "good";
char[] myCharArr = {'g', 'o', 'o', 'd' };
String myStr2 = ""+myCharArr[0]+myCharArr[1]+myCharArr[2]+myCharArr[3];
System.out.println(myStr1==myStr2);

Prints false? I don't see the difference between the last two snippets. Any idea? Thanks.

daniel sp
  • The first two examples are concatenated at compile time, and resolve to the same object. The last example is concatenated at run time, and usually creates a new unique String object. – Nayuki Nov 29 '15 at 20:36
  • I see the point. Tks. – daniel sp Nov 29 '15 at 20:40
  • http://stackoverflow.com/questions/513832/how-do-i-compare-strings-in-java is a nice page to bookmark for this :) – Naman Nov 29 '15 at 20:45
  • I'm guessing you're aware that Strings are objects and value comparison of them requires **myStr1.equals(myStr2)**. **==** does reference comparison on objects and I'm guessing your question is about how Java decides to use the same objects or not at compile time. – Alain O'Dea Nov 29 '15 at 20:49
  • Hi Alain. Your first guess is right; I'm well aware of it. As for your second guess, I had not considered that sharing the same String object between different references has to be decided at compile time, so the third snippet always creates a new object because it is built at runtime. The doubt arose from a question while I was preparing for the OCA exam. – daniel sp Nov 30 '15 at 12:34

4 Answers


myCharArr[0] can't be evaluated at compile time, because the compiler (whose cleverness is limited) has to assume that the array may be modified at runtime before the string is concatenated (maybe by some other thread). Since its content can change, the compiler doesn't assume that, for instance, myCharArr[0] is 'g' (maybe this behavior will be improved in the future).

So with code like ""+'g'+'o'+'o'+'d' the compiler is sure about the values it handles and can figure out that the resulting string will be "good" (since we used compile-time constants). To optimize our code and avoid recalculating this expression every time the code runs, it simply replaces ""+'g'+'o'+'o'+'d' with "good".
But since it can't evaluate the expression involving myCharArr[0], it can't optimize the code the same way, so it has to leave creating this string to code executed at runtime.
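A minimal sketch of the difference described above (the class and method names are made up for illustration). The same characters held in local final char variables initialized with literals are constant variables, so the compiler can fold the concatenation; read from an array, they are not, and the string is built at runtime.

```java
class FinalCharDemo {
    // Each char is a constant variable (final, initialized with a literal),
    // so "" + g + o + o + d is a constant expression folded to "good".
    static boolean viaFinalChars() {
        final char g = 'g', o = 'o', d = 'd';
        String s = "" + g + o + o + d;
        return s == "good";
    }

    // Array elements never count as constant expressions, so this
    // concatenation runs at runtime and produces a new String object.
    static boolean viaArray() {
        char[] chars = {'g', 'o', 'o', 'd'};
        String s = "" + chars[0] + chars[1] + chars[2] + chars[3];
        return s == "good";
    }

    public static void main(String[] args) {
        System.out.println(viaFinalChars()); // true
        System.out.println(viaArray());      // false
    }
}
```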


Now, if you are wondering why == returns true for "good"=="good" but false for code like "good"==new String("good"), you need to know that:

  • == compares references; in other words, it tests whether both references point to the same object (to check whether objects are equal, use the equals method)
  • Java has a String Pool which stores literals to avoid creating many String objects holding the same data, and the compiler adds code responsible for placing literals into and retrieving them from that pool, so in "good"=="good" both literals are the same object from the pool, which the result true confirms
  • but the compiler doesn't add code to place into or retrieve from the pool Strings created explicitly at runtime with the new String(data) constructor, to avoid filling the pool with strings that most probably will never be recreated; so with "good"==new String("good") you are comparing two different objects, "good" from the pool and new String(...), which is separate from the pooled one (which the result false of == confirms).
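The three points above can be checked with a short sketch (PoolDemo and its method names are hypothetical):

```java
class PoolDemo {
    // Two identical literals refer to the same pooled String object.
    static boolean literalVsLiteral() {
        String a = "good";
        String b = "good";
        return a == b;
    }

    // new String(...) always allocates a separate object outside the pool.
    static boolean literalVsNew() {
        String a = "good";
        String b = new String("good");
        return a == b;
    }

    // equals compares character content, not identity.
    static boolean literalEqualsNew() {
        return "good".equals(new String("good"));
    }

    public static void main(String[] args) {
        System.out.println(literalVsLiteral()); // true
        System.out.println(literalVsNew());     // false
        System.out.println(literalEqualsNew()); // true
    }
}
```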
Pshemo
  • This is a little hand-wavy without a reference to constant expressions and the language specification. It isn't based on the possibility of arrays changing. Technically speaking static analysis could rule that out, so it really is the JLS definition of Constant Expressions that matters here. – Alain O'Dea Nov 29 '15 at 21:23
  • "*static analysis could rule that out, so it really is the JLS definition of Constant Expressions that matters here*" that is very true. The Java compiler already handles concatenation of compile-time constants stored in final variables nicely, so we may hope for other compiler improvements in the future. Anyway, I updated my answer a little. I wasn't trying to create an answer based on "because JLS says so". I tried to focus on a possible problem/reason that could show why the JLS says so. – Pshemo Nov 29 '15 at 21:50
  • that makes sense. This is one area where a future language spec could expand the definition of Constant Expression. That will expose some fun bugs when it happens ;) – Alain O'Dea Nov 29 '15 at 21:55

The compiler replaces multiple value-equal Strings built from Constant Expressions like this:

String myStr1 = "good";
String myStr2 = ""+'g'+'o'+'o'+'d';
System.out.println(myStr1==myStr2);

With a unique String object obtained from String.intern. That unique String object is then assigned to both variables. This is why they are then reference equal.
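As a sketch of what interning means here (InternDemo is a made-up name): calling intern() on a String built at runtime returns the same pooled instance that the literal and the folded constant expression already share.

```java
class InternDemo {
    // A constant expression: folded at compile time and interned.
    static boolean foldedIsLiteral() {
        String folded = "" + 'g' + 'o' + 'o' + 'd';
        return folded == "good";
    }

    // Built at runtime: StringBuilder.toString() allocates a new String.
    static String buildAtRuntime() {
        return new StringBuilder("go").append("od").toString();
    }

    static boolean builtIsLiteral() {
        return buildAtRuntime() == "good";
    }

    // intern() returns the pooled instance with the same content,
    // which is the same object the literal "good" refers to.
    static boolean internedIsLiteral() {
        return buildAtRuntime().intern() == "good";
    }

    public static void main(String[] args) {
        System.out.println(foldedIsLiteral());   // true
        System.out.println(builtIsLiteral());    // false
        System.out.println(internedIsLiteral()); // true
    }
}
```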

String myStr1 = "good";
char[] myCharArr = {'g', 'o', 'o', 'd' };
String myStr2 = ""+myCharArr[0]+myCharArr[1]+myCharArr[2]+myCharArr[3];
System.out.println(myStr1==myStr2);

The compiler cannot optimize this because it has an array reference which is not a Constant Expression. This results in two separate String objects which are not reference equal. It would violate the Java Language Specification to do otherwise.

Here is the definition of a Constant Expression from the Java Language Specification:

A constant expression is an expression denoting a value of primitive type or a String that does not complete abruptly and is composed using only the following:

  • Literals of primitive type and literals of type String (§3.10.1, §3.10.2, §3.10.3, §3.10.4, §3.10.5)

  • Casts to primitive types and casts to type String (§15.16)

  • The unary operators +, -, ~, and ! (but not ++ or --) (§15.15.3, §15.15.4, §15.15.5, §15.15.6)

  • The multiplicative operators *, /, and % (§15.17)

  • The additive operators + and - (§15.18)

  • The shift operators <<, >>, and >>> (§15.19)

  • The relational operators <, <=, >, and >= (but not instanceof) (§15.20)

  • The equality operators == and != (§15.21)

  • The bitwise and logical operators &, ^, and | (§15.22)

  • The conditional-and operator && and the conditional-or operator || (§15.23, §15.24)

  • The ternary conditional operator ? : (§15.25)

  • Parenthesized expressions (§15.8.5) whose contained expression is a constant expression.

  • Simple names (§6.5.6.1) that refer to constant variables (§4.12.4).

  • Qualified names (§6.5.6.2) of the form TypeName . Identifier that refer to constant variables (§4.12.4).

Constant expressions of type String are always "interned" so as to share unique instances, using the method String.intern.

A constant expression is always treated as FP-strict (§15.4), even if it occurs in a context where a non-constant expression would not be considered to be FP-strict.

SOURCE: http://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#d5e30892
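A few of the forms listed above, in a minimal sketch (the class, its nested Holder class and the PREFIX field are hypothetical names). Each right-hand side is a constant expression, so each result is the interned "good":

```java
class ConstantExpressionDemo {
    // A constant variable: final, of type String, initialized with a constant expression.
    static class Holder {
        static final String PREFIX = "go";
    }

    // Qualified name of the form TypeName.Identifier referring to a constant variable.
    static boolean qualifiedName() {
        String s = Holder.PREFIX + "od";
        return s == "good";
    }

    // Ternary conditional whose operands are all constant expressions.
    static boolean ternary() {
        String s = (true ? "go" : "no") + "od";
        return s == "good";
    }

    // Cast to String applied to a literal.
    static boolean castToString() {
        String s = (String) "go" + "od";
        return s == "good";
    }

    public static void main(String[] args) {
        System.out.println(qualifiedName()); // true
        System.out.println(ternary());       // true
        System.out.println(castToString());  // true
    }
}
```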

Alain O'Dea
  • Note also that even if myCharArr were declared `final`, it would still not be a constant expression. – Klitos Kyriacou Nov 29 '15 at 22:57
  • @KlitosKyriacou yes, very important. **Constant Expression** as I've used it refers to a specific and constrained definition in the language specification not all semantically constant expressions :) – Alain O'Dea Nov 30 '15 at 12:29
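To illustrate the comment above, a sketch (FinalArrayDemo is a hypothetical name): marking the array final fixes the reference, not the elements, and array access is not among the forms the JLS allows, so the result is still a new object.

```java
class FinalArrayDemo {
    static String concat() {
        // final fixes the array reference, not its elements, and array
        // access is not allowed in a constant expression anyway.
        final char[] chars = {'g', 'o', 'o', 'd'};
        return "" + chars[0] + chars[1] + chars[2] + chars[3];
    }

    public static void main(String[] args) {
        String s = concat();
        System.out.println(s == "good");      // false: new object built at runtime
        System.out.println(s.equals("good")); // true: same characters
    }
}
```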

The following statement

String myStr2 = ""+myCharArr[0]+myCharArr[1]+myCharArr[2]+myCharArr[3];

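will be compiled by javac up to Java 8 to the following (since Java 9, javac instead emits an invokedynamic call to StringConcatFactory, which likewise builds a new String at runtime):

  1. StringBuilder sb = new StringBuilder();
  2. sb.append(""), then sb.append(myCharArr[0]) ... sb.append(myCharArr[3]);
  3. sb.toString(), which returns a new String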
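The steps above can be written out by hand as a sketch (StringBuilderDemo and concat are illustrative names; this mirrors the javac 8 translation rather than reproducing it exactly):

```java
class StringBuilderDemo {
    // Hand-written equivalent of ""+arr[0]+arr[1]+arr[2]+arr[3]
    // following the StringBuilder steps listed above.
    static String concat(char[] arr) {
        StringBuilder sb = new StringBuilder();
        sb.append("");        // the leading "" literal
        sb.append(arr[0]);    // append(char) for each element
        sb.append(arr[1]);
        sb.append(arr[2]);
        sb.append(arr[3]);
        return sb.toString(); // always allocates a new String
    }

    public static void main(String[] args) {
        String s = concat(new char[] {'g', 'o', 'o', 'd'});
        System.out.println(s.equals("good")); // true: same characters
        System.out.println(s == "good");      // false: different object
    }
}
```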

Disassemble the bytecode (for example with javap -c) and you will see something like this:

  28: invokespecial #3                  // Method java/lang/StringBuilder."<init>":()V
  31: ldc           #4                  // String
  33: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  36: aload_1
  37: iconst_0
  38: caload
  39: invokevirtual #6                  // Method java/lang/StringBuilder.append:(C)Ljava/lang/StringBuilder;
  42: aload_1
  43: iconst_1
  44: caload
  45: invokevirtual #6                  // Method java/lang/StringBuilder.append:(C)Ljava/lang/StringBuilder;
  48: aload_1
  49: iconst_2
  50: caload
  51: invokevirtual #6                  // Method java/lang/StringBuilder.append:(C)Ljava/lang/StringBuilder;
  54: aload_1
  55: iconst_3
  56: caload
  57: invokevirtual #6                  // Method java/lang/StringBuilder.append:(C)Ljava/lang/StringBuilder;
  60: invokevirtual #7                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;

Whereas the other statements

String myStr1 = "good";
String myStr2 = ""+'g'+'o'+'o'+'d';

declare two constant strings; here is the bytecode:

   0: ldc           #2                  // String good
   2: astore_1
   3: ldc           #2                  // String good
   5: astore_2

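The compiler declares them as constants straight away: both ldc instructions load the same constant pool entry (#2), so both variables end up referring to the same String object.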

Sleiman Jneidi

Only compile-time constant Strings are automatically interned. What counts as a constant string is described (in general, for constant expressions) in the Java Language Specification. By that definition, your char array elements are not constants, so the expression that uses them creates a new String object.
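Being final is not enough on its own: a final variable is only a constant variable if its initializer is itself a constant expression. A minimal sketch with made-up field names:

```java
class NonConstantFinalDemo {
    // Constant variable: final, type String, initializer is a literal.
    static final String CONSTANT = "go";
    // Final, but the initializer is a method call, so this is NOT a constant variable.
    static final String NOT_CONSTANT = String.valueOf("go");

    static boolean withConstant() {
        String s = CONSTANT + "od"; // folded at compile time and interned
        return s == "good";
    }

    static boolean withNonConstant() {
        String s = NOT_CONSTANT + "od"; // concatenated at runtime
        return s == "good";
    }

    public static void main(String[] args) {
        System.out.println(withConstant());    // true
        System.out.println(withNonConstant()); // false
    }
}
```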

Klitos Kyriacou