2

Check the following program. Run on the Sun Java HotSpot JVM, every comparison prints "true".

--------Updated: got the answer from Stephen and Daniel; changed the program to add the String intern method-----------

What happens if B is compiled separately rather than together with A? For example, B is compiled, packaged in a jar, and put on the class path when TestStringEqual is run.

Also, is this a Java compile-time optimization, a Java run-time optimization, or behaviour defined by the Java Language Specification?

Also, does this program give the same result on different VMs, or is it a feature of just one VM?

thanks

public class TestStringEqual {
    public static String HELLO = "hello";

    private String m_hello;

    public TestStringEqual() {
        m_hello = "hello";
    }

    public static void main(String[] args) {
        String a = "hello";
        String b = "hello";

        System.out.println("string a== string b:" + (a == b));

        System.out.println("static member ==a:" + (HELLO == a));

        System.out.println("instance field ==a:"
                + (new TestStringEqual().getHello() == a));

        System.out.println("hello in B ==a:" + (B.B_HELLO == a));

        System.out.println("interned new string object in heap==a:"
                + (new String("hello").intern() == a));
    }

    public String getHello() {
        return this.m_hello;
    }
}

class B {
    public static final String B_HELLO = "he" + "llo";
}
Ben Xu

5 Answers

4

There is really no mystery about this at all. You just need to know three basic facts about Java:

  • The '==' operator for object references tests if two object references are the same; i.e. if they point to the same object. Reference JLS 15.21.3.

  • All String literals with the same sequence of characters in a Java program will be represented by the same String object. Reference JLS 3.10.5. So (for example) "hello" == "hello" is comparing the same object.

  • Constant expressions are evaluated at compile time. Reference JLS 15.28. So (for example) "hell" + "o" is evaluated at compile time, and is therefore equivalent to the literal "hello".

These three facts are stated in the Java Language Specification. They are sufficient to explain the "puzzling" aspects of your program's behaviour, without relying on anything else.
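As a rough illustration of the three facts, here is a minimal sketch. The class and method names are made up for this example and are not part of the question's program.

```java
// Illustrative sketch only; LiteralIdentityDemo and its methods are hypothetical names.
class LiteralIdentityDemo {
    static final String GREETING = "hello";

    // Fact 2: literals with the same characters denote the same String object.
    static boolean sameLiteral() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    // Fact 3: "hell" + "o" is a constant expression, so it behaves like the literal "hello".
    static boolean constantExpression() {
        return GREETING == ("hell" + "o");
    }

    // Fact 1: == compares references, and new String(...) always creates a distinct object.
    static boolean freshObject() {
        return GREETING == new String("hello");
    }

    public static void main(String[] args) {
        System.out.println("same literal:        " + sameLiteral());        // true
        System.out.println("constant expression: " + constantExpression()); // true
        System.out.println("fresh object:        " + freshObject());        // false
    }
}
```

Only the last comparison is false, because `new String("hello")` is neither a literal nor a constant expression.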

The more detailed explanations involving the string pool, string literals being interned by the class loader, the bytecodes emitted by the compiler, etc. are just implementation details. You don't need to understand these details if you understand what the JLS is saying, and they don't really help to make the JLS clearer (IMO).


Notes:

  1. The definition of what is and what isn't a constant expression is a little involved. Some things that you might imagine to be constant valued, are in fact not. For instance, "hello".length() is not a constant expression. However, a concatenation of two string literals is a constant expression.

  2. The explanation of equality of string literals in the JLS does in fact mention interning as the mechanism by which this property of literals is implemented.
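To make the two notes a bit more concrete, the following sketch contrasts a `final` variable (which counts as a constant) with a non-final one, and shows `intern()` recovering the pooled object. The class and method names are invented for illustration.

```java
// Illustrative sketch only; ConstantExpressionDemo is a hypothetical name.
class ConstantExpressionDemo {
    // A final variable initialised with a constant expression is itself a constant,
    // so prefix + "o" is evaluated at compile time.
    static boolean finalVariable() {
        final String prefix = "hell";
        return (prefix + "o") == "hello";
    }

    // Without 'final' the concatenation happens at run time and yields a new object.
    static boolean nonFinalVariable() {
        String prefix = "hell";
        return (prefix + "o") == "hello";
    }

    // intern() returns the pooled String, which is the one the literal refers to.
    static boolean internedAtRunTime() {
        String prefix = "hell";
        return (prefix + "o").intern() == "hello";
    }

    public static void main(String[] args) {
        System.out.println("final variable:     " + finalVariable());     // true
        System.out.println("non-final variable: " + nonFinalVariable());  // false
        System.out.println("interned result:    " + internedAtRunTime()); // true
    }
}
```

The only difference between the first two methods is the `final` keyword, which is what decides whether the concatenation is a constant expression.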

Stephen C
  • Thanks, Stephen, it helps a lot – Ben Xu Jun 26 '11 at 07:06
  • About the second fact ("All String literals with the same sequence of characters in a Java program will be represented by the same String object. Reference JLS 3.10.5 So (for example) "hello" == "hello" is comparing the same object."): so it applies not only within one class, but also to string literals in different classes, even in different jar files, or even loaded by different class loaders? – Ben Xu Jun 26 '11 at 07:10
  • It applies to literals in all classes. The example in the JLS makes this clear. – Stephen C Jun 26 '11 at 07:26
  • @Ben being a `java.*` class, `String` will only be loaded by the system class loader; no other class loader can override that. So there is really only one `String` class in the JVM. Otherwise it would be really confusing. – irreputable Jun 26 '11 at 07:42
  • I already know about the boot, extension, application and self-defined class loaders. Anyway, thanks irreputable, this is another good point for understanding the whole thing. – Ben Xu Jun 26 '11 at 07:46
3

On the JVM level, the LDC (load constant) instruction is used to push a string literal onto the stack. For performance reasons, the string literal isn't stored in the code itself; it's stored in the constant pool of the class. The constant pool is a table which appears at the beginning of a class file containing string literals, numeric literals, field and method descriptors, and a few other things. LDC is followed by a byte specifying the string's index in the constant pool. (If one byte is not large enough, the compiler will use LDC_W, which is followed by a 16-bit offset. Hence the limit of 65,536 constants.)

If the same string literal occurs twice in the same class, javac is smart enough to create only one entry in the constant pool. When a class is loaded, the JVM creates actual String objects from the data in the constant pool. LDCs which contain the same offset into the constant pool will thus cause the same String to be pushed onto the stack. Instructions like IF_ACMPEQ (which checks for reference equality as == does) will then recognize the strings as identical.

See the JVMS for more info.
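As a small experiment along these lines, the sketch below puts the same literal into two classes, each of which is compiled to its own class file with its own constant pool. The class names are hypothetical. Methods are used instead of `static final` fields on purpose, since a constant field would be inlined into the class that reads it.

```java
// Illustrative sketch only; PoolA, PoolB and ConstantPoolDemo are hypothetical names.
class PoolA {
    // The literal is loaded from PoolA's own constant pool.
    static String literal() {
        return "hello";
    }
}

class PoolB {
    // The literal is loaded from PoolB's own constant pool.
    static String literal() {
        return "hello";
    }
}

class ConstantPoolDemo {
    // Both calls return the same interned String object, even though the
    // literal is stored separately in each class file.
    static boolean sameAcrossClasses() {
        return PoolA.literal() == PoolB.literal();
    }

    public static void main(String[] args) {
        System.out.println("same object across classes: " + sameAcrossClasses()); // true
    }
}
```

After compiling, running `javap -v` on each class should list that class's constant pool, including its own entry for "hello".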

Daniel Lubarov
  • This helps a lot. So is it true that at compile time, if the same string occurs several times in the same file, the generated class file already points to a single entry? And at the JVM level, when loading the class file, multiple occurrences of the same string in the class constant pool are still the same string because the runtime creates a single String object? But how does that explain "hello" == "hell"+"o"? Shouldn't the + operator be applied at run time? – Ben Xu Jun 26 '11 at 06:07
  • I believe it is not technically guaranteed that `"hello" == "hell"+"o"`, but in practice javac will just do the concatenation at compile time before generating the constant pool. – Daniel Lubarov Jun 26 '11 at 06:16
  • @Daniel - it is guaranteed by the JLS. See my answer. – Stephen C Jun 26 '11 at 06:46
  • @Daniel, this explains multiple "hello" literals in one class, but not in different classes. – Ben Xu Jun 26 '11 at 06:54
  • @Stephen, thanks, I wasn't aware. @Ben, the [section Stephen cited](http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.10.5) says string literals are just [intern](http://download.oracle.com/javase/6/docs/api/java/lang/String.html#intern%28%29)ed. – Daniel Lubarov Jun 26 '11 at 07:12
  • Thanks @Daniel & @Stephen. So "a pool of strings, initially empty, is maintained privately by the class String" means that at run time the String class maintains a pool of string literals. So at run time, any string literals that are written the same in the program will have only one runtime String in the pool. – Ben Xu Jun 26 '11 at 07:24
  • Changed the program to add: System.out.println("interned new string object in heap==a:" + (new String("hello").intern() == a)); – Ben Xu Jun 26 '11 at 07:38
1

It's an immutable string (unable to be mutated or changed), not an immune one, though I suppose you could argue that it's immune from change :-)

That means you cannot change the underlying string itself, you can only assign a different string to the variable. So:

String a = "Hello";
a = "Goodbye";

doesn't change the memory where "Hello" is stored, it changes a to point to a different memory location where "Goodbye" is stored.

This allows Java to share strings for efficiency. You can even get cases where strings like "deoxyribonucleic acid" and "acid" may share space, where the latter points to a specific location within the former. Again, this is made possible by the immutable nature of such strings.

In any case, == will check to see if the strings refer to the same underlying object, not something that's often useful. If you want to see if the strings are equal, you should be using String.equals() or one of its variations.
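A minimal sketch of the difference, using a string assembled at run time. The class and method names are invented for this example.

```java
// Illustrative sketch only; EqualsDemo is a hypothetical name.
class EqualsDemo {
    // Builds "hello" at run time, so the result is a new object rather than the pooled literal.
    static String build() {
        return new StringBuilder("hel").append("lo").toString();
    }

    // Reference comparison: different objects, so this is false.
    static boolean identical() {
        return build() == "hello";
    }

    // Content comparison: same characters, so this is true.
    static boolean equalContent() {
        return "hello".equals(build());
    }

    // One of the variations: content comparison that ignores case.
    static boolean equalIgnoringCase() {
        return "HELLO".equalsIgnoreCase(build());
    }

    public static void main(String[] args) {
        System.out.println("==               : " + identical());         // false
        System.out.println("equals           : " + equalContent());      // true
        System.out.println("equalsIgnoreCase : " + equalIgnoringCase()); // true
    }
}
```

Calling `equals` on the literal rather than on the built value also avoids a NullPointerException if the other side is ever null.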

paxdiablo
0

It is fairly simple: the compiler will generate a (bytecode) constant for the string "hello" the first time it encounters it. In normal assembler it would be in the .TEXT section.

The subsequent "hello" strings will then point to that same constant, since there is no need to allocate new space or create a new constant. This is safe because strings are immutable: if a variable is assigned a new value, new memory is needed for that value anyway.

It will probably not work on input, i.e. if you let a user input "hello" and ==-compare that to the compile-time hello strings you'll likely get false.
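To check this without typing anything, the sketch below simulates input by reading a line from a `StringReader`; that stand-in, along with the class and method names, is an assumption made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Illustrative sketch only; a StringReader stands in for real user input.
class InputComparisonDemo {
    // Reads one line the way a console program might, but from a fixed source.
    static String readSimulatedInput() {
        try (BufferedReader reader = new BufferedReader(new StringReader("hello\n"))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The line is built from a character buffer at run time, so it is a new object.
    static boolean identicalToLiteral() {
        return readSimulatedInput() == "hello";
    }

    // Content comparison still succeeds.
    static boolean equalToLiteral() {
        return "hello".equals(readSimulatedInput());
    }

    // Interning the input maps it to the pooled object that the literal refers to.
    static boolean internedIsIdentical() {
        return readSimulatedInput().intern() == "hello";
    }

    public static void main(String[] args) {
        System.out.println("input == literal:          " + identicalToLiteral());  // false
        System.out.println("input.equals(literal):     " + equalToLiteral());      // true
        System.out.println("input.intern() == literal: " + internedIsIdentical()); // true
    }
}
```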

dtech
  • In addition to user input, a simple `a = "hello".toUpperCase().toLowerCase()` will show that the identity equality works only because of the string pool and how classes are loaded. – Dilum Ranatunga Jun 26 '11 at 05:48
  • To add to what @dtech said, look up Constant Pool for Java bytecode: http://en.wikipedia.org/wiki/Class_(file_format)#The_constant_pool – Dilum Ranatunga Jun 26 '11 at 05:50
  • So it is a compile-time optimization rather than a JVM run-time one? One question: if the two classes are compiled at different times, as with TestStringEqual and B in the example, will the strings then not be equal? I will try it in Eclipse. – Ben Xu Jun 26 '11 at 05:57
  • @Ben Xu yes I believe so. The compiler will create a different literal for both classes so when ==-comparing them they are different. – dtech Jun 26 '11 at 06:10
  • @dtech - your last comment is incorrect. Refer to my answer, specifically the reference on String literals. The JLS says that string literals are intern'ed, and hence that they are `==` irrespective of which class they come from. – Stephen C Jun 26 '11 at 13:30
0

As far as a==b goes, it seems the compiler is taking a shortcut and sharing the same string object. When I declare my variables as follows, a==b is false.

String a = "hello";
String b = "hell";
String temp = "o";
if (new java.util.Random().nextDouble() < 0.5) b += temp;
else b += "o";

If I do String b = "hell"+"o"; I still get a==b as true.

BudgieInWA
  • That is because the compiler is smart enough to evaluate expressions that are constant at compile-time, such as 2*8 or "hell"+"o" – dtech Jun 26 '11 at 06:13
  • @dtech, exactly my point. At least for `javac`, the compiler is responsible for most of the optimisations evident in this code. – BudgieInWA Jun 26 '11 at 06:17
  • It even evaluates things like `3 << 4 ^ (int) ((float) 25 / (double) 10)` at compile time. You can use `javap -c MyClass` to see exactly what code it's generating. – Daniel Lubarov Jun 26 '11 at 06:26
  • 1
    @BudgieInWA - this is technically not an optimization. It is an implementation of what the JLS says **must** happen. The compiler writer has no choice ... if he / she wants to claim Java compliance. – Stephen C Jun 26 '11 at 06:50
  • Hi Stephen, can you give a link to the "Java Language Specification" on this point? – Ben Xu Jun 26 '11 at 06:57
  • @Ben Xu - my answer links to the relevant sections. – Stephen C Jun 26 '11 at 13:25