The following code confuses me. Could anyone explain why the two tests behave differently? Why does the String comparison in the first test print false while the comparison in the second test prints true?

import org.junit.Test; // assuming JUnit 4; the original snippet does not show its imports

public class Student {

    /**
     * Why isn't the string "java" added to the string pool by the intern() method?
     */
    @Test
    public void test1() {
        String str1 = new String("ja") + new String("va");
        str1.intern();
        String str2 = "java";
        // Result:false
        System.out.println("Result:" + (str1 == str2));
    }

    /**
     * Any other string is added to the string pool as expected after intern() is invoked.
     */
    @Test
    public void test2() {
        String str1 = new String("ja1") + new String("va");
        str1.intern();
        String str2 = "ja1va";
        // Result:true
        System.out.println("Result:" + (str1 == str2));
    }
}
rellocs wood

2 Answers

You're basically checking whether a string was already in the string pool. The string "java" isn't added to the pool by calling intern in your first piece of code because it's already in the string pool. In each method, your code:

  • Creates a new string
  • Calls intern on the newly created string (but ignores the result; almost always a bad idea, and you can detect the existence of a previous value in the string pool easily by using the return value)
  • Compares the new string with a string literal, which will always use the result that's now in the string pool

Now the call to intern will add the target string to the pool if it doesn't already exist, so your comparison will return true if and only if the new string value was not previously in the string pool. This is equivalent to testing whether intern returns the same reference as the target of the call.

For any given string reference, there are three possibilities:

  • That exact reference is present in the string pool already. (That can't be the case in your code, because you're creating a new string.)
  • A reference to an equal string is present in the string pool. In that case, intern() will return the existing reference.
  • No equal string is present in the string pool. In that case, the target of the call will be added to the string pool, and the same reference returned.
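As a rough sketch of those three cases, the snippet below checks whether intern() hands back the very reference it was called on. The class name `InternCases`, the helper `internReturnsSelf`, and the sample strings are made up for illustration. The third case assumes that the randomly generated string is not already in the pool and that the JVM follows the documented behaviour of adding the target object itself.

```java
import java.util.UUID;

// Hypothetical demo class; none of these names come from the question.
class InternCases {

    // Returns true when intern() returns the exact reference it was called on.
    static boolean internReturnsSelf(String s) {
        return s.intern() == s;
    }

    public static void main(String[] args) {
        // Case 1: a literal is already the pooled reference.
        String literal = "hello";
        // Case 2: an equal string is pooled, but this is a distinct object.
        String copy = new String("hello");
        // Case 3: built at run time; assumed not to be in the pool yet.
        String fresh = "id-" + UUID.randomUUID();

        System.out.println("literal: " + internReturnsSelf(literal));
        System.out.println("copy:    " + internReturnsSelf(copy));
        System.out.println("fresh:   " + internReturnsSelf(fresh));
    }
}
```

The first two results follow from the language rules for literals. The third is the case your code is probing, and it is the one that depends on what else has already been interned.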

What you're seeing is the result of other code putting things in the string pool - quite possibly as part of loading classes. Here's an example to demonstrate that:

public class Test {
    public static void main(String... args) {
        checkInterned("ja", "va");
        checkInterned("ja", "va.lang");
        checkInterned("ja", "va.other");
        checkInterned("Int", "eger");
        checkInterned("abc", "def");
        checkInterned("Te", "st");
        checkInterned("Te", "st2");
        checkInterned("check", "Interned");
        checkInterned("check", "Interned2");
    }

    public static void checkInterned(String start, String end) {
        String x = start + end;
        String y = x.intern();
        System.out.println(x + " was interned already? " + (x != y));
    }
}

Output:

java was interned already? true
java.lang was interned already? true
java.other was interned already? false
Integer was interned already? true
abcdef was interned already? false
Test was interned already? true
Test2 was interned already? false
checkInterned was interned already? true
checkInterned2 was interned already? false

So the interned values are:

java
java.lang
Integer
Test
checkInterned

They're all names that would naturally come up when loading classes (including the one being run).

I suspect that "java" is only a special case here in that there may well be lots of code within the JRE that checks whether a string starts with "java" as a reserved name.

This doesn't indicate anything about "java" being a keyword though - it's just "a string that's already in the string pool". You don't need to treat it any differently.

Jon Skeet
  • Great answer, good explanation and assumptions are clearly stated as such. – Robert Aug 01 '18 at 06:22
  • It should be noted that this is hitting implementation-specific behavior. If the string is not already in the pool, `intern()` may add the object you’re invoking the method on and return it, but it may also create a new string to be added to the pool and return that new string. Older Java implementations did this. – Holger Aug 01 '18 at 12:19
  • @Holger: That's interesting - the [docs](https://docs.oracle.com/javase/10/docs/api/java/lang/String.html#intern()) say: "When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned." That suggests it should *not* create a new string. The part about when literals are added to the intern pool *is* implementation-specific though. – Jon Skeet Aug 01 '18 at 12:54
  • Well, for the caller, it makes no difference. If a new string is returned, it still is a string existing in the pool (prior to returning from the `intern()` call). Afaik, it changed with Java 7, update 6. Prior to this version, `String` had the fields `offset` and `length` and could be a substring of a much larger string, so creating a new string prevented referencing too large arrays from the pool. But this copying was unconditional, as up to and including Java 6, the copy was made to the PermGen. In Java 7, it was made in the ordinary heap and with update 6, no copy was made anymore. – Holger Aug 01 '18 at 13:05
  • @Holger: While I'd agree that it would be odd application code to rely on this, if a current implementation returned a new string having added it to the pool, I think that would be violating the contract. I note that the Java 6 docs have the same wording. (It would have been reasonable for this to have changed in the docs between Java 7 and Java 8, for example.) It feels odd that the docs explicitly state something they don't need to, when it was inaccurate... – Jon Skeet Aug 01 '18 at 13:12

The first thing to realize is that str1.intern() doesn't change the str1 reference. It returns the interned reference. So if you wanted str1 to now be that reference, you'd have to do:

str1 = str1.intern();
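Here is a minimal sketch of that fix applied to both strings from the question. The class name `InternFix` and the helper `internedConcat` are hypothetical. Because the result of intern() is kept, the comparison with the literal no longer depends on whether the value was pooled beforehand.

```java
// Hypothetical demo class; the names are not from the question.
class InternFix {

    // Builds the string at run time, then keeps the canonical reference returned by intern().
    static String internedConcat(String start, String end) {
        String s = new String(start) + new String(end);
        return s.intern();
    }

    public static void main(String[] args) {
        String str1 = internedConcat("ja", "va");
        String str2 = internedConcat("ja1", "va");
        // Both comparisons use the pooled reference on each side.
        System.out.println("Result:" + (str1 == "java"));
        System.out.println("Result:" + (str2 == "ja1va"));
    }
}
```

Both lines print true here, since a string literal always refers to the pooled instance and intern() returns that same instance.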

So, why the difference? In a nutshell, the JVM already has the string "java" in its string pool because of various internals.

In the first example, str1 starts off as a newly instantiated String (as I think you understand). You then call str1.intern(), which returns the interned reference of a pre-existing String "java", but you don't do anything with that reference. When you then compare str1 == "java", you're comparing the reference to the newly instantiated object with the reference to the interned object, and get false.

In the second example, "ja1va" does not exist in the string pool to start off. When you call str1.intern(), that method puts "ja1va" into the pool, with its current reference (that is, str1) as the canonical reference. When you subsequently refer to the "ja1va" literal string, the JVM looks to see whether it's already in the pool, sees that it is, and uses it. Thus, you get true.

In other words, in the first case, you're creating a new String object and then not actually grabbing its interned equivalent. In the second case, you're creating a new String object, defining that as the interned reference, and then reloading it via a string literal.

yshavit