
I have been trying to understand the concept of the String Constant Pool and interning for the last few days. After reading a lot of articles I understood some portions of it, but I am still confused about a few things:

1. `String a = "abc"` creates an object in the String Constant Pool, but does the following line of code create the object "xyz" in the String Constant Pool? `String b = ("xyz").toLowerCase()`

2.

    String c = "qwe";
    String d = c.substring(1);
    d.intern();
    String e = "we";

Should the literal "we" be added to the String Constant Pool during class loading? If so, why does `d==e` result in `true` even though `d` is not pointing to the String Constant Pool?

JNK
Arijit Dasgupta
  • You don't need parentheses around `("xyz")` – JonK Oct 29 '15 at 14:20
  • Anywhere in the code you use "" to define a String, you add it to the String Constant Pool, same as if you used `.intern()` on the String, also, the String class is smart enough to realize that "we" is already in the pool, so it checks the pool for "we" when `d.intern()` is called, and sets d to the already existing "we" value – phflack Oct 29 '15 at 14:21
  • `d==e` will not return `true` unless you set `d = d.intern()` – TheLostMind Oct 29 '15 at 14:22
  • @VinodMadyalkar `d == e` is `true` because they're both interned values and point to the same String object, I just tested in a test file to make sure – phflack Oct 29 '15 at 14:24
  • @phflack - Nope. `Strings` are immutable. `d.intern()` will not make `d` point to the *interned String* – TheLostMind Oct 29 '15 at 14:25
  • @VinodMadyalkar I think what is happening is the "we" String from `d = c.substring(1)` is automatically checking the pool and using the interned "we" – phflack Oct 29 '15 at 14:26
  • @VinodMadyalkar `d==e` results in true!! – Arijit Dasgupta Oct 29 '15 at 14:27
  • As for `"xyz".toLowerCase()` - it does add `"xyz"` to the pool already based on the `"xyz"` literal. `toLowerCase()` actually doesn't do anything interesting here as it checks that the `"xyz"` is already lowercase so it just returns it. – Jiri Tousek Oct 29 '15 at 14:27
  • @VinodMadyalkar Tested on both Mac version of Java and on compilejava.net (running on Debian I believe), and they both return true – phflack Oct 29 '15 at 14:28
  • @phflack - Ya. I am checking "why" :) – TheLostMind Oct 29 '15 at 14:29
  • @phflack does "xyz" get added during class loading or it is done at runtime after .intern() step? – Arijit Dasgupta Oct 29 '15 at 14:31
  • @ArijitDasgupta - "xyz" gets initialized when the class gets loaded – TheLostMind Oct 29 '15 at 14:33
  • "xyz" should be added as the class is loaded, using "xyz".intern() would first add it to the String pool as the class is loaded, then after it would attempt to add it again, but see there's already "xyz" there and use that – phflack Oct 29 '15 at 14:33
  • Also note that if we write `d.intern()` after `String e = "we"`, then `d==e` results in false – Arijit Dasgupta Oct 29 '15 at 14:39
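Jiri Tousek's point about `toLowerCase()` can be checked with a small sketch. `LowerCaseDemo` is a hypothetical name. The first check relies on OpenJDK's `toLowerCase()` returning the receiver itself when no character needs to change; that is an implementation detail, not something the `String` javadoc promises. The other two checks only rely on the fact that a string built at run time is not interned until `intern()` is called.

```java
class LowerCaseDemo {
    // The literal "xyz" is interned when this code first runs (JLS 3.10.5).
    static boolean alreadyLowerReturnsSame() {
        String b = "xyz".toLowerCase();
        // OpenJDK returns the receiver unchanged when nothing needs lowering;
        // this is an implementation detail, not a spec guarantee.
        return b == "xyz";
    }

    static boolean convertedIsSameAsLiteral() {
        String b = "XYZ".toLowerCase(); // a new String built at run time
        return b == "xyz";              // false: the result is not interned
    }

    static boolean convertedAfterIntern() {
        // intern() hands back the pooled instance created by the literal
        return "XYZ".toLowerCase().intern() == "xyz";
    }

    public static void main(String[] args) {
        System.out.println(alreadyLowerReturnsSame());  // true on OpenJDK
        System.out.println(convertedIsSameAsLiteral()); // false
        System.out.println(convertedAfterIntern());     // true
    }
}
```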

1 Answer


The string pool is populated lazily. If you call intern() yourself before the string literal is first used, then your instance is the version of the string that goes into the string pool. If you do not call intern() yourself, then the string literal will populate the string pool for you.

The surprising part is that we can influence the string pool before the class's constant pool entry for a literal has been resolved, as the code snippets below demonstrate.
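The lazy population described above can be modelled with an ordinary map. This is only a toy model of the contract documented for `String.intern()` (return the pooled instance if an equal one exists, otherwise pool and return this one); `ToyStringPool`, `intern` and `resolveLiteral` are hypothetical names, and HotSpot's real string table is native code that looks nothing like this.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the runtime string pool (hypothetical, for illustration only).
class ToyStringPool {
    private final Map<String, String> cache = new HashMap<>();

    // Mirrors the documented String.intern() contract: return the cached
    // instance if an equal string is present, otherwise cache s itself.
    String intern(String s) {
        String existing = cache.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    // Models lazy literal resolution: a fresh instance is built from the
    // constant pool text, but intern() decides which instance is returned.
    String resolveLiteral(String constantPoolText) {
        return intern(new String(constantPoolText));
    }

    public static void main(String[] args) {
        // Snippet 1: intern() runs before the literal is resolved.
        ToyStringPool first = new ToyStringPool();
        String d = "qwe".substring(1);
        first.intern(d);
        System.out.println(first.resolveLiteral("we") == d); // true

        // Snippet 2: the literal is resolved before intern() runs.
        ToyStringPool second = new ToyStringPool();
        String e = second.resolveLiteral("we");
        String d2 = "qwe".substring(1);
        System.out.println(e == d2);                // false
        System.out.println(second.intern(d2) == e); // true
    }
}
```

Whichever instance reaches `intern` first wins; later callers, including the literal, just receive that instance.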


To understand why the two code snippets have different behaviour, it is important to be clear that

  1. the constant pool is not the same as the string pool. The constant pool is a section of the class file stored on disk, whereas the string pool is a runtime cache populated with strings.

  2. referencing a string literal does not reference the constant pool directly. Instead, as per the Java Language Specification (JLS §3.10.5), a string literal is interned: it populates the string pool from the constant pool if and only if an equal value is not already in the string pool.
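A quick way to see the difference between "one constant pool per class file" and "one string pool per JVM": the two nested classes below compile to separate class files, each with its own constant pool entry for the same text, yet both fields end up pointing at one pooled instance, which JLS §3.10.5 guarantees. `SharedLiteralDemo` and the literal are hypothetical; the fields are deliberately non-final so that each class loads the literal itself rather than having it inlined.

```java
class SharedLiteralDemo {
    // Compiled to SharedLiteralDemo$First.class and SharedLiteralDemo$Second.class,
    // so each has its own constant pool entry for "shared-literal".
    static class First  { static String value = "shared-literal"; }
    static class Second { static String value = "shared-literal"; }

    static boolean sameInstance() {
        return First.value == Second.value; // both resolve to the single pooled String
    }

    public static void main(String[] args) {
        System.out.println(sameInstance()); // true
    }
}
```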

That is to say, the life cycle of a string literal from source file to runtime is as follows:

  1. it is placed into the constant pool by the compiler at compile time and stored within the generated class file (there is one constant pool per class file)
  2. the constant pools are loaded by the JVM at class load time
  3. the strings created from the constant pool are added to the string pool at runtime as intern is called; if an equivalent string is already there, the one in the string pool is used instead (JVM Spec §5.1, The Run-Time Constant Pool)
  4. intern can happen explicitly, by calling intern() manually, or implicitly, by referencing a string literal such as "abc" (JLS §3.10.5)
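Steps 3 and 4 can be sketched by contrasting a literal, which is interned implicitly, with a string created by `new`, which is only pooled if `intern()` is called on it. `ExplicitInternDemo` is a hypothetical name; the literal is resolved first here, so the result does not depend on call order.

```java
import java.util.Arrays;

class ExplicitInternDemo {
    // Returns {copy == literal, copy.equals(literal), copy.intern() == literal}.
    static boolean[] compare() {
        String literal = "abc";            // implicit intern via the literal
        String copy = new String(literal); // `new` always creates a distinct object
        return new boolean[] {
            copy == literal,               // false: copy is not the pooled instance
            copy.equals(literal),          // true: same characters
            copy.intern() == literal       // true: explicit intern returns the pooled instance
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compare())); // [false, true, true]
    }
}
```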

The difference in behaviour between the following two code snippets is caused by whether intern() is called explicitly before or after the implicit intern triggered by the string literal.

For clarity, here is a run through of the two behaviours that were discussed in the comments to this answer:

    String c = "qwe";   // string literal qwe goes into runtime cache
    String d = c.substring(1); // runtime string "we" is created
    d.intern();         // intern "we"; it has not been seen 
                        // yet so this version goes into the cache
    String e = "we";    // now we see the string literal, but
                        // a value is already in the cache and so 
                        // the same instance as d is returned 
                        // (see ref below)

    System.out.println( e == d );  // returns true

And here is what happens when we intern after the string literal is used:

    String c = "qwe";   // string literal qwe goes into runtime cache
    String d = c.substring(1); // runtime string "we" is created
    String e = "we";    // now we see the string literal, this time
                        // a value is NOT already in the cache and so 
                        // the string literal creates an object and
                        // places it into the cache
    d.intern();         // has no effect - a value already exists
                        // in the cache, and so it will return e

    System.out.println( e == d );  // returns false
    System.out.println( e == d.intern() );  // returns true
    System.out.println( e == d );  // still returns false
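Both snippets can be combined into one runnable sketch. `InternOrderDemo` and its demo strings are hypothetical: unusual strings are used so that the two scenarios cannot see each other's pool entries (or strings the JDK may already have interned), and the results are computed once in the static initializer, because re-running either scenario would find its string already in the pool. This assumes HotSpot on JDK 7 or later, which resolves string literals lazily on first execution; the JVM spec also permits eager resolution, and JDK 6 copied interned strings into PermGen, so scenario 1 is expected to print `false` on such JVMs.

```java
class InternOrderDemo {
    static final boolean INTERN_FIRST_SAME;
    static final boolean LITERAL_FIRST_SAME;
    static final boolean LITERAL_FIRST_INTERN_MATCHES;

    static {
        // Scenario 1: intern() runs before the literal is first resolved.
        String c1 = "Xintern-first-demo";
        String d1 = c1.substring(1);     // new runtime String, not pooled
        d1.intern();                     // nothing equal pooled yet: d1 itself is added
        String e1 = "intern-first-demo"; // literal resolves to the pooled d1
        INTERN_FIRST_SAME = (e1 == d1);

        // Scenario 2: the literal is resolved before intern() runs.
        String c2 = "Xliteral-first-demo";
        String d2 = c2.substring(1);
        String e2 = "literal-first-demo"; // literal creates and pools its own instance
        String canonical = d2.intern();   // returns e2; d2 itself is unchanged
        LITERAL_FIRST_SAME = (e2 == d2);
        LITERAL_FIRST_INTERN_MATCHES = (e2 == canonical);
    }

    public static void main(String[] args) {
        System.out.println(INTERN_FIRST_SAME);            // true on HotSpot, JDK 7+
        System.out.println(LITERAL_FIRST_SAME);           // false
        System.out.println(LITERAL_FIRST_INTERN_MATCHES); // true
    }
}
```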

Below is the key part of the JLS (§3.10.5), stating that intern is implicitly called for string literals:

Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.
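The "values of constant expressions" part of that quote is easy to check: a concatenation the compiler can fold is interned like a literal, while the same concatenation evaluated at run time produces a new `String` (JLS §15.18.1). `ConstantExpressionDemo` and `PREFIX` are hypothetical names.

```java
class ConstantExpressionDemo {
    // A constant variable: final, of type String, initialised with a constant expression.
    static final String PREFIX = "hel";

    static boolean compileTimeConcat() {
        return (PREFIX + "lo") == "hello"; // constant expression: folded and interned
    }

    static boolean runtimeConcat() {
        String prefix = "hel";             // not final, so not a constant variable
        return (prefix + "lo") == "hello"; // evaluated at run time: a new String
    }

    public static void main(String[] args) {
        System.out.println(compileTimeConcat()); // true
        System.out.println(runtimeConcat());     // false
    }
}
```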

And the JVM spec (§5.1) covers the runtime representation of the constant pool loaded from the class file and how it interacts with intern:

If the method String.intern has previously been called on an instance of class String containing a sequence of Unicode code points identical to that given by the CONSTANT_String_info structure, then the result of string literal derivation is a reference to that same instance of class String.

Chris K
  • This sounds like a plausible explanation! But I too would like to see a document or source code fragment where one can see that this is what happens. – Lii Oct 29 '15 at 14:43
  • Probably. There is no magic happening in the byte-code either :) – TheLostMind Oct 29 '15 at 14:44
  • But `d.intern()` does not affect the reference d, and d stores the address of the object in heap memory. How does d point to the String constant pool when I write `d.intern()`? – Arijit Dasgupta Oct 29 '15 at 14:45
  • Reference added. The spec describes the structure of the constant pool and states that if intern has already been called, then the reference into the class file will return the same value when resolved. – Chris K Oct 29 '15 at 14:47
  • As a related aside Java 8 has added string de-duplication that has to be enabled, but once turned on can on the fly intern strings concurrently in the background. https://blog.codecentric.de/en/2014/08/string-deduplication-new-feature-java-8-update-20-2/ – Chris K Oct 29 '15 at 14:50
  • @ChrisK - I don't get it. `d.substring()` returns a string on the heap. we are calling `intern()` but not *re-assigning* the reference back to `d`. The reference returned by call to `intern()` is popped from the operand stack of the stack frame. So we have 2 values of "we" one on heap and another on constants pool. Then we have "we". `d= d.intern()== "we"` returning true is OK. but the other case isn't – TheLostMind Oct 29 '15 at 14:51
  • @VinodMadyalkar perhaps http://blog.jamesdbloom.com/JVMInternals.html#string_table will help you.. intern() does a little bit more than that. Specifically, if the string is not already in the interned cache then it will be placed into it. – Chris K Oct 29 '15 at 14:52
  • @VinodMadyalkar Quote from jamesdbloom's blog: When String.intern() is called, if the symbol table already contains the string then a reference to this is returned, if not the string is added to the string table and its reference is returned. – Chris K Oct 29 '15 at 14:55
  • @ChrisK - With all due respect- No the link doesn't answer it. I know the same interned String will be returned when we use `""` for the same String. But my question is - *the returned interned string is not used. So, d should still point to the string on heap and not the one in constants pool.*. `identityhashCode` for d remains the same both before interning and after interning. This is some JVM magic / optimization. – TheLostMind Oct 29 '15 at 14:58
  • @VinodMadyalkar I think I see what you are saying. And that would be true if the JVM was pre loading the constant pool with the string literal exactly when the class was loaded. The link in the answer above shows that if the string is already in the cache before the string literal is resolved, then the version already there will be used. Thus we are seeing a side effect of intern() being called before the string literal is loaded. – Chris K Oct 29 '15 at 15:00
  • @ChrisK then why does `d.intern()` after `String e = "we"` result in false? – Arijit Dasgupta Oct 29 '15 at 15:05
  • @ArijitDasgupta because in that case the string literal was loaded before the interning, thus it is that version that gets used. Interning does not go back and change the result of calling .substring(1) (aka variable d). It merely decides whether it will cache that value or not, which depends purely on whether it has already seen that value yet or not. – Chris K Oct 29 '15 at 15:08
  • @ArijitDasgupta I have updated the answer to include more detail, does that help to clarify? – Chris K Oct 29 '15 at 15:14
  • @ChrisK - Good explanation.. :) – TheLostMind Oct 29 '15 at 15:18
  • @ChrisK - Excellent explanation, but: 1. Why doesn't `String e = "we"` get added to the String Constant Pool while the class is loading? 2. In the first section of your explanation, `d.intern()` adds "we" to the String Constant Pool, so how does it change the address value stored in d? 3. How does `e == d.intern()` in the last section yield true? Really sorry if I am repeating the same doubts – Arijit Dasgupta Oct 29 '15 at 15:28
  • 1) it could; it really depends on the implementation of the JVM. At one stage Sun put a lot of effort into trying to make Java appear snappy to GUI users, that is, they made as much of the class loading as lazy as they could to speed up the JVM startup time. I also believe that there is an argument that if the string literal never gets used, why consume memory. – Chris K Oct 29 '15 at 15:30
  • 2) it does not change the Java address; because the string is not already in the cache, `this` is returned (aka the reference already in d) – Chris K Oct 29 '15 at 15:32
  • 2) So you mean to say the String Constant Pool saves the reference address of d? – Arijit Dasgupta Oct 29 '15 at 15:33
  • @ArijitDasgupta essentially yes. There may be some munging under the hood, as depending on the version of Java the intern pool has been stored in different places. So the physical address may differ, but the Java level address will remain the same. – Chris K Oct 29 '15 at 15:34
  • @ChrisK You saved my day, You saved my week!! Thanks for the detailed answer!! – Arijit Dasgupta Oct 29 '15 at 15:35
  • @ArijitDasgupta I did not want to muddy the water until the answer to your question was clear; but now that it is, for completeness I must emphasize that .intern() has always been very slow and is not generally encouraged. The new mechanism in Java 8 is the way that the language is going, so if you really need string de-duplication (and most people do) then I strongly recommend that you read into letting the JVM do it for you. – Chris K Oct 29 '15 at 15:37
  • That link again for turning on runtime (background) string de-duplication as part of the GC cycle: https://blog.codecentric.de/en/2014/08/string-deduplication-new-feature-java-8-update-20-2/ – Chris K Oct 29 '15 at 15:39
  • @ArijitDasgupta I have reworded the answer to make it clearer that there is a difference between the Constant Pool and the String Pool. On reflection I think that the cause of our confusion has been that as Java developers we often end up thinking of them as being the same thing and assume that they both occur fully as part of the class loading. – Chris K Oct 29 '15 at 16:32
  • @khelwood details and references added; let me know if anything is not clear or backed up properly – Chris K Oct 29 '15 at 17:21
  • @lii details and references added; let me know if anything is not clear or backed up properly – Chris K Oct 29 '15 at 17:21
  • @ChrisK: This is beautiful knowledge! Thank you! – Lii Oct 30 '15 at 07:00
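The point settled in the comments (intern() never re-points `d`; when nothing equal is pooled yet, the pool simply stores the reference that `d` already holds and returns it) can be sketched as below. `InternIdentityDemo` is a hypothetical name, and building the string from `System.nanoTime()` is an assumption used to make it very unlikely that an equal string is already pooled. Returning `this` matches the `String.intern()` javadoc and HotSpot from JDK 7 on; JDK 6 copied the string into PermGen, so the first check is expected to fail there.

```java
import java.util.Arrays;

class InternIdentityDemo {
    static boolean[] check() {
        // Runtime-built value; the timestamp makes an existing pool entry very unlikely.
        String d = new StringBuilder("runtime-").append(System.nanoTime()).toString();
        int hashBefore = System.identityHashCode(d);

        String first = d.intern();     // nothing equal pooled yet: d itself is stored and returned
        String copy = new String(d);   // equal characters, different object
        String second = copy.intern(); // already pooled: returns d, not copy

        return new boolean[] {
            first == d,                               // intern() returned `this`
            System.identityHashCode(d) == hashBefore, // d still refers to the same heap object
            second == d,                              // the pool holds the reference that is in d
            second != copy                            // copy itself was not re-pointed by intern()
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(check())); // [true, true, true, true] on JDK 7+
    }
}
```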