
I am really confused with how string interning works in Java. When I write:

String a = "ABC";
String b = "ABC";

if (a==b)
    System.out.println("Equal");

Does the compiler store the string literal "ABC" into the string constant pool at compile time?

That sounds illogical to me, because I thought the string constant pool was created by the JVM at runtime. I don't see how it could be populated at compile time, since the Java compiler does not even invoke the JVM.

If it is done at runtime rather than at compile time, then why does the following return false (taken from this answer)?

// But .substring() is invoked at runtime, generating distinct objects
"test" == "!test".substring(1) // --> false

If it is done at runtime then why can't the JVM figure out that they are the same string?

I am really confused as to how string interning works in Java and where exactly the Java string pool is stored.

Kramer786
  • The following link is insightful: http://stackoverflow.com/questions/513832/how-do-i-compare-strings-in-java – Zyn Apr 26 '15 at 14:59
  • I'm not sure about the first part of your question, but `substring` returns a *new* object, and so while the content of both strings match to `test`, since they are not the same object `==` returns false. – Ori Lentz Apr 26 '15 at 14:59
  • It's the compiler. I don't understand why you say it cannot put the strings in the string constant pool. The compiler writes the constant strings into the class file; when the program is launched, the JVM loads that data and creates the constant pool used at runtime. The contents of the pool are set up by the compiler. – Bakuriu Apr 26 '15 at 14:59
  • How is this not a duplicate more than 6 years after Stack Overflow was launched? – Peter Mortensen Apr 26 '15 at 15:52

2 Answers


The compiler puts the literal strings in the class file, storing each distinct literal only once (it consolidates all equivalent literals); the JVM loads those strings into the string pool when the class file is loaded.

If it is done at runtime then why can't the JVM figure out that they are the same String.

Because the string being returned by .substring has not been interned, and so is a different object than the equivalent "test" string in the string pool. If you interned it, you'd get true:

"test" == "!test".substring(1).intern() // true
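The comparisons above can be collected into one small program. This is only a sketch for experimenting with interning; the class name `InternDemo` and its method names are made up for illustration and are not part of any API.

```java
// Hypothetical demo class; the names are illustrative only.
class InternDemo {

    // Two identical literals in the same class resolve to one pooled object.
    static boolean literalsShareObject() {
        String a = "ABC";
        String b = "ABC";
        return a == b;
    }

    // substring(1) builds a new String at runtime, so the references differ.
    static boolean substringIsSameObject() {
        return "test" == "!test".substring(1);
    }

    // intern() returns the pooled String whose content equals the receiver.
    static boolean internedSubstringIsSameObject() {
        return "test" == "!test".substring(1).intern();
    }

    // equals() compares content, which is what application code normally wants.
    static boolean substringEqualsByContent() {
        return "test".equals("!test".substring(1));
    }

    public static void main(String[] args) {
        System.out.println(literalsShareObject());           // true
        System.out.println(substringIsSameObject());         // false
        System.out.println(internedSubstringIsSameObject()); // true
        System.out.println(substringEqualsByContent());      // true
    }
}
```

Only the second comparison is false: the content is the same, but the object returned by `substring` is not the pooled one.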

§4.4 and §5.3 of the JVM specification look relevant.


Just to be clear: The correct way to compare strings in Java is to use the .equals method or similar, not ==. Using == with string instances is usually incorrect. (Unless you're playing with understanding when and how things are interned...)

T.J. Crowder
  • The documentation for substring @ http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#substring(int) specifically states it returns a *new* string object. – Patrick White Apr 26 '15 at 15:01
  • Just adding some detail. Nice answer. – Patrick White Apr 26 '15 at 15:02
  • @T.J.Crowder so `intern()` avoids new Object right ? – Ankur Anand Apr 26 '15 at 15:02
  • @AnkurAnand: Not unless the JVM's JIT is being clever, but it avoids *using* it for long (if you use `intern`'s return value). `.substring` will return a new object, and then `intern` will either add that object to the string pool (if an equivalent string isn't already there) and return the same reference, or will locate the existing string in the pool that's equivalent and return that reference instead of the one you called it on. [More in the documentation.](http://docs.oracle.com/javase/8/docs/api/java/lang/String.html#intern--) – T.J. Crowder Apr 26 '15 at 15:04
  • Can anyone think why `substring` doesn't call `intern` itself? I can read from the documentation that it returns a new `String` but I'm not sure of the motivation. – Robert Bain Apr 26 '15 at 15:06
  • @RobertBain: `intern` isn't free; it comes with a cost. The caller may not want A) the performance overhead, and/or B) the returned string to be in the pool. I certainly don't want to look things up in a table every time I create a substring, nor have every single transient substring I create put in the pool. – T.J. Crowder Apr 26 '15 at 15:09
  • @T.J.Crowder Just to go a little more in depth: what's the name of the section in the .class file where the string literals are stored? If you can link to Oracle documentation, that would be great. **And are all string literals interned?** Isn't there a memory constraint on the string pool? **If it's full, are some string literals stored in the heap instead?** Then theoretically, could there be a case where `a==b` returns false because the string literal pool is full? – Kramer786 Apr 26 '15 at 15:39
  • @Kramer786: I haven't been into it in that kind of detail. Looking at the JVM spec, it looks like it's probably [the constant pool](https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.4). Re the string pool: Yes, it has a specific capacity, and when that's exceeded old entries may be released from the pool. Doesn't affect any references to those strings that code has; the strings are in the heap anyway. *Edit*: Yeah, that's it, and then [this section of the JVM spec](https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-5.html#jvms-5.3) picks up from there. – T.J. Crowder Apr 26 '15 at 15:42
  • @T.J.Crowder Does the constant pool store references to the strings or the strings themselves? I mean, referring to your last line, are all the strings in the heap, even the ones that are interned? Correct me if I am wrong: interning a new string then involves going to the constant pool, getting all the string references stored there, checking the value of each in the heap, and, if none matches, storing the new reference in the constant pool. Would that be the general idea of how it works? – Kramer786 Apr 26 '15 at 15:55
  • @Kramer786: The pool stores references to strings, which are normal objects stored in the usual way. The purpose of the pool is purely to avoid having multiple copies of equivalent strings; it doesn't put them elsewhere in memory. As for how it finds them, implementation details are just that and can vary from JVM to JVM, but it probably uses a B-tree, a hashing mechanism, or similar. – T.J. Crowder Apr 26 '15 at 16:00
  • @T.J.Crowder if at run time then what if i write `final String a="hello";` ? – Ankur Anand Apr 26 '15 at 16:45
  • @AnkurAnand: That statement will be in a method in a class, so `"hello"` goes in the class file's constant pool. When the class is loaded, all of its string constants are resolved against the string pool (added if not already there; otherwise the reference is pointed at the pool's existing copy). Your `a` will hold a reference to the result. – T.J. Crowder Apr 26 '15 at 17:03
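
On the `final String a = "hello";` question in the comments, one related detail may help: a `final` local initialized with a literal is a constant variable, so an expression such as `a + "lo"` is evaluated by the compiler and treated like a literal, whereas the same concatenation on a non-final variable happens at runtime and yields a new object. A minimal sketch of that difference follows; the class name `ConstantFoldingDemo` and its methods are hypothetical.

```java
// Hypothetical demo class; the names are illustrative only.
class ConstantFoldingDemo {

    // 'a' is a constant variable, so a + "lo" is a compile-time constant
    // expression and ends up as the literal "hello" in the class file.
    static boolean finalLocalFoldsToLiteral() {
        final String a = "hel";
        String b = a + "lo";
        return b == "hello";
    }

    // Without 'final', the concatenation runs at runtime and creates
    // a new String that is not the pooled "hello".
    static boolean nonFinalLocalIsSameObject() {
        String a = "hel";
        String b = a + "lo";
        return b == "hello";
    }

    public static void main(String[] args) {
        System.out.println(finalLocalFoldsToLiteral());  // true
        System.out.println(nonFinalLocalIsSameObject()); // false
    }
}
```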

I checked the .class file for

String a = "ABC";
String b = "ABC";

and found only one "ABC" in it. That is, javac emits a single constant for identical string literals at compile time.

If two or more classes contain the same "ABC" constant, the JVM resolves them all to the same object in the string pool.
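
The cross-class case can be checked with a short experiment. This is a sketch under the assumption that all three classes are compiled together; `First`, `Second`, and `SharedLiteralDemo` are made-up names.

```java
// Hypothetical classes; each one gets its own class file with its own "ABC" constant.
class First {
    static String literal() {
        return "ABC";
    }
}

class Second {
    static String literal() {
        return "ABC";
    }
}

class SharedLiteralDemo {

    // Both class files contain "ABC", yet at runtime both literals
    // refer to the same String object.
    static boolean sameObjectAcrossClasses() {
        return First.literal() == Second.literal();
    }

    public static void main(String[] args) {
        System.out.println(sameObjectAcrossClasses()); // true
    }
}
```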

Evgeniy Dorofeev