
Inspired by the comments on this question, I'm pretty sure that Java Strings are interned at runtime rather than at compile time. Surely the fact that classes can be compiled at different times, yet would still point to the same reference at runtime, shows this.

I can't seem to find any evidence to back this up. Can anyone justify this?

Noel M

2 Answers


The optimization happens (or at least can happen) in both places:

  • If two references to the same string constant appear in the same class, I'd expect the class file to only contain one constant pool entry. This isn't strictly required in order to ensure that there's only one String object created in the JVM, but it's an obvious optimization to make. This isn't actually interning as such - just constant optimization.
  • When classes are loaded, the string constants in the class's constant pool are added to the intern pool. This is "real" interning.
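A minimal sketch of the runtime side of this, assuming two hypothetical classes (`First` and `Second`, names invented for illustration) that each carry their own constant pool entry for the same literal. Because both entries resolve to the same interned instance, the reference comparison should succeed even if the classes had been compiled separately:

```java
class SharedLiteralDemo {
    // True when both classes hand back the very same String object.
    static boolean sameInstanceAcrossClasses() {
        return First.greeting() == Second.greeting();
    }

    public static void main(String[] args) {
        System.out.println(sameInstanceAcrossClasses()); // prints "true"
    }
}

// Hypothetical class: its class file has its own constant pool entry for "hello".
class First {
    static String greeting() {
        return "hello";
    }
}

// Hypothetical class: a separate class file with a separate entry for "hello".
class Second {
    static String greeting() {
        return "hello";
    }
}
```

The comparison uses `==` on purpose: it checks object identity, which is the property interning guarantees, rather than content equality.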

(I have a vague recollection that one of the bits of work for Java 7 around "small jar files" included a single string pool for the whole jar file... but I could be very wrong.)

EDIT: Section 5.1 of the JVM spec, "The Runtime Constant Pool" goes into details of this:

To derive a string literal, the Java virtual machine examines the sequence of characters given by the CONSTANT_String_info structure.

  • If the method String.intern has previously been called on an instance of class String containing a sequence of Unicode characters identical to that given by the CONSTANT_String_info structure, then the result of string literal derivation is a reference to that same instance of class String.

  • Otherwise, a new instance of class String is created containing the sequence of Unicode characters given by the CONSTANT_String_info structure; that class instance is the result of string literal derivation. Finally, the intern method of the new String instance is invoked.

Jon Skeet
  • I remember reading somewhere that string pools are scoped by package. I cannot recall the source of it – Dennis C Aug 10 '10 at 16:29
  • @Dennis: That sounds inaccurate to me. Section 3.10.5 of the JLS doesn't mention that as far as I can see. – Jon Skeet Aug 10 '10 at 16:32
  • (+1) I followed the link in the [comments that led to this question](http://stackoverflow.com/questions/3450604/why-is-there-no-string-empty-in-java/3450623#3450623) to answer with references to the same documentation, only to find that Jon Skeet was already on the case. :P – Tim Stone Aug 10 '10 at 16:41
  • Since 1.5 there has been the pack200 distribution format. It dedupes much of the information contained in jar files. However, when used it recreates the jar file (which is a bit daft, but there you go). – Tom Hawtin - tackline Aug 10 '10 at 17:06
  • "If two references to the same string constant appear in the same class, the class file will only contain one string pool entry". Possibly confusion over concepts here? :) :) Classes have no String pool. They have a Constants Pool - which has nothing to do with interning. I believe the compiler may emit many repeats of a String value in the constants pool, but it may also optimize down to 1 or 0 (0 if it inlines constant values). – Glen Best May 28 '13 at 07:27
  • @GlenBest: I've edited it to say "Constants Pool" - but I would *expect* the compiler to optimize the constants pool in the class file. Yes, interning will *also* optimize this, but there's no point in having the same constant more than once. – Jon Skeet May 28 '13 at 07:30
  • Fair enough! But that's not quite interning - more constant-handling 101. – Glen Best May 28 '13 at 07:33
  • @GlenBest: Okay, will clarify further. – Jon Skeet May 28 '13 at 07:35
  • @JonSkeet Are Intern Pools different from the String Constant Pool? This might be a silly question, but I am really confused about this. – Arijit Dasgupta Oct 29 '15 at 13:48
  • @ArijitDasgupta: As far as I'm aware they're basically the same thing. – Jon Skeet Oct 29 '15 at 13:53
  • The research I did trying to answer [this question](http://stackoverflow.com/questions/42824821/java-please-help-me-to-understand-these-code-result) seems to suggest that strings are interned at evaluation time and not at class loading time. Does this conflict with your answer, or am I misunderstanding something? – Robby Cornelissen Mar 16 '17 at 07:53
  • @RobbyCornelissen: There are a lot of comments in that question - I don't see anything to indicate what you're talking about at the moment. Could you point to a specific piece of evidence? I'm trying to work out how I'd even validate that either way... – Jon Skeet Mar 16 '17 at 09:23
  • Take a look at either [my answer](http://stackoverflow.com/a/42827186/3558960) or [this answer](http://stackoverflow.com/a/42827646/3558960). Both seem to suggest that strings are interned at evaluation/execution time, but I definitely see your point as to the difficulties of validating that. When I did my tests looking at the memory addresses of both literals and dynamically created strings, the addresses kept increasing during execution. If the literals were interned at class loading time, you'd expect all the literals to be in a lower memory range than the dynamically created strings. – Robby Cornelissen Mar 16 '17 at 09:45
  • @RobbyCornelissen: Neither of those answers demonstrates that. If the only way you're coming to that conclusion is via the debugger, you should consider that debugging may well very significantly affect how the VM operates. – Jon Skeet Mar 16 '17 at 10:17
  • I agree that neither answer demonstrates that, and am aware of the impact a debugger might have on execution. I came to this conclusion by checking memory addresses using code from [this answer](http://stackoverflow.com/questions/7060215/how-can-i-get-the-memory-location-of-a-object-in-java/7060500#7060500). I do realize that this approach also has its limitations, hence my careful wording in the sense of *[...] this seems to suggest that [...]*. I will not further argue the point, except for mentioning that interning at class-loading time sounds pretty memory inefficient. – Robby Cornelissen Mar 16 '17 at 10:28
  • @RobbyCornelissen: By interning at class-loading time, you don't need to do any checks when *executing* the code as to whether something has already been added into the pool. So yes, it may intern more strings than required, but it will potentially be more time-efficient. If you have so many string constants in your classes that they form a significant part of your memory usage - and most of them aren't used - then that's likely to be a different problem, I'd say. – Jon Skeet Mar 16 '17 at 11:40

Runtime.

The JLS and JVM specifications describe how javac compiles to class files, which contain constant declarations (in the Constant Pool) and constant usages in code (which javac can inline as primitive / object reference values). For compile-time String constants, the compiler emits constant pool entries that the JVM resolves to String instances and interns, as if by calling String.intern(), so that the JVM interns String constants automatically. This is a behavioural requirement from the JLS:

http://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.28

Compile-time constant expressions of type String are always "interned" so as to share unique instances, using the method String.intern.
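A small sketch of what this requirement implies in practice. The class and member names are made up for illustration. A concatenation built only from constants is itself a compile-time constant expression and so shares the interned instance, whereas a concatenation involving a non-constant variable is evaluated at runtime and yields a new object:

```java
class ConstantExpressionDemo {
    // A static final String initialised with a literal is a constant variable.
    static final String PREFIX = "inter";

    // PREFIX + "ned" is a compile-time constant expression, so it is interned.
    static boolean foldedConcatenationIsInterned() {
        return (PREFIX + "ned") == "interned";
    }

    // 'suffix' is not final, so the concatenation happens at runtime
    // and produces a new String with equal content but different identity.
    static boolean runtimeConcatenationIsDistinct() {
        String suffix = "ned";
        String joined = PREFIX + suffix;
        return joined != "interned" && joined.equals("interned");
    }

    public static void main(String[] args) {
        System.out.println(foldedConcatenationIsInterned());  // prints "true"
        System.out.println(runtimeConcatenationIsDistinct()); // prints "true"
    }
}
```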

But these specs have neither the concept nor the definition of any particular String intern pool structures/references/handles whether compile time or runtime. (Of course, in general, the JVM spec does not mandate any particular internal structure for objects: http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-2.html#jvms-2.7)

The reason that no intern pool structures are mentioned is that they're handled entirely within the String class. The intern pool is a private static/class-level structure of the String class (unspecified by JLS & JVM specs & javadoc).

Objects are added to the intern pool when String.intern() is called at runtime. The intern pool is leveraged privately by the String class - when code creates new String instances and calls String.intern(), the String class determines whether to reuse existing internal data. Optimisation can be carried out by the JIT compiler - at runtime.
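To illustrate the explicit, runtime use of String.intern() described above, here is a hedged sketch (class and method names are hypothetical). It builds two distinct String objects from the same characters, then checks that interning both yields a single canonical instance:

```java
class RuntimeInternDemo {
    // Builds two separate String objects with the same content and checks
    // that intern() maps both to one shared instance.
    static boolean internReturnsCanonicalInstance(String text) {
        String first = new String(text.toCharArray());
        String second = new String(text.toCharArray());
        // The constructor always allocates, so the two references differ...
        boolean distinctBeforeIntern = first != second;
        // ...but intern() returns the pool's single instance for equal content.
        boolean sameAfterIntern = first.intern() == second.intern();
        return distinctBeforeIntern && sameAfterIntern;
    }

    public static void main(String[] args) {
        // Content only known at runtime, so it cannot come from a class file constant.
        String runtimeValue = "run-" + System.nanoTime();
        System.out.println(internReturnsCanonicalInstance(runtimeValue)); // prints "true"
    }
}
```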

There's no compile-time contribution here, bar the vanilla inlining of constant values.

Glen Best
  • "As code attempts to create new `String` instances". Really? Or just `String` literals? – Raedwald May 28 '13 at 07:54
  • Have clarified - String.intern() is the trigger for String class to internally attempt to reuse element from pool. Of course, that is called implicitly for compile-time String constants (literals and fixed expressions). May also be called explicitly by developer. – Glen Best May 28 '13 at 08:39