19

I'm aware that if you write

while (condition) {
    String s = "hi there";
}

only one String instance is created across all iterations, unlike String s = new String("hi there");, which creates a new instance on each iteration.
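As a rough illustration of that difference, the sketch below counts distinct objects by identity. The class and method names are made up for this example, and an `IdentityHashMap` is used only because it compares keys with `==` instead of `equals`.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical demo class; not part of any library.
class LiteralLoopDemo {

    // Counts how many distinct String objects the literal yields over n iterations.
    static int distinctLiteralInstances(int n) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (int i = 0; i < n; i++) {
            String s = "hi there";              // same pooled object every time
            seen.put(s, Boolean.TRUE);
        }
        return seen.size();
    }

    // Counts how many distinct String objects `new String(...)` yields over n iterations.
    static int distinctNewInstances(int n) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (int i = 0; i < n; i++) {
            String s = new String("hi there");  // fresh object on each iteration
            seen.put(s, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctLiteralInstances(5)); // prints 1
        System.out.println(distinctNewInstances(5));     // prints 5
    }
}
```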

But Effective Java by Joshua Bloch (Chapter 2, Item 5, page 20) states:

Furthermore, it is guaranteed that the object will be reused by any other code running in the same virtual machine that happens to contain the same string literal [JLS, 3.10.5].

As far as I can tell, that does not say "happens to be the same string literal"; it says "contain".

Reading [JLS, 3.10.5], I cannot find any exact reference to this, so I have a doubt.

Given this snippet:

String s1 = "hi ";
String s2 = "there";
String s3 = "hi there";

How many instances are created?

  • 3 instances (thus, the phrase is not really exact).
  • 2 instances, s1 and s2 (s3 is then created by reusing the s1 and s2 references).
cat
Jordi Castilla
  • 1
    He probably means "the virtual machine contains ..", not the string contains another string – Wim Deblauwe Jul 15 '16 at 11:44
  • 1
    I am not sure, so a comment instead of an answer. But I think that the "contain" is partially wrong and your example indeed yields three instances. – glglgl Jul 15 '16 at 11:44
  • @glglgl that is actually what *my logic* says, but could the JVM be smart enough to create `s3` as a reference to `s1` + `s2`? – Jordi Castilla Jul 15 '16 at 11:45
  • 3
    It reuses objects from the String pool. So yes, there will be three instances, since they are (at that point) three different immutable objects. The word I was killing myself to use was "intern", which is what the JVM does. If you are asking whether the JVM "cleverly substrings" String objects to optimise runtime memory usage, I'm not sure that it does. Also, [this](http://stackoverflow.com/questions/10759844/reusability-of-strings-in-java) might help. – ha9u63a7 Jul 15 '16 at 11:45
  • 2
    @glglgl The use of "contain" is not wrong; the OP is just misreading the word and missing the fact that it is talking about other code that contains the **same string literal**. A substring is not a string literal, and the text is not talking about string literals containing other string literals. – Mark Rotteveel Jul 15 '16 at 11:50
  • @ha9u63ar AFAIK the string type no longer supports shared substring storage since Java 7. – Random832 Jul 15 '16 at 14:21

3 Answers

16

The JLS does not guarantee any reuse of sub-strings whatsoever. The word "contain" here just means that the class mentions the exact same string literal somewhere. It is not used in the "substring of" sense.
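A minimal sketch of what that means for the snippet in the question, assuming two made-up classes (`ThreeLiteralsDemo` and `OtherCode`) loaded in the same virtual machine. The three different literals are three separate objects, while other code that mentions the same literal gets the same object back.

```java
// Hypothetical second class that "happens to contain the same string literal".
class OtherCode {
    static String greeting() {
        return "hi there";
    }
}

// Hypothetical demo class; names are for illustration only.
class ThreeLiteralsDemo {

    // True if the three literals from the question are three distinct objects.
    static boolean threeDistinctInstances() {
        String s1 = "hi ";
        String s2 = "there";
        String s3 = "hi there";
        return s1 != s2 && s1 != s3 && s2 != s3;
    }

    // True if the literal here is the very same object as the one in OtherCode.
    static boolean sameLiteralIsShared() {
        String s3 = "hi there";
        return s3 == OtherCode.greeting();
    }

    public static void main(String[] args) {
        System.out.println(threeDistinctInstances()); // prints true
        System.out.println(sameLiteralIsShared());    // prints true
    }
}
```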

Joachim Sauer
  • 2
    Specifically _"any other code [..] that happens to contain the **same string literal**"_ (emphasis mine) – Mark Rotteveel Jul 15 '16 at 11:47
  • 1
    When you say *not guarantee any reuse of sub-strings*, does that mean it can happen sometimes? – Jordi Castilla Jul 15 '16 at 11:53
  • 3
    @JordiCastilla: I don't think any current VM reuses substrings, but it is possible (and previous iterations of OpenJDK for example did sometimes share the underlying char[] when two strings were substrings of each other). Note that you'd *still* observe separate `String` instances and there's no public API to detect if that's happening (i.e. you wouldn't be able to tell without some reflection trickery). – Joachim Sauer Jul 15 '16 at 11:57
  • @JoachimSauer "when two strings were substrings of each other" - as far as I can reason, the only case in which two sets would be mutual subsets of each other is when they are identical, no?! – Mathias R. Jessen Jul 15 '16 at 21:17
  • @MathiasR.Jessen: sure, should have phrased it differently: "when there are two strings where one is a substring of the other". – Joachim Sauer Jul 18 '16 at 08:34
3

Each class file contains a list of all the string literals and other constants used within that class (except for small numeric constants, which are embedded within the instruction stream). If item 19 in the list is the string literal "Freddy", and local variable Fred has an index of 6, then the bytecode generated for Fred="Freddy"; would likely be ldc 19/astore 6.

When a class is loaded, the system will build a table of all the constants and--for those of reference type--the objects identified thereby. If no instance of a string literal is known to exist, the system will add one to the interning table and store a reference to that. When generating machine code, the ldc 19 will then be replaced with an instruction to load the appropriate reference.

What's important is that by the time any of the code in a class runs, objects have been created for all the string literals therein, so a statement like Fred="Freddy"; will merely store a reference to an already-existing String object containing Freddy, rather than creating a new String object.
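The effect of the interning table can be observed from ordinary code. The sketch below is only an illustration with a made-up class name: a compile-time constant expression such as `"hi " + "there"` ends up as a single literal in the constant list, whereas concatenating a non-final variable at run time produces a new object unless `String.intern()` is called on the result.

```java
// Hypothetical demo class; names are for illustration only.
class InternTableDemo {

    // "hi " + "there" is a constant expression, so it refers to the same
    // pooled object as the literal "hi there".
    static boolean constantExpressionIsPooled() {
        return ("hi " + "there") == "hi there";
    }

    // `part` is not declared final, so the concatenation happens at run time
    // and yields a new String object with equal contents.
    static boolean runtimeConcatIsPooled() {
        String part = "hi ";
        String joined = part + "there";
        return joined == "hi there";
    }

    // intern() looks the contents up in the interning table and returns
    // the canonical instance.
    static boolean internedConcatIsPooled() {
        String part = "hi ";
        String joined = part + "there";
        return joined.intern() == "hi there";
    }

    public static void main(String[] args) {
        System.out.println(constantExpressionIsPooled()); // prints true
        System.out.println(runtimeConcatIsPooled());      // prints false
        System.out.println(internedConcatIsPooled());     // prints true
    }
}
```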

supercat
2

If s3 reused the s1 and s2 instances, then s3 would not be physically represented as a contiguous character array, but would rather be a composite String made of String objects.

Now imagine the performance impact on accessing individual characters within such a string: index-based access would involve comparing the index value with the size of the first string, then calculating the offset that becomes the index into the second string, and so on.

Actually, the opposite could make sense: only one underlying char sequence could be allocated for "hi there" (s3), and s1 and s2 could just store their lengths and the addresses of their first characters within that string. But I assume that it would be complex and expensive work for the JVM to identify the 'embeddable' candidates, and that the cost would outweigh the potential benefit.
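To make the index arithmetic concrete, here is a rough sketch of what such a composite string might look like. `CompositeString` is a hypothetical class invented for this illustration; nothing like it is used by `java.lang.String`.

```java
// Hypothetical two-part string; not how java.lang.String is implemented.
final class CompositeString {
    private final String first;
    private final String second;

    CompositeString(String first, String second) {
        this.first = first;
        this.second = second;
    }

    int length() {
        return first.length() + second.length();
    }

    // Every access needs a bounds check, a comparison against the first
    // part's length, and possibly an offset calculation into the second part.
    char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new StringIndexOutOfBoundsException(index);
        }
        if (index < first.length()) {
            return first.charAt(index);
        }
        return second.charAt(index - first.length());
    }

    public static void main(String[] args) {
        CompositeString s3 = new CompositeString("hi ", "there");
        System.out.println(s3.length());   // prints 8
        System.out.println(s3.charAt(3));  // prints t
    }
}
```

With more than two parts the lookup would need a loop or a tree walk, which is the cost the paragraph above describes.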

Dragan Bozanovic
  • 1
    Well, prior to Java 7 the `substring`-method used to be implemented in a way that it returned a String backed by the original String's character array, but even that was dropped because it caused more harm than good (large texts could be kept alive by holding a reference to some tiny substring, for example) – Hulk Jul 18 '16 at 07:44
  • 1
    @Hulk: It was [changed in Java 7 update 6](http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4513622). It's not only a GC issue; it required every string to carry an `offset` and `length` field for the sole purpose of a single operation, `substring`. Further, the string deduplication feature of recent JVMs benefits from the simplified object layout, as a single `cas` on the `value` field is sufficient. – Holger Sep 20 '16 at 15:00