39

I was asked in an interview about the number of objects that will be created on the given problem:

String str1 = "First";
String str2 = "Second";
String str3 = "Third";
String str4 = str1 + str2 + str3;

I answered that there would be 6 objects created in the string pool.

3 would be for each of the three variables.
1 would be for str1 + str2 (let's say str).
1 would be for str2 + str3.
1 would be for the str + str3 (str = str1 + str2).

Is the answer I gave correct? If not, what is the correct answer?

Bernhard Barker
bhpsh
    Why would `str2 + str3` be one of the objects? – Jacob G. Aug 23 '19 at 17:51
  • 5
    I am not entirely sure whether it is defined in the JLS, but when concatenating `String`s, the compiler normally generates a `StringBuilder` to concatenate the `String`s. I am not entirely sure how the `StringBuilder` internally handles the concatenation, but I would say that at least five `Object`s are created: one each for `str1` to `str3`, one `StringBuilder` and the final `String` for `str4`. – Turing85 Aug 23 '19 at 17:52
  • 2
    Update: The [JLS](https://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.18.1) actually defines that "*a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression.*" In other words: without further information, the question cannot be definitively answered. – Turing85 Aug 23 '19 at 17:55
  • 6
    Nothing against you, but I hate questions like that. What is the deeper meaning of such questions? Especially in an interview? What skills does the interviewer want to find out? – RedCam Aug 23 '19 at 18:03
  • 1
    if the first 3 variables were defined `final` I would say no object is being created (at that code segment) – user85421 Aug 23 '19 at 18:38
  • 4
    And we have... "Interview questions by people who think it's a good idea to second guess an extremely tuned compiler on trivial stuff". "I will take the daily double for 10'000, Alex!" – David Tonhofer Aug 24 '19 at 11:14
  • 2
    @RedCam check out the very good answer by Andrew Tobilko (or pretty much any answer here). Even the shortest version, "It depends on the Java compiler and JIT; I can see compliant approaches that create anywhere from 1 to 5 objects here", gives you a lot of insight into what the interviewee knows. If it is just an automated check-box style question, it is indeed clearly bad, but as a starting point for a conversation, why not? – Alexei Levenkov Aug 25 '19 at 01:29

7 Answers

35

Any answer to your question will depend on the JVM implementation and the Java version currently being used. I think it's an unreasonable question to ask in an interview.

Java 8

On my machine, with Java 1.8.0_201, your snippet results in this bytecode:

L0
 LINENUMBER 13 L0
 LDC "First"
 ASTORE 1
L1
 LINENUMBER 14 L1
 LDC "Second"
 ASTORE 2
L2
 LINENUMBER 15 L2
 LDC "Third"
 ASTORE 3
L3
 LINENUMBER 16 L3
 NEW java/lang/StringBuilder
 DUP
 INVOKESPECIAL java/lang/StringBuilder.<init> ()V
 ALOAD 1
 INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
 ALOAD 2
 INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
 ALOAD 3
 INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
 INVOKEVIRTUAL java/lang/StringBuilder.toString ()Ljava/lang/String;
 ASTORE 4

which shows that 5 objects are involved (3 String literals*, 1 StringBuilder, and 1 String instance produced dynamically by StringBuilder#toString).
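For readers who prefer source code to bytecode, the Java 8 listing above corresponds roughly to the hand-written method below. This is only a sketch: the class name `ConcatSketch` and the method name `concatLikeJavac8` are made up for illustration, and the internal buffers that `StringBuilder` and `String` allocate are not shown.

```java
final class ConcatSketch {
    // Hand-written equivalent of the bytecode javac 8 emitted for
    // str1 + str2 + str3 (hypothetical helper, not part of any API).
    static String concatLikeJavac8(String str1, String str2, String str3) {
        StringBuilder sb = new StringBuilder(); // NEW + INVOKESPECIAL <init>
        sb.append(str1);                        // ALOAD 1 + append
        sb.append(str2);                        // ALOAD 2 + append
        sb.append(str3);                        // ALOAD 3 + append
        return sb.toString();                   // allocates the resulting String
    }

    public static void main(String[] args) {
        System.out.println(concatLikeJavac8("First", "Second", "Third"));
    }
}
```

The two explicit allocations here (the `StringBuilder` and the `String` returned by `toString()`) are the "+2" on top of the three literals.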

Java 12

On my machine, with Java 12.0.2, the bytecode is

// identical to the bytecode above
L3
 LINENUMBER 16 L3
 ALOAD 1
 ALOAD 2
 ALOAD 3
 INVOKEDYNAMIC makeConcatWithConstants(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String; [
  // handle kind 0x6 : INVOKESTATIC
  java/lang/invoke/StringConcatFactory.makeConcatWithConstants(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
  // arguments:
  "\u0001\u0001\u0001"
 ]
 ASTORE 4

which magically changes "the correct answer" to 4 objects since there is no intermediate StringBuilder involved.
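Whichever strategy the compiler picks, JLS §15.18.1 says the result of a non-constant concatenation is a newly created String, so it is not the pooled instance a literal would resolve to. A small sketch to observe this (the class and method names are hypothetical):

```java
final class FreshResultSketch {
    // The parameters are not constant variables, so a + b + c is not a
    // constant expression and has to be evaluated at run time.
    static String concat(String a, String b, String c) {
        return a + b + c;
    }

    // Identity comparison against the literal with the same content.
    static boolean isSameInstanceAsLiteral() {
        String str4 = concat("First", "Second", "Third");
        return str4 == "FirstSecondThird";
    }

    // Content comparison against the same literal.
    static boolean hasSameContentAsLiteral() {
        String str4 = concat("First", "Second", "Third");
        return str4.equals("FirstSecondThird");
    }

    public static void main(String[] args) {
        System.out.println(isSameInstanceAsLiteral());
        System.out.println(hasSameContentAsLiteral());
    }
}
```

The identity check only tells us that a new String was created. It says nothing about how many helper objects were allocated along the way.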


*Let's dig a bit deeper.

12.5. Creation of New Class Instances

A new class instance may be implicitly created in the following situations:

  • Loading of a class or interface that contains a string literal (§3.10.5) may create a new String object to represent the literal. (This will not occur if a string denoting the same sequence of Unicode code points has previously been interned.)

In other words, when you start an application, there are already objects in the String pool. You barely know what they are and where they come from (unless you scan all loaded classes for all literals they contain).

The java.lang.String class will be undoubtedly loaded as an essential JVM class, meaning all its literals will be created and placed into the pool.

Let's take a randomly selected snippet from the source code of String, pick a couple of literals from it, put a breakpoint at the very beginning of our programme, and examine if the pool contains these literals.

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence,
               Constable, ConstantDesc {
    ...
    public String repeat(int count) {
        // ... 
        if (Integer.MAX_VALUE / count < len) {
            throw new OutOfMemoryError("Repeating " + len + " bytes String " + count +
                    " times will produce a String exceeding maximum size.");
        }
    }
    ...
}

They are there indeed.

As an interesting find, IntelliJ IDEA's filtering has a side effect: the substrings I was looking for were added to the pool as well. The pool size increased by one ("bytes String" was added) after I applied this.contains("bytes String").

Where does this leave us?

We have no idea whether "First" was created and interned before we call String str1 = "First";, so we can't state firmly that the line creates a new instance.

Andrew Tobilko
  • 3
    each string also contains a `byte[]`, same for `StringBuilder` (and first one will also create a static `byte[0]`) and eventually (haven't checked) the `StringBuilder` can create a new, bigger buffer (if needed) (just proving your 2nd sentence true) – user85421 Aug 23 '19 at 18:22
  • 2
    thinking about it, is `LDC` creating a (new) Object? or just using one? (the literal could have been used before) – user85421 Aug 23 '19 at 18:32
  • 1
    @CarlosHeuberger "Loading of a class or interface that contains a string literal **may** create a new String object to represent the literal. (**This will not occur** if a string denoting the same sequence of Unicode code points **has previously been interned.**)" – Andrew Tobilko Aug 24 '19 at 09:11
  • 1
    not new and just confirming what I wrote: "the literal could have been used before", exactly because that would mean it is already created/interned! But loading a class or interface is not part of the code posted in the question (although "on the given problem" is quite open) – user85421 Aug 24 '19 at 14:23
  • 2
    "unreasonable question to ask in an interview" - the funny part that you found question useful and interesting and most of your answer actually does not need anything you will not have at typical interview but good understanding how Java handles strings. I.e. I doubt you need anything to explain that concatenation can be optimized into method call at compile time. Indeed it would be hard to come up with diassembled code and precise versions... but answer like that *may* show how well someone knows the language/framework and things like defined/unspecified behavior... – Alexei Levenkov Aug 25 '19 at 01:20
19

With the given information, the question cannot be definitely answered. As is stated in the JLS, §15.18.1:

... To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression.

This means that the answer depends at least on the concrete Java compiler used.

I think the best we can do is give an interval as an answer:

  • A smart compiler may be able to infer that str1 to str3 are never used and fold the concatenation during compilation, such that only one String object is created (the one referenced by str4).
  • The maximum sensible number of Strings created should be 5: one each for str1 to str3, one for tmp = str1 + str2 and one for str4 = tmp + str3.
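The lower bound can be made concrete with a small change to the question's code. If the three variables are declared `final` (an assumption; the original snippet does not do this), they become constant variables, the concatenation becomes a constant expression, and the compiler folds it into a single literal. A sketch with hypothetical names:

```java
final class ConstantFoldingSketch {
    static boolean foldedAtCompileTime() {
        // Assumption: 'final' added, unlike the snippet in the question.
        final String str1 = "First";
        final String str2 = "Second";
        final String str3 = "Third";
        // Constant expression: evaluated by the compiler, not at run time.
        String str4 = str1 + str2 + str3;
        // Both sides refer to the same interned String.
        return str4 == "FirstSecondThird";
    }

    public static void main(String[] args) {
        System.out.println(foldedAtCompileTime()); // prints "true"
    }
}
```

Without `final`, the language only permits such folding as an optimization and does not require it, which is why the answer is an interval.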

So... my answer would be "somewhere between one and five String objects". As to the total number of objects created just for this operation... I do not know. This may also depend on how exactly, e.g., StringBuffer is implemented.

As an aside: I wonder what the reason behind asking such questions is. Normally, one does not need to care about those details.

Turing85
  • 2
    I'd think it could just be a conversation starter on how Strings are modelled and to get some insight into how a candidate thinks about java objects, whether the candidate is aware of potential instantiation through the '+' operator etc. It's likely less important to know the exact answer, but to make some sense when arguing about it. Not exactly a question I would ask, but I could see it as reasonable if used in that regard. – Frank Hopkins Aug 24 '19 at 02:15
  • 1
    Either a trivia question, or a really deep question (as answered above). Why is it important? The more temporaries, the more often garbage must be collected, which affects performance. – ChuckCottrill Aug 24 '19 at 02:52
  • 1
    @ChuckCottrill your statement about the GC is true, but I think this situation is a little bit different. We are talking about compile-time constants and compiler transformations. These processes are designed to improve performance and should be opaque to the programmer. I really hope it was meant as a conversational starter... – Turing85 Aug 24 '19 at 06:29
  • 3
    The smart compiler might notice that `str4` isn't used anywhere either, so I think the minimum count should be 0 :-) – Bergi Aug 24 '19 at 16:24
  • 1
    @Bergi well I assume that at least one `String` down the line is used (otherwise the `String`s are unused), so I stay by my 1 from a pragmatic point of view =) – Turing85 Aug 24 '19 at 16:25
9

Java 8 will likely create 5 objects:

  • 3 for the 3 literals
  • 1 StringBuilder
  • 1 for the concatenated String

With Java 9, things changed, though: javac no longer compiles String concatenation to StringBuilder calls (JEP 280).

Puce
  • 5
    Good answer, but for Java 8, I think you're overlooking the char[] object used by the implementation of StringBuilder. – Andy Thomas Aug 23 '19 at 17:57
  • 2
    https://stackoverflow.com/questions/12806739/is-an-array-a-primitive-type-or-an-object-or-something-else-entirely – 17slim Aug 23 '19 at 18:05
  • 6
    @3limin4t0r "*`char[]` is a primitive value*" - [Nope](https://docs.oracle.com/javase/specs/jls/se7/html/jls-10.html). – Turing85 Aug 23 '19 at 18:05
  • 3
    @AndyThomas if you count the char[] you will end up with even more as also String uses char[] and some constructors also create copies of char[]. – Puce Aug 23 '19 at 18:07
  • 1
    That's a good point actually, which leads me to believe @Turing85's answer is more correct without further clarification from the interviewers. – 17slim Aug 23 '19 at 18:09
  • 1
    @Puce - You're right, I in turn forgot about the char[] objects in the String instances. However, I think you're confusing the *compiled* code for `m(String,String)` in the linked JEP for the *executed* code at runtime, which would include not only the compiled code shown, but also the compiled code that it calls in StringBuilder and String. – Andy Thomas Aug 23 '19 at 19:03
  • 1
    It irks me that `String` doesn't (unless things have changed) include a static `concat` method with overloads that take various numbers of strings, along with one that takes a `String[]`, and maybe a static `from` method that would include overloads for the types that can be used with the `+` operator. Such things would seem way more sensible than `StringBuilder`. – supercat Aug 24 '19 at 02:02
  • 1
    "3 for the 3 variables" – I am pretty sure Java will *never* create objects for variables. (Unless you count a debugger.) There *are* languages in which variables are objects, but Java (like almost all languages) is not one of them. – Jörg W Mittag Aug 24 '19 at 13:57
  • I agree with this answer – Diego Ramos Jan 22 '21 at 20:55
3

It should be 5:

  • three for the three literals (assigned to str1, str2 and str3)

  • one for str1 + str2

  • one for (result from the previous operation) + str3 (assigned to str4)

Laurenz Albe
  • 1
    str1 + str2 has not been concatenated into an intermediate String in any Java version for many years. – Puce Aug 23 '19 at 18:00
3

A conformant Java implementation can concatenate the strings any number of ways, at run time or at compile time, needing any number of run-time objects, including zero objects if it detects that the result is not needed at run time.

Boann
2

Four String objects will be created: three for the literals, which live in the string constant pool, and one for the concatenation result.

If we use

String s1 = new String("one");

it will create two objects: one in the constant pool (for the literal) and one on the heap.

If we define:

String s1 = "one";
String s2 = new String("one");

it will again create two objects: one in the constant pool and one on the heap.
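The difference between the literal and the `new String(...)` instance can be checked with reference comparisons. A minimal sketch (the class and method names are hypothetical):

```java
final class NewStringSketch {
    // Returns { s1 == s2, s1.equals(s2), s1 == s2.intern() }.
    static boolean[] compare() {
        String s1 = "one";             // refers to the pooled literal
        String s2 = new String("one"); // always a distinct object
        return new boolean[] {
            s1 == s2,         // false: two different instances
            s1.equals(s2),    // true: same character content
            s1 == s2.intern() // true: intern() returns the pooled instance
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(compare()));
        // prints "[false, true, true]"
    }
}
```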

Andrew Tobilko
Parmar Kamlesh
1

The concatenation doesn't create that many String objects. It creates a StringBuilder and then appends the strings. So there may be 5 objects: 3 (literals) + 1 (StringBuilder) + 1 (concatenated String).

  • 2
    It _might_ be 5. – Nexevis Aug 23 '19 at 18:11
  • 1
    As was stated many times, the [JLS](https://docs.oracle.com/javase/specs/jls/se12/html/jls-15.html#jls-15.18.1) says that "*a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression.*" Emphasis is on **may**. – Turing85 Aug 23 '19 at 18:11