
How many String objects will be created when using a plus sign in the below code?

String result = "1" + "2" + "3" + "4";

If it were written as below, I would have said three String objects: "1", "2", and "12".

String result = "1" + "2";

I also know that String objects are cached in the String Intern Pool/Table for performance improvement, but that's not the question.

Peter Mortensen
The Light
  • Strings are only interned if you explicitly call String.Intern. – Joe White Feb 03 '12 at 16:58
  • @JoeWhite: are they? – Igor Korkhov Feb 03 '12 at 17:15
  • Not quite. All string literals are interned automatically. The results of string operations are not. – Stefan Paul Noack Feb 03 '12 at 18:19
  • What's more, in the OP example, there is only one string constant, and it is interned. I'll update my answer to illustrate. – Chris Shain Feb 03 '12 at 18:37
  • +1. For a real-life example of the need to code a string catenation in that style, the Examples section of http://msdn.microsoft.com/en-us/library/system.runtime.compilerservices.internalsvisibletoattribute.aspx has one which would not be possible if the compiler were unable to optimise it to a single constant, because of the constraints on values assigned to attribute parameters. – ClickRick Jul 13 '14 at 08:44

6 Answers

Surprisingly, it depends.

If you do this in a method:

void Foo() {
    String one = "1";
    String two = "2";
    String result = one + two + "34";
    Console.Out.WriteLine(result);
}

then the compiler seems to emit the code using String.Concat as @Joachim answered (+1 to him btw).

If you define them as constants, e.g.:

const String one = "1";
const String two = "2";
const String result = one + two + "34";

or as literals, as in the original question:

String result = "1" + "2" + "3" + "4";

then the compiler will optimize away those + signs. It's equivalent to:

const String result = "1234";

Furthermore, the compiler will remove extraneous constant expressions, and only emit them if they are used or exposed. For instance, this program:

class Program {
    const String one = "1";
    const String two = "2";
    const String result = one + two + "34";

    public static void Main(string[] args) {
        Console.Out.WriteLine(result);
    }
}

only generates one string: the constant result (equal to "1234"). The constants one and two do not show up in the resulting IL.

Keep in mind that there may be further optimizations at runtime. I'm just going by what IL is produced.

Finally, as regards interning, constants and literals are interned, but the value which is interned is the resulting constant value in the IL, not the literal. This means that you might get even fewer string objects than you expect, since multiple identically-defined constants or literals will actually be the same object! This is illustrated by the following:

public class Program
{
    private const String one = "1";
    private const String two = "2";
    private const String RESULT = one + two + "34";

    static String MakeIt()
    {
        return "1" + "2" + "3" + "4";
    }   

    static void Main(string[] args)
    {
        string result = "1" + "2" + "34";

        // Prints "True"
        Console.Out.WriteLine(Object.ReferenceEquals(result, MakeIt()));

        // Prints "True" also
        Console.Out.WriteLine(Object.ReferenceEquals(result, RESULT));
        Console.ReadKey();
    }
}

In the case where Strings are concatenated in a loop (or otherwise dynamically), you end up with one extra string per concatenation. For instance, the following creates 12 string instances: 2 constants + 10 iterations, each resulting in a new String instance:

public class Program
{
    static void Main(string[] args)
    {
        string result = "";
        for (int i = 0; i < 10; i++)
            result += "a";
        Console.ReadKey();
    }
}

But (also surprisingly), multiple consecutive concatenations are combined by the compiler into a single multi-string concatenation. For example, this program also only produces 12 string instances! This is because "Even if you use several + operators in one statement, the string content is copied only once."

public class Program
{
    static void Main(string[] args)
    {
        string result = "";
        for (int i = 0; i < 10; i++)
            result += "a" + result;
        Console.ReadKey();
    }
}
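
As a rough sketch of what "combined into a single multi-string concatenation" means for the loop above: the statement `result += "a" + result` should behave like one three-argument `String.Concat` call per iteration. The class and method names below are made up for illustration, and the code only compares resulting values; it does not inspect the IL.

```csharp
using System;

public static class ConcatLoopSketch
{
    // Hypothetical helper: the loop from the answer, written with the + operator.
    public static string WithPlus(int iterations)
    {
        string result = "";
        for (int i = 0; i < iterations; i++)
            result += "a" + result;   // i.e. result = result + "a" + result
        return result;
    }

    // The same loop with the three-argument String.Concat call spelled out.
    public static string WithConcat(int iterations)
    {
        string result = "";
        for (int i = 0; i < iterations; i++)
            result = String.Concat(result, "a", result);
        return result;
    }

    public static void Main()
    {
        // Both forms build the same value.
        Console.WriteLine(WithPlus(10) == WithConcat(10));   // True
        // The length doubles plus one on every iteration: 2^10 - 1.
        Console.WriteLine(WithPlus(10).Length);              // 1023
    }
}
```

Either way, each iteration ends with exactly one new string, which is where the count of 10 new instances comes from.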
Chris Shain
  • what about String result = "1" + "2" + three + four; where three and four are declared like string three = "3"; String four = "4";? – The Light Feb 03 '12 at 17:06
  • Even that results in one string. I just ran it through LinqPad to double check myself. – Chris Shain Feb 03 '12 at 17:14
  • @William - String result = "1" + "2" + three + four would result in a compiler error. If you mean "1" + "2" + 3 + 4 the '3' and '4' would be taken as strings. You might want to go out and read some material on how .NET deals with strings. – Security Hound Feb 03 '12 at 17:22
  • @Ramhound as William said, three and four in his example are variables, which he said are defined to "3" and "4". – Servy Feb 03 '12 at 17:50
  • @ChrisShain Did you forget a `one + two + ` in the first example? Doesn't make a lot of sense... there is no +'ing or concat'ing there...? – Svish Feb 03 '12 at 17:59
  • @Servy - The comment seems to have been updated. When you change a comment it's not marked as being changed. – Security Hound Feb 03 '12 at 19:57
  • One case that would be nice to consider for completeness is concatenating in a loop. E.g. How many string objects does the following code allocate: `string s = ""; for (int i = 0; i < n; i++) s += "a";` – Joren Feb 03 '12 at 20:08
  • @ChrisShain the compiler's "optimizing away" the `+` signs is called "constant folding", and it applies to other types of constants as well (that is, to numerical constants). Section 7.19 of the C# specification says that constant expressions are those that can be fully evaluated at compile time, which means that `string s2 = s1 + "a" + "b"` is the concatenation of the variable `s1` with the constant expression `"a" + "b"`. – phoog Feb 06 '12 at 11:15
  • I use LINQPad ( http://www.linqpad.net/ ) or Reflector ( http://www.reflector.net/ ). The former shows you the IL of arbitrary snippets of code, the latter decompiles assemblies into IL and can re-generate equivalent C# from that IL. There is also a built-in tool called ILDASM ( http://msdn.microsoft.com/en-us/library/f7dy01k1(v=vs.80).aspx ). Understanding IL is a tricky thing; see http://codebetter.com/raymondlewallen/2005/02/07/getting-started-understanding-msil-assembly-language/ – Chris Shain Feb 06 '12 at 15:45

Chris Shain's answer is very good. As the person who wrote the string concatenation optimizer I would just add two additional interesting points.

The first is that the concatenation optimizer essentially ignores both parentheses and left associativity when it can do so safely. Suppose you have a method M() that returns a string. If you say:

string s = M() + "A" + "B";

then the compiler reasons that the addition operator is left associative, and therefore this is the same as:

string s = ((M() + "A") + "B");

But this:

string s = "C" + "D" + M();

is the same as

string s = (("C" + "D") + M());

so that is the concatenation of the constant string "CD" with M().

In fact, the concatenation optimizer realizes that string concatenation is associative, and generates String.Concat(M(), "AB") for the first example, even though that violates left associativity.

You can even do this:

string s = (M() + "E") + ("F" + M());

and we'll still generate String.Concat(M(), "EF", M()).

The second interesting point is that null and empty strings are optimized away. So if you do this:

string s = (M() + "") + (null + M());

you'll get String.Concat(M(), M())

An interesting question then is raised: what about this?

string s = M() + null;

We cannot optimize that down to

string s = M();

because M() might return null, but String.Concat(M(), null) would return an empty string if M() returns null. So what we do is instead reduce

string s = M() + null;

to

string s = M() ?? "";

Thereby demonstrating that string concatenation need not actually call String.Concat at all.
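
A small sketch of the observable behavior behind those two points, assuming a stand-in `M()` that always returns null and a helper `Twice` that mirrors the `(M() + "") + (null + M())` shape; both names are invented for this illustration. It checks values only and says nothing about which IL a particular compiler version emits.

```csharp
using System;

public static class NullConcatSketch
{
    // Hypothetical stand-in for M(): always returns null.
    public static string M() => null;

    // Concatenating with null: the + operator treats null as an empty string.
    public static string PlusNull() => M() + null;

    // The rewritten form described above.
    public static string Coalesce() => M() ?? "";

    // Same shape as (M() + "") + (null + M()), with the value passed in.
    public static string Twice(string m) => (m + "") + (null + m);

    public static void Main()
    {
        Console.WriteLine(PlusNull() == "");                          // True
        Console.WriteLine(Coalesce() == "");                          // True
        Console.WriteLine(M() == null);                               // True: plain M() would differ
        Console.WriteLine(Twice("x") == String.Concat("x", "x"));     // True
    }
}
```

The third line is the reason `M() + null` cannot simply become `M()`: the concatenation never yields null, while `M()` on its own can.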

For further reading on this subject, see

Why is String.Concat not optimized to StringBuilder.Append?

Eric Lippert
  • I think a couple of errors might have slipped in there. Surely, `("C" + "D") + M())` generates `String.Concat("CD", M())`, not `String.Concat(M(), "AB")`. And further down, `(M() + "E") + (null + M())` should generate `String.Concat(M(), "E", M())`, not `String.Concat(M(), M())`. – hammar Feb 03 '12 at 22:06
  • +1 for the starting paragraph. :) Answers like this are what always amazes me about Stack Overflow. – brichins Feb 07 '12 at 18:28

I found the answer at MSDN. One.

How to: Concatenate Multiple Strings (C# Programming Guide)

Concatenation is the process of appending one string to the end of another string. When you concatenate string literals or string constants by using the + operator, the compiler creates a single string. No run time concatenation occurs. However, string variables can be concatenated only at run time. In this case, you should understand the performance implications of the various approaches.

David

Just one. The C# compiler will fold string constants and hence it essentially compiles down to

String result = "1234";
JaredPar

One. Since the operands are all compile-time constants, the compiler can optimize them into a single string at compile time.

If they had been dynamic, they'd have been optimized to a single call to String.Concat(string, string, string, string).
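
To make the two cases concrete, here is a minimal sketch with made-up names (`FourOperandSketch`, `WithPlus`, `WithConcat`, `Folded`). It compares the `+` form against an explicit call to the four-argument `String.Concat` overload for run-time values, next to the all-literal expression from the question. It compares values only; checking what the compiler actually emits requires looking at the IL.

```csharp
using System;

public static class FourOperandSketch
{
    // Four run-time values joined with the + operator.
    public static string WithPlus(string a, string b, string c, string d)
        => a + b + c + d;

    // The same join written as an explicit four-argument String.Concat call.
    public static string WithConcat(string a, string b, string c, string d)
        => String.Concat(a, b, c, d);

    // All operands are literals, so this is a constant expression.
    public static string Folded() => "1" + "2" + "3" + "4";

    public static void Main()
    {
        Console.WriteLine(WithPlus("1", "2", "3", "4") == WithConcat("1", "2", "3", "4")); // True
        Console.WriteLine(Folded() == "1234");                                             // True
    }
}
```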

Joachim Isaksson

I doubt this is mandated by any standard or spec. One version can likely do something different from another.

Miserable Variable
  • It is documented behavior at least for Microsoft's C# compiler for VS 2008 and 2010 (see @David-Stratton's answer). That said, you are right: as far as I can tell from a quick perusal, the C# spec doesn't specify this and it probably should be considered an implementation detail. – Chris Shain Feb 03 '12 at 17:53