How many String objects will be created when using a plus sign?

Question

How many String objects will be created when using a plus sign in the below code?

String result = "1" + "2" + "3" + "4";

If it was as below, I would have said three String objects: "1", "2", "12".

String result = "1" + "2";

I also know that String objects are cached in the String Intern Pool/Table for performance improvement, but that's not the question.

Strings are only interned if you explicitly call String.Intern. — Joe White, Feb 03 '12 at 16:58
Not quite. All string literals are interned automatically. The results of string operations are not. — Stefan Paul Noack, Feb 03 '12 at 18:19
What's more, in the OP example, there is only one string constant, and it is interned. I'll update my answer to illustrate. — Chris Shain, Feb 03 '12 at 18:37
+1. For a real-life example of the need to code a string catenation in that style, the Examples section of http://msdn.microsoft.com/en-us/library/system.runtime.compilerservices.internalsvisibletoattribute.aspx has one which would not be possible if the compiler were unable to optimise it to a single constant, because of the constraints on values assigned to attribute parameters. — ClickRick, Jul 13 '14 at 08:44

Chris Shain · Accepted Answer · 2014-07-13T14:32:40.877

Surprisingly, it depends.

If you do this in a method:

void Foo() {
    String one = "1";
    String two = "2";
    String result = one + two + "34";
    Console.Out.WriteLine(result);
}

then the compiler seems to emit code that calls String.Concat, as @Joachim answered (+1 to him, btw).

If you define them as constants, e.g.:

const String one = "1";
const String two = "2";
const String result = one + two + "34";

or as literals, as in the original question:

String result = "1" + "2" + "3" + "4";

then the compiler will optimize away those + signs. It's equivalent to:

const String result = "1234";
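To make the contrast between the two cases visible at run time, here is a minimal sketch. The names FoldingDemo, ConcatVariables and ConcatLiterals are made up for illustration, and the comments assume the compiler folds literals as described above; treat it as a quick experiment, not a statement about every compiler version.

```csharp
using System;

public static class FoldingDemo
{
    // Hypothetical helper: the operands are parameters, so the compiler
    // cannot fold them and the concatenation happens at run time.
    public static string ConcatVariables(string one, string two)
    {
        return one + two + "34";
    }

    // Every operand is a literal, so this is a constant expression.
    public static string ConcatLiterals()
    {
        return "1" + "2" + "3" + "4";
    }

    public static void Main()
    {
        string folded = ConcatLiterals();
        string runtime = ConcatVariables("1", "2");

        // Compares the characters of the two results.
        Console.WriteLine(folded == runtime);

        // Compares object identity of the folded result with the literal "1234".
        Console.WriteLine(Object.ReferenceEquals(folded, "1234"));

        // Compares object identity of the run-time result with the literal "1234".
        Console.WriteLine(Object.ReferenceEquals(runtime, "1234"));
    }
}
```

The two results have the same contents, but only the run-time concatenation has to allocate a new string object.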

Furthermore, the compiler will remove extraneous constant expressions, and only emit them if they are used or exposed. For instance, this program:

const String one = "1";
const String two = "2";
const String result = one + two + "34";

public static void Main(string[] args) {
    Console.Out.WriteLine(result);
}

generates only one string: the constant result (equal to "1234"). The constants one and two do not show up in the resulting IL.

Keep in mind that there may be further optimizations at runtime. I'm just going by what IL is produced.

Finally, as regards interning, constants and literals are interned, but the value which is interned is the resulting constant value in the IL, not the literal. This means that you might get even fewer string objects than you expect, since multiple identically-defined constants or literals will actually be the same object! This is illustrated by the following:

public class Program
{
    private const String one = "1";
    private const String two = "2";
    private const String RESULT = one + two + "34";

    static String MakeIt()
    {
        return "1" + "2" + "3" + "4";
    }   

    static void Main(string[] args)
    {
        string result = "1" + "2" + "34";

        // Prints "True"
        Console.Out.WriteLine(Object.ReferenceEquals(result, MakeIt()));

        // Prints "True" also
        Console.Out.WriteLine(Object.ReferenceEquals(result, RESULT));
        Console.ReadKey();
    }
}

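When strings are concatenated in a loop (or otherwise dynamically), you end up with one extra string per concatenation. For instance, the following creates roughly 12 string instances: 2 constants + 10 iterations, each resulting in a new String instance (the exact count can be slightly lower, because String.Concat returns an existing operand unchanged when the other operand is empty, as in the first iteration here):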

public class Program
{
    static void Main(string[] args)
    {
        string result = "";
        for (int i = 0; i < 10; i++)
            result += "a";
        Console.ReadKey();
    }
}
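One way to check the loop case yourself is to compare references between iterations. This is only a sketch: LoopConcatDemo and CountReferenceChanges are hypothetical names, and the method counts how often result ends up pointing at a different object, which is not quite the same thing as counting allocations.

```csharp
using System;

public static class LoopConcatDemo
{
    // Runs the same loop as above and counts the iterations after which
    // 'result' refers to a different object than it did before.
    public static int CountReferenceChanges(int iterations)
    {
        int changes = 0;
        string result = "";
        for (int i = 0; i < iterations; i++)
        {
            string previous = result;
            result += "a";
            if (!Object.ReferenceEquals(previous, result))
            {
                changes++;
            }
        }
        return changes;
    }

    public static void Main()
    {
        Console.WriteLine(CountReferenceChanges(10));
    }
}
```

Strings are immutable, so every += has to leave result referring to a different object; the old value is never modified in place.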

But (also surprisingly), multiple consecutive concatenations are combined by the compiler into a single multi-string concatenation. For example, this program also only produces 12 string instances! This is because "Even if you use several + operators in one statement, the string content is copied only once."

public class Program
{
    static void Main(string[] args)
    {
        string result = "";
        for (int i = 0; i < 10; i++)
            result += "a" + result;
        Console.ReadKey();
    }
}

what about String result = "1" + "2" + three + four; where two and three are declared like string three = "3"; String four = "4";? — The Light, Feb 03 '12 at 17:06
Even that results in one string. I just ran it through LinqPad to double check myself. — Chris Shain, Feb 03 '12 at 17:14
@William - String result = "1" + "2" + three + four would result in a compiler error. If you mean "1" + "2" + 3 + 4 the '3' and '4' would be taken as strings. You might want to go out and read some material on how .NET deals with strings. — Security Hound, Feb 03 '12 at 17:22
@Ramhound as William said, three and four in his example are variables, which he said are defined to "3" and "4". — Servy, Feb 03 '12 at 17:50
@ChrisShain Did you forget a `one + two + ` in the first example? Doesn't make a lot of sense... there is no +'ing or concat'ing there...? — Svish, Feb 03 '12 at 17:59
@Servy - The comment seems to have been updated. When you change a comment its not marked as being changed. — Security Hound, Feb 03 '12 at 19:57
One case that would be nice to consider for completeness is concatenating in a loop. E.g. How many string objects does the following code allocate: `string s = ""; for (int i = 0; i < n; i++) s += "a";` — Joren, Feb 03 '12 at 20:08
@ChrisShain the compiler's "optimizing away" the `+` signs is called "constant folding", and it applies to other types of constants as well (that is, to numerical constants). Section 7.19 of the C# specification says that constant expressions are those that can be fully evaluated at compile time, which means that `string s2 = s1 + "a" + "b"` is the concatenation of the variable `s1` with the constant expression `"a" + "b"`. — phoog, Feb 06 '12 at 11:15
I use LINQPad ( http://www.linqpad.net/ ) or Reflector ( http://www.reflector.net/ ). The former shows you the IL of arbitrary snippets of code, the latter decompiles assemblies into IL and can re-generate equivalent C# from that IL. There is also a built-in tool called ILDASM ( http://msdn.microsoft.com/en-us/library/f7dy01k1(v=vs.80).aspx ) Understanding IL is a tricky thing- see http://codebetter.com/raymondlewallen/2005/02/07/getting-started-understanding-msil-assembly-language/ — Chris Shain, Feb 06 '12 at 15:45

score 86 · Answer 2 · edited May 23 '17 at 12:16

Chris Shain's answer is very good. As the person who wrote the string concatenation optimizer I would just add two additional interesting points.

The first is that the concatenation optimizer essentially ignores both parentheses and left associativity when it can do so safely. Suppose you have a method M() that returns a string. If you say:

string s = M() + "A" + "B";

then the compiler reasons that the addition operator is left associative, and therefore this is the same as:

string s = ((M() + "A") + "B");

But this:

string s = "C" + "D" + M();

is the same as

string s = (("C" + "D") + M());

so that is the concatenation of the constant string "CD" with M().

In fact, the concatenation optimizer realizes that string concatenation is associative, and generates String.Concat(M(), "AB") for the first example, even though that violates left associativity.

You can even do this:

string s = (M() + "E") + ("F" + M());

and we'll still generate String.Concat(M(), "EF", M()).
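A small sketch of why this rewrite is safe: the parenthesized expression and the explicit String.Concat call give the same result, with M() still called left to right. The class AssociativityDemo, the call counter and Reset are assumptions added purely for illustration; the M() here is a stand-in, not the method from any real code base.

```csharp
using System;

public static class AssociativityDemo
{
    private static int calls = 0;

    // Hypothetical stand-in for M(): returns a different string on each
    // call so that the order of evaluation shows up in the result.
    public static string M()
    {
        calls++;
        return "<" + calls + ">";
    }

    public static void Reset()
    {
        calls = 0;
    }

    // The expression as written in source.
    public static string Parenthesized()
    {
        return (M() + "E") + ("F" + M());
    }

    // The call the answer says the compiler generates for it.
    public static string Explicit()
    {
        return String.Concat(M(), "EF", M());
    }

    public static void Main()
    {
        Reset();
        Console.WriteLine(Parenthesized());
        Reset();
        Console.WriteLine(Explicit());
    }
}
```

Both lines print the same text, which is the point: regrouping the constant parts changes how many concatenation calls happen, not what the program observes.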

The second interesting point is that null and empty strings are optimized away. So if you do this:

string s = (M() + "") + (null + M());

you'll get String.Concat(M(), M())

An interesting question then is raised: what about this?

string s = M() + null;

We cannot optimize that down to

string s = M();

because M() might return null, but String.Concat(M(), null) would return an empty string if M() returns null. So what we do is instead reduce

string s = M() + null;

to

string s = M() ?? "";

Thereby demonstrating that string concatenation need not actually call String.Concat at all.
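The null case is easy to try out. In this sketch, NullConcatDemo and ReturnsNull are hypothetical names, with ReturnsNull standing in for an M() that happens to return null; the three methods correspond to the three forms discussed above.

```csharp
using System;

public static class NullConcatDemo
{
    // Hypothetical stand-in for an M() that returns null.
    public static string ReturnsNull()
    {
        return null;
    }

    // The expression as written in source.
    public static string PlusNull()
    {
        return ReturnsNull() + null;
    }

    // The form the answer says the compiler reduces it to.
    public static string Coalesce()
    {
        return ReturnsNull() ?? "";
    }

    // The explicit library call, for comparison.
    public static string ExplicitConcat()
    {
        string nothing = null;
        return String.Concat(ReturnsNull(), nothing);
    }

    public static void Main()
    {
        Console.WriteLine(PlusNull() == "");
        Console.WriteLine(Coalesce() == "");
        Console.WriteLine(ExplicitConcat() == "");
    }
}
```

All three forms produce an empty string and never null, which is why plain M() would not be a valid replacement for M() + null.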

For further reading on this subject, see

Why is String.Concat not optimized to StringBuilder.Append?

I think a couple of errors might have slipped in there. Surely, `("C" + "D") + M())` generates `String.Concat("CD", M())`, not `String.Concat(M(), "AB")`. And further down, `(M() + "E") + (null + M())` should generate `String.Concat(M(), "E", M())`, not `String.Concat(M(), M())`. — hammar, Feb 03 '12 at 22:06
+1 for the starting paragraph. :) Answers like this are what always amazes me about Stack Overflow. — brichins, Feb 07 '12 at 18:28

score 23 · Answer 3 · edited Jun 20 '20 at 09:12

I found the answer at MSDN. One.

How to: Concatenate Multiple Strings (C# Programming Guide)

Concatenation is the process of appending one string to the end of another string. When you concatenate string literals or string constants by using the + operator, the compiler creates a single string. No run time concatenation occurs. However, string variables can be concatenated only at run time. In this case, you should understand the performance implications of the various approaches.

score 22 · Answer 4 · answered Feb 03 '12 at 16:58

Just one. The C# compiler will fold string constants and hence it essentially compiles down to

String result = "1234";

JaredPar

I thought whenever you use "", it creates a String object. – The Light Feb 03 '12 at 17:08
@William in general yes. But constant folding will remove the unnecessary intermediate steps – JaredPar Feb 03 '12 at 17:10

Joachim Isaksson · Answer 5 · 2012-02-07T19:08:07.687

13

One, since they're static, the compiler will be able to optimize it to a single string at compile time.

If they had been dynamic, they'd have been optimized to a single call to String.Concat(string, string, string, string).

edited Feb 07 '12 at 19:08

answered Feb 03 '12 at 16:57

score 13 · Answer 6 · answered Feb 03 '12 at 17:38

I doubt this is mandated by any standard or spec. One version can likely do something different from another.

Miserable Variable

It is documented behavior at least for Microsoft's C# compiler for VS 2008 and 2010 (see @David-Stratton's answer). That said, you are right- as far as I can tell from a quick perusal, the C# spec doesn't specify this and it probably should be considered an implementation detail. – Chris Shain Feb 03 '12 at 17:53
