
A couple of weeks ago, I was asked a C# question in a job interview. The question was exactly this:

string a = "Hello, ";

for(int i = 0; i < 99999999; i++)
{
    a += "world!";
}

I was asked, word for word, "why this is a bad method for concatenated string?" My response was something along the lines of "readability; Append should be used instead".

But apparently that was not the answer the interviewer was looking for. According to him, every time we concatenate a string, because of the structure of the CLR, a new reference is created in memory. So, in the end of the following code, we would have 99999999 of string variable "a" in memory.

I thought objects are created just once on the stack as soon as a value is assigned to them (I'm not talking about the heap). As I understood it, memory is allocated once on the stack for each primitive data type, their values are modified as needed and disposed when the execution of a scope is finished. Is that wrong? Or are new references to variable "a" actually created on the stack every single time it is concatenated?

Can someone please explain how this works for the stack? Many thanks.

  • related, if not a duplicate: http://stackoverflow.com/q/2365272/578411 – rene May 29 '16 at 18:53
  • I think my question is not really a duplicate though. –  May 29 '16 at 18:57
  • Sure, but I can't imagine I can find a duplicate that is a better fit: How about this one: http://stackoverflow.com/q/10341188/578411 – rene May 29 '16 at 19:02
  • Hmm, this actually seems like an answer to my question indirectly, not sure though. The part in the reply actually convinced me, "Note that the compiler can't do anything if you concatenate in a loop." and "... so this does generate a lot of garbage, and it's why you should use a StringBuilder for such cases." Thanks mate. –  May 29 '16 at 19:07
  • This isn't about "the structure of the CLR". It's about the design of the C# language. The `+=` operator doesn't mutate the value in a variable, it mutates a variable to have a new value, based on that object's implementation of the `+` operator. It's a decision made before the CLR even gets involved. – Servy May 29 '16 at 19:15
  • @Servy I don't know mate, that's what the dude told me :D but thanks for explaining! –  May 29 '16 at 19:16
  • You have a C++ background as it seems. You need to make yourself familiar with how C# manages memory. – usr May 29 '16 at 19:21
  • @usr so C# manages memory, not the CLR? –  May 29 '16 at 19:23
  • The proper answer is: "[String is immutable](https://msdn.microsoft.com/en-us/library/362314fe.aspx)"! – Maciej Los May 29 '16 at 19:27

3 Answers

0

.NET distinguishes between ref and value types. string is a ref type. It is allocated on the heap without exception. Its lifetime is controlled by the GC.

So, in the end of the following code, we would have 99999999 of string variable "a" in memory.

99999999 have been allocated. Of course, some of them might be GC'ed already.

their values are modified as needed and disposed when the execution of a scope is finished

String is not a primitive or a value type. Those are allocated "inline" inside of something else such as the stack, an array or inside heap objects. They also can be boxed and become true heap objects. None of that applies here.

The problem with this code is not the allocation but the quadratic runtime complexity. I don't think this loop would ever finish in practice.
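To put a rough number on "quadratic", here is a minimal sketch that counts how many characters end up being written into intermediate strings. It assumes each `+=` allocates a string of the full new length (7 characters for "Hello, " plus 6 per "world!"); the `ConcatCost` class and `CharsCopied` method are made-up names for illustration only.

```csharp
using System;

public static class ConcatCost
{
    // Total characters written across all intermediate strings when
    // "world!" is appended n times to "Hello, " using +=.
    public static long CharsCopied(long n)
    {
        const long prefix = 7; // "Hello, ".Length
        const long suffix = 6; // "world!".Length

        // Iteration i (1-based) allocates a string of length prefix + suffix * i,
        // so the sum over i = 1..n is prefix * n + suffix * n * (n + 1) / 2.
        return prefix * n + suffix * n * (n + 1) / 2;
    }

    public static void Main()
    {
        Console.WriteLine(CharsCopied(10));       // 400
        Console.WriteLine(CharsCopied(1000));     // 3010000
        Console.WriteLine(CharsCopied(99999999)); // on the order of 3 * 10^16
    }
}
```

Multiplying the iteration count by 100 multiplies the work by roughly 10,000, which is why the loop from the question is impractical even though each individual append looks cheap.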

usr
0

First remember these two facts:

  • string is an immutable type (existing instances are never modified)
  • string is a reference type (the "value" of a string expression is a reference to the location where the instance is)

Therefore, a statement like:

a += "world!";

works the same way as a = a + "world!";. It first follows the reference to the "old" a and concatenates that old string with the string "world!". This involves copying the contents of both strings into a new memory location. That is the "+" part. It then changes the reference held by a from pointing to the old location to pointing to the new location (the newly concatenated string). That is the "=" assignment part of the statement.

Now it follows that the old string instance is left with no references to it. So at some point, the garbage collector will remove it (and possibly move memory around to avoid "holes").
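The two steps can be observed directly. The following is a small sketch, not part of the original question: the extra variable `before` and the `ImmutabilityDemo` class are introduced purely for illustration, to keep a second reference to the old instance so it can be inspected after the append.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Returns true if 'a' still refers to the same instance after +=.
    public static bool SameInstanceAfterAppend()
    {
        string a = "Hello, ";
        string before = a;   // second reference to the same instance
        a += "world!";       // allocates a new string; 'a' now refers to it
        return object.ReferenceEquals(before, a);
    }

    public static void Main()
    {
        string a = "Hello, ";
        string before = a;

        a += "world!";

        Console.WriteLine(before);                            // Hello, 
        Console.WriteLine(a);                                 // Hello, world!
        Console.WriteLine(object.ReferenceEquals(before, a)); // False
    }
}
```

In the loop from the question nothing plays the role of `before`, so each old instance becomes unreachable as soon as the assignment completes.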

So I guess your job interviewer was absolutely right there. The loop of your question will create a bunch of (mostly very long!) strings in memory (in the heap since you want to be technical).

A simpler approach could be:

string a = "Hello, "
    + string.Concat(Enumerable.Repeat("world!", 999...));

Here we use string.Concat. That method knows it needs to concatenate a bunch of strings into one long string, and it can use some sort of expandable buffer (such as a StringBuilder or even a pointer type char*) internally to make sure it does not create a myriad of "dead" object instances in memory.

(Do not use ToArray() or similar, as in string.Concat(Enumerable.Repeat("world!", 999...).ToArray()), of course!)

Jeppe Stig Nielsen
-1

Reference types (i.e. classes & strings) are always created in the heap. Value types (such as structs) are created in the stack and are lost when a function ends execution.

However, stating that after the loop you will have N objects in memory is not entirely true. In each evaluation of the

a += "world!";

statement you do create a new string. What happens to the previously created string is more complicated. The garbage collector now owns it, since there is no other reference to it in your code, and will release it at some point; you cannot know exactly when that will happen.

Finally, the ultimate problem with this code is that you believe you are modifying an object, but strings are immutable, meaning you cannot really change their value once created. You can only create new ones and this is what the += operator is doing. This would be far more efficient with a StringBuilder which was made to be mutable.
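As a rough sketch of the StringBuilder version (the `BuilderDemo` class, the `Build` method and the small iteration count are chosen here for illustration, they are not from the question):

```csharp
using System;
using System.Text;

public static class BuilderDemo
{
    // Appends "world!" count times to "Hello, " using a mutable buffer.
    public static string Build(int count)
    {
        var sb = new StringBuilder("Hello, ");
        for (int i = 0; i < count; i++)
        {
            sb.Append("world!"); // writes into the builder's internal buffer
        }
        return sb.ToString();    // produces the final string once, at the end
    }

    public static void Main()
    {
        string result = Build(3);
        Console.WriteLine(result);        // Hello, world!world!world!
        Console.WriteLine(result.Length); // 25
    }
}
```

The loop body no longer creates a new string per iteration; the builder grows its buffer as needed and only `ToString()` allocates the final result.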

EDIT

As requested, here's stack / heap related clarification. Value types are not always in the stack. They are in the stack when you declare them inside a function body:

void method()
{
    int a = 1; // goes in the stack
}

But they go on the heap when they are part of other objects, like when an integer is a property of a class (since the whole class instance is on the heap).
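A small sketch of the two cases follows. The `Counter` class and `StorageDemo` names are hypothetical, and the storage locations in the comments describe typical behaviour rather than something the language guarantees; what the program can actually show is the difference between copying a value and copying a reference.

```csharp
using System;

public sealed class Counter
{
    public int Value; // stored inline in the Counter instance, which lives on the heap
}

public static class StorageDemo
{
    public static void Main()
    {
        int local = 1;                         // value-type local: typically on the stack
        var first = new Counter { Value = 2 }; // 'first' holds a reference; the object is on the heap
        Counter second = first;                // copies the reference, not the object

        int copy = local;                      // copies the value itself
        copy++;                                // does not affect 'local'
        second.Value++;                        // visible through 'first' as well

        Console.WriteLine(local);       // 1
        Console.WriteLine(first.Value); // 3
    }
}
```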

kagelos
  • Value types do not necessarily go on the stack. They go wherever the variable that the value is being stored in stores its value, which may or may not be the stack. – Servy May 29 '16 at 19:13
  • Well yeah ... obviously having an `int` property in a class, means that the property is also in the heap, where the class instance is. You lost the point totally. – kagelos May 29 '16 at 19:15
  • So you're agreeing that your statement is incorrect. Why not then change it? – Servy May 29 '16 at 19:17
  • So wait, I have another question. If I create an object, let's say a class, with a couple of properties, are they going to be stored separately in stack and heap? I am completely confused now. –  May 29 '16 at 19:17
  • @MeinHat The object, including its fields, will be stored on the heap. You could then store a reference to that object in a local variable, which might end up storing that reference on the stack. – Servy May 29 '16 at 19:19