Why is string a reference type, even though it's normally used like a primitive data type such as int, float, or double?
-
See: http://stackoverflow.com/questions/636932/in-c-why-is-string-a-reference-type-that-behaves-like-a-value-type – davidtbernal Sep 07 '10 at 05:25
-
@Pandiay chendur: What are you trying to say? – selvaraj Sep 07 '10 at 05:33
-
Please change the title of your question since it doesn't say anything about it – Carlos Muñoz Sep 07 '10 at 05:45
3 Answers
In addition to the reasons posted by Dan:
Value types are, by definition, those types which store their values in themselves, rather than referring to a value somewhere else. That's why value types are called "value types" and reference types are called "reference types". So your question is really "why does a string refer to its contents rather than simply containing its contents?"
It's because value types have the nice property that every instance of a given value type is of the same size in memory.
So what? Why is this a nice property? Well, suppose strings were value types that could be of any size and consider the following:
string[] mystrings = new string[3];
What are the initial contents of that array of three strings? There is no "null" for value types, so the only sensible thing to do is to create an array of three empty strings. How would that be laid out in memory? Think about that for a bit. How would you do it?
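You can see the difference in practice with a value type that really does store its contents in itself. A new array of a value type is filled with complete, zero-initialized instances; a new array of a reference type is filled with null references (a minimal sketch, not from the original answer):

```csharp
using System;

class Program
{
    static void Main()
    {
        // Value type: each slot holds a complete, zero-initialized int.
        int[] numbers = new int[3];
        Console.WriteLine(numbers[0]);         // 0 -- a real int value, in place

        // Reference type: each slot holds a null reference, not an empty string.
        string[] strings = new string[3];
        Console.WriteLine(strings[0] == null); // True
    }
}
```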
Now suppose you say
string[] mystrings = new string[3];
mystrings[1] = "hello";
Now we have "", "hello" and "" in the array. Where in memory does the "hello" go? How large is the slot that was allocated for mystrings[1] anyway? The memory for the array and its elements has to go somewhere.
This leaves the CLR with the following choices:
- resize the array every time you change one of its elements, copying the entire thing, which could be megabytes in size
- disallow creating arrays of value types of unknown size
- disallow creating value types of unknown size
The CLR team chose the third option. Making strings into reference types means that you can create arrays of them efficiently.
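Because every slot in a `string[]` holds only a fixed-size reference, assigning a new string to a slot never resizes or moves the array; only the reference in that slot changes. A quick illustration (not part of the original answer):

```csharp
using System;

class Program
{
    static void Main()
    {
        string[] mystrings = new string[3];  // three fixed-size reference slots
        mystrings[1] = "hello";              // slot 1 now refers to "hello" on the heap
        mystrings[1] = "a much, much longer string than before";
        // The array itself never changed size; only the reference in slot 1 did.
        Console.WriteLine(mystrings[1].Length);
    }
}
```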

-
The obvious way to go is to allocate the maximum length allowable for a string at declaration time and prepare for an influx of OutOfMemoryExceptions. – Anthony Pegram Sep 07 '10 at 15:38
-
@Eric Lippert: I agree with you, but what if `string` were a value-type whose only field was a `char[]` that referenced an interned character array? Since an array is a reference-type, `string`'s size would be a constant `sizeof(IntPtr)` + any padding, wouldn't it? Then there would be no problem having an array of strings. Or do I have it horribly wrong? – Ani Sep 07 '10 at 17:27
-
@Ani: Correct. *That is what a reference type is*. There's no effective difference between a value type whose sole field is a reference and a reference! Obviously they have exactly the same bits, since a value type's bits are just the bits of its members, and if it has a reference type as its member, then it just has the bits of the reference. If the bits are exactly the same then why have the struct at all? Just have the reference and be done with it. – Eric Lippert Sep 07 '10 at 17:30
-
-
@Eric, @Ani: Eric's right, there'd be *no effective difference* between a value type with one ref type field and a straight-up ref type. But what I'm trying to address in *my* answer is: *if there's really no difference, why pick one or the other?* Of course it's hard to argue with the simple fact that picking a reference type just seems more logical. But there are also *practical drawbacks* to picking a value type in this scenario: for example, it would behave just like a ref type, yes, *but it would be boxed* when cast to `object`. A ref type wouldn't have this weakness. Am I making sense? – Dan Tao Sep 07 '10 at 23:01
Yikes, this answer got accepted and then I changed it. I should probably include the original answer at the bottom since that's what was accepted by the OP.
New Answer
Update: Here's the thing. `string` absolutely needs to behave like a reference type. The reasons for this have been touched on by all answers so far: the `string` type does not have a constant size, it makes no sense to copy the entire contents of a string from one method to another, `string[]` arrays would otherwise have to resize themselves -- just to name a few.
But you could still define `string` as a `struct` that internally points to a `char[]` array, or even a `char*` pointer and an `int` for its length, make it immutable, and voilà! You'd have a type that behaves like a reference type but is technically a value type.
This would seem quite silly, honestly. As Eric Lippert has pointed out in a few of the comments to other answers, defining a value type like this is basically the same as defining a reference type. In nearly every sense, it would be indistinguishable from a reference type defined the same way.
So the answer to the question "Why is `string` a reference type?" is basically: "To make it a value type would just be silly." But if that's the only reason, then really, the logical conclusion is that `string` could actually have been defined as a `struct` as described above and there would be no particularly good argument against that choice.
However, there are reasons that it's better to make `string` a `class` than a `struct` that go beyond the purely intellectual. Here are a couple I was able to think of:
To prevent boxing
If `string` were a value type, then every time you passed it to some method expecting an `object` it would have to be boxed, which would create a new object, which would bloat the heap and cause pointless GC pressure. Since strings are basically everywhere, having them cause boxing all the time would be a big problem.
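The boxing cost being avoided is easy to demonstrate with `int`, which really is a value type: each conversion to `object` allocates a fresh heap object, whereas a `string` argument passes the existing reference as-is (a small sketch, not from the original answer):

```csharp
using System;

class Program
{
    static void Main()
    {
        string s = "hello";
        object o = s;                          // no allocation: same reference
        Console.WriteLine(ReferenceEquals(o, s)); // True

        int n = 42;
        object a = n;                          // boxing: new heap object
        object b = n;                          // boxing again: another new object
        Console.WriteLine(ReferenceEquals(a, b)); // False
    }
}
```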
For intuitive equality comparison
Yes, `string` could override `Equals` regardless of whether it's a reference type or a value type. But if it were a value type, then `ReferenceEquals("a", "a")` would return false! This is because both arguments would get boxed separately, and separately boxed arguments never have equal references (as far as I know).
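You can see the same effect today with any actual value type: both arguments to `ReferenceEquals` get boxed into distinct objects, so the call is always false, while interned string literals share a single object (an illustrative sketch, not from the original answer):

```csharp
using System;

class Program
{
    static void Main()
    {
        int n = 5;
        // Each use of n as an object boxes it into a fresh heap object,
        // so the two references can never be equal.
        Console.WriteLine(ReferenceEquals(n, n));     // False

        // With the real (reference-type) string, the interned literal "a"
        // is a single shared object:
        Console.WriteLine(ReferenceEquals("a", "a")); // True
    }
}
```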
So, even though it's true that you could define a value type to act just like a reference type by having it consist of a single reference-type field, it would still not be exactly the same. So I maintain this as the more complete reason why `string` is a reference type: you could make it a value type, but this would only burden it with unnecessary weaknesses.
Original Answer
It's a reference type because only references to it are passed around.
If it were a value type then every time you passed a string from one method to another the entire string would be copied*.
Since it is a reference type, instead of string values like "Hello world!" being passed around -- "Hello world!" is 12 characters, by the way, which means it requires (at least) 24 bytes of storage -- only references to those strings are passed around. Passing around a reference is much cheaper than passing every single character in a string.
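That a method call hands over only the reference, never a copy of the characters, can be checked directly: the parameter inside the method refers to the very same object as the argument outside it (a minimal sketch, not part of the original answer):

```csharp
using System;

class Program
{
    // Receives a copy of the reference (pointer-sized), never a copy
    // of the characters.
    static bool IsSameObject(string inner, string outer) =>
        ReferenceEquals(inner, outer);

    static void Main()
    {
        string big = new string('x', 1_000_000); // ~2 MB of character data
        // Passing big copies one reference, not a million chars:
        Console.WriteLine(IsSameObject(big, big)); // True
    }
}
```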
Also, it's really not a normal primitive data type. Who told you that?
*Actually, this isn't strictly true. If the string internally held a `char[]` array, then as long as the array type is a reference type, the contents of the string would not actually be passed by value -- only the reference to the array would be. I still think this is basically the right answer, though.

-
Actually reference types are passed by value as well, but it is the reference itself and not the object that is copied. – Brian Rasmussen Sep 07 '10 at 05:29
-
@Brian: OK OK, geez... I knew some stickler was going to come around and correct me on this ;) I will update the answer to be more technically accurate... – Dan Tao Sep 07 '10 at 05:31
-
-
Sorry to say, you are beating around the bush and not explaining why strings are a reference type. Your guess is not correct, I feel. And check out .NET Book Zero by Petzold to learn more about strings. Thank you. – Harsha Sep 07 '10 at 06:10
-
You missed the best advantage of a value type String: no null unless you ask for it. Also, isn't a String class exactly the same as what a boxed String struct would be? Aren't we already paying the boxing overhead? – Craig Gidney Nov 04 '10 at 22:27
-
@Strilanc: In response to your second point: no, definitely not! Every time you pass a `string` to a method that expects an `object`, the reference to the string is passed without requiring any new memory allocation. On the other hand, every time you pass an `int` to such a method, for example, a new `object` is created to box that `int` (since you weren't passing a *reference* to anything; you were passing a *value*). The same is true for returning references vs. returning values. – Dan Tao Nov 04 '10 at 23:00
-
One effect of having `String` be a value type which wraps something like a private `Char[]` which is never exposed to anything that might mutate it would be that `default(string)` could act consistently like an empty string, rather than as a null reference. I would think that could be a worthwhile advantage. Even better would be to define two string types--a struct and a class--which the system's boxing and unboxing methods would recognize, so a `String` struct would be boxed as a `StringObject`. Then `default(StringObject)` could be `null`, while `default(String)` would be an empty string. – supercat Oct 12 '12 at 21:29
String is a reference type, not a value type. In many cases you know the length and contents of a string, and in those cases it is easy to allocate memory for it. But consider something like this:
string s = Console.ReadLine();
It is not possible to know the allocation details for s at compile time. The user enters a value, and the entire entered line is stored in s. So strings are stored on the heap, where memory can be allocated to fit the contents of s, and the reference to that string is stored on the stack.
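The same point holds for any string whose contents are only known at run time; the heap allocation is sized to fit when the string is created, and the local variable holds only the reference (a small sketch; a non-interactive stand-in is used here instead of Console.ReadLine so it runs without input):

```csharp
using System;

class Program
{
    static void Main()
    {
        // Stand-in for Console.ReadLine(): contents unknown until run time.
        string s = DateTime.Now.ToString();

        // The heap allocation for s was sized at run time to fit its
        // contents; the local variable s holds only the reference.
        Console.WriteLine(s.Length);
    }
}
```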
To learn more, please read .NET Book Zero by Petzold.
Read the garbage collection chapter of CLR via C# for details on how allocation works.
Edit: Console.WriteLine(); to Console.ReadLine();

-
I'm not sure I understand this explanation. As I pointed out in my answer, whether `string` were a reference type or not, as long as it stored its *contents* in the form of a reference type *internally* (e.g., a `char[]` array), it could behave basically the same way it does currently. This would include dynamic reallocation on the heap in situations like the one you describe. I think the reasons I provided in my answer offer a less obvious but nonetheless more to-the-point explanation of why `string` is a reference type. – Dan Tao Sep 07 '10 at 06:18
-
This simply doesn't make sense. If the contents were stored completely within the structure then barring some sort of special handling, strings of different sizes would have to be different types; a 1-char type with room for 1char, a 2-char type with room for 2chars, etc. Alternatively, if the structure contained a reference to an array in the heap, then that would work at runtime whatever the size. This latter approach would be perfectly possible to do. – Jon Hanna Sep 07 '10 at 09:23
-
@Dan: What is the difference between a string that is a value type that contains a reference to a char[] and its length, and a string which is a reference type? Tell you what, I'll wave my magic wand and *poof*, OK, a string is now a value type which is a *handle* to a *heap-allocated data structure* that contains a *length* and an *array of characters*. But that's what a string-as-reference type *is*; I've just described how strings actually *are* implemented. A value that is just a handle to heap memory is what we call a "reference type" in .NET. – Eric Lippert Sep 07 '10 at 15:59
-
@Eric: I guess I started thinking of the question as: "If you were writing the `string` type from scratch, why would you choose to make it a `class` or a `struct`?" Putting the contents in a `char[]` would seem obvious. And naturally since this is heap-allocated it would seem logical to make it a `class`. But I was just saying, you really *could* make it a `struct` *or* a `class` and it would barely matter. So why pick one or the other, aside from purely philosophical reasons? I felt that my answer (avoid boxing, enable reference equality) gave at least a couple of practical reasons. – Dan Tao Sep 07 '10 at 16:30
-
@Dan: You would still *effectively* have *reference equality* by default in your hypothetical case where a string was a struct that contained a char[]. The default comparison for structs is to do a bitwise comparison of their members, and since the member is a reference to a char[], *that* member would be compared using reference equality. Thus, two strings values that were bitwise copies of each other would refer to the same memory and effectively be reference equal. The irony is that in that case you would not have *value equality*. :-) – Eric Lippert Sep 07 '10 at 17:07
-
@Eric: Yes, **but**! In particular I was talking about `ReferenceEquals`, which takes two `object` parameters. I think it would be very counterintuitive for two identical strings -- or heck, even a *single* string "value" -- to cause `ReferenceEquals` to return `false`, which is what would happen if `string` were a value type and got boxed. Don't get me wrong; I realize that the possibility I am referring to is pretty pointless -- basically a reference type "wrapped up" in a value type -- but, realizing such a thing is possible, I felt compelled to come up with reasons why not to do it. – Dan Tao Sep 07 '10 at 17:24
-
@Eric: If String were a struct whose sole content was an Array, it could use an Array as its primary storage element and yet define and overload methods and properties, and only require one Object to be created for each string. Otherwise, it would have to either be a class which contains an array (requiring two heap allocations per string), inherits from Array (not generally allowed), or has special support as a variable-sized object which isn't an array. The implementers of .net decided to include special-case handling for String. Without that, a Struct would seem a good choice. – supercat Jan 26 '11 at 18:58