Why was String designed as a reference type instead of a value type?

From a modeling perspective, I would have modeled it as a value type, since it represents something without identity: it has no distinguishing attributes. (E.g., I can't tell one string "a" apart from another string "a".)

I know that storing long strings on the stack would cause serious performance problems. It is probably impossible anyway, since strings can get very long and the stack is limited in size.

Performance aside, why would you design System.String as a reference type? (Assume every possible string is at most 16 bytes long.)

Liviu Trifoi
  • I imagine that you have already seen the answers to the potential duplicate http://stackoverflow.com/questions/636932/in-c-why-is-string-a-reference-type-that-behaves-like-a-value-type . However, I am not sure what kind of answer you are looking for: if you are talking about .NET framework engineering, then the biggest reason (given in the other thread) is that strings can have any size and could easily fill the 1MB stack, leading not only to performance problems but to an outright broken framework. Do you need a better reason than "not having a broken framework"? – Jean Hominal Jun 25 '10 at 07:42
  • I'm looking for an answer from the modeling perspective. (E.g., an answer like: "I would model it as a reference type even if it weren't for performance and the stack, because ...") – Liviu Trifoi Jun 25 '10 at 08:22
  • But the only reason value types even exist is performance. If somebody modeled a pure OOP language (like Smalltalk), they would *never* make types differ in behavior so dramatically. A string and an int would both be objects, and be as slow as molasses, like Smalltalk. – Hans Passant Jun 25 '10 at 09:51
  • @Hans you should post that as an answer – Davy8 Jun 26 '10 at 16:19

6 Answers

As you point out, a value type that may become very large could be prohibitive, due both to limited stack space and to the copy-on-use semantics of value types: assigning one or passing it to a method copies the entire value.

Also, the way strings are implemented in .NET adds a couple of elements to the equation. Strings are not only reference types; they are also immutable (outside the System namespace, anyway), and the runtime uses interning to do neat tricks with them.

All this adds up to a couple of benefits: duplicate literal strings are stored only once, and comparing such strings becomes extremely efficient, because you can compare references instead of sequences of Unicode characters. Neither would be possible with value types.

Brian Rasmussen

Structs need to be of fixed size. Think of a string[], for example: every element must occupy the same number of bytes. The only way string could be a value type would be to store just a pointer, which is essentially what we already get by using a reference type.
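To make the fixed-size argument concrete, here is a sketch that uses Python's `ctypes` to model C-style memory layout (an analogy, not the CLR's actual layout). `InlineString16` is a hypothetical value-type string with the question's 16-byte cap: one length byte plus 15 bytes of inline character data. An array of such values needs 16 bytes per element however short the text is, while an array of references needs only one pointer per element, with the text stored elsewhere.

```python
import ctypes


class InlineString16(ctypes.Structure):
    """Hypothetical value-type string: 1 length byte + 15 bytes of inline data."""
    _fields_ = [
        ("length", ctypes.c_ubyte),
        ("data", ctypes.c_char * 15),
    ]


# Four inline strings: the characters live inside the array itself.
InlineArray = InlineString16 * 4

# Four references: each element is only a pointer to text stored elsewhere.
PointerArray = ctypes.c_char_p * 4

inline_size = ctypes.sizeof(InlineArray)
pointer_size = ctypes.sizeof(PointerArray)

print(inline_size)   # 64, even if every string is just "a"
print(pointer_size)  # 4 * pointer width, e.g. 32 on a 64-bit build

# A one-character string still occupies a full 16-byte slot.
items = InlineArray()
items[0] = InlineString16(1, b"a")
```

The inline layout also hard-codes a maximum length; lifting that limit means storing a pointer instead, which brings you back to reference semantics.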

Of course, it is also hugely beneficial that we don't copy the string every time we assign it.

Marc Gravell

My understanding is that strings are immutable classes rather than structures purely for performance.

Strings tend to be created and then passed to many objects, either for rendering to a user or for handing off to other systems. After creation, strings tend not to change, so copying the entire character array into each object as a separate value has little practical value and creates a lot of temporary objects.
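A small Python sketch of the pattern described here. Python strings, like .NET strings, are immutable objects passed by reference; the `Label` class and its `text` field are hypothetical. Many holders keep a reference to the same string, so one large piece of text is stored once rather than once per object, and because the string is immutable, no holder can change it under the others.

```python
from dataclasses import dataclass


@dataclass
class Label:
    """Hypothetical UI element that keeps the text it renders."""
    text: str


# One large string, created once.
caption = "x" * 1_000_000

# Hand the same string to many objects; each stores a reference, not a copy.
labels = [Label(caption) for _ in range(100)]
assert all(label.text is caption for label in labels)

# "Changing" the text builds a new string; the shared original is untouched.
labels[0].text = labels[0].text.upper()
assert caption == "x" * 1_000_000
assert labels[1].text is caption
```

With a value-type design, each of the 100 labels would instead hold its own copy of the characters.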

Paul Turner

Simple: I don't want to make a copy of a string every time I pass one into a method. Copying takes more memory, and it takes more time.

Warren Rumak

One point is that the String type, as in many languages, is encoded as Unicode, so it's illogical to treat strings as primitive types (like int): there is no direct correspondence between a string's binary encoding and its human-readable form.

The Unicode layer naturally calls for strings to be abstracted away from their binary representation, whereas numbers can be converted between base 2 (binary) and base 10 (decimal) with relative ease.
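A quick Python illustration of the encoding point, with arbitrarily chosen sample characters: how many bytes a piece of text needs depends on which characters it contains and on the encoding, whereas a 32-bit integer always has the same width. .NET strings use UTF-16 internally, so a character outside the Basic Multilingual Plane such as U+1F600 takes two UTF-16 code units, and a C# string holding only that character has a `Length` of 2.

```python
# Bytes needed for one character in UTF-8, and UTF-16 code units (what .NET stores).
samples = {
    "a": "a",               # U+0061, Basic Latin
    "e-acute": "\u00e9",    # U+00E9
    "euro": "\u20ac",       # U+20AC
    "emoji": "\U0001F600",  # U+1F600, outside the Basic Multilingual Plane
}

for name, ch in samples.items():
    utf8_bytes = len(ch.encode("utf-8"))
    utf16_units = len(ch.encode("utf-16-le")) // 2  # each UTF-16 code unit is 2 bytes
    print(f"{name:8} UTF-8 bytes: {utf8_bytes}  UTF-16 code units: {utf16_units}")

# By contrast, a 32-bit integer always occupies exactly 4 bytes.
assert len((7).to_bytes(4, "little")) == len((2_000_000_000).to_bytes(4, "little")) == 4
```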

Primitive variables can reside on the stack because each one is small and of fixed size, so there is plenty of room for a lot of numbers. That isn't the case for the more data-heavy String type.

The operations carried out on strings are not really arithmetic but mostly based on Boolean logic, such as comparing and matching (except when counting, where a string is treated like a vector or array), so it makes sense to optimise the data structure for its primary uses, via the System.String class.

Alex

In terms of equality, you can still treat it as a value type: the == operator on strings compares their contents, not their references.

So, if anything, having it as a reference type is just an advantage, no?
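A Python sketch of that idea, assuming Python's `==` versus `is` split is a fair stand-in for C#'s overloaded `string ==` versus `Object.ReferenceEquals`. `Text` is a hypothetical immutable reference type: its instances are distinct objects with identity, but they compare and hash by content, so to callers two `Text("a")` values are interchangeable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    """Hypothetical immutable reference type with value equality.

    frozen=True forbids reassigning fields; the generated __eq__ and __hash__
    compare and hash by content, similar to C#'s string == and GetHashCode.
    """
    chars: str


x = Text("a")
y = Text("a")

assert x == y            # value equality: same content
assert x is not y        # yet they are two distinct objects (reference identity)
assert len({x, y}) == 1  # equal values collapse in a set, as strings do in a HashSet<string>
```

This is the modeling point from the question seen from the other side: "no distinguishing attributes" is a property of equality, and a reference type can provide it.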

SRKX