4

Note: This is a hypothetical discussion. I don't actually want to implement a struct String.

The .NET String class could be a value type (a struct), because it is immutable and has few members. But String isn't a value type, probably because it was designed before nullable types were introduced, or possibly to match the behavior of Java strings.

Would it be beneficial to change String to a value type or implement a value-type variant of String? It would remove a level of indirection and match the common non-nullable case.

Martijn Pieters
Craig Gidney

4 Answers

7

Short Answer

A string has to have a reference type member (e.g., a char[]) in order to be of variable size. Thus any struct String type would really just be a reference type disguised as a value type anyway.


Medium Answer

I discussed this in more depth here. But the basic gist of my idea was: yes, you could have a string "value type," presumably something like this:

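using System.Collections.Generic;
using System.Linq;

// Hypothetical value-type string; note that the name shadows System.String.
public struct String
{
    // The only field is a reference to a heap-allocated array.
    char[] m_characters;

    public String(IEnumerable<char> characters)
    {
        // ToArray() copies the characters, so later changes to the source are not seen.
        m_characters = characters.ToArray();
    }

    public char this[int index]
    {
        get { return m_characters[index]; }
    }

    // All those other string functions... IndexOf, Substring, etc.
}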

...but there's really no point. The above is essentially just a reference type (a wrapper around a char[]) nestled inside a shell that looks deceptively like a value type. Moreover, a type designed this way gets the drawbacks of a value type (e.g., potential for boxing) with none of the benefits: an instance of the above String type has the same memory allocation requirements as the reference type it wraps, so it buys you nothing from a GC standpoint either.
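To make both points concrete, here is a minimal sketch. The names `StringStruct` and `WrapperDemo` are hypothetical and chosen only for this illustration. It checks that copying the struct copies just the array reference, and that each conversion to `object` boxes the value into a fresh heap object.

```csharp
using System;

// Hypothetical wrapper, named StringStruct to avoid clashing with System.String.
public readonly struct StringStruct
{
    private readonly char[] _characters;

    public StringStruct(string text)
    {
        _characters = text.ToCharArray();
    }

    // Exposed only so the demo can compare buffer identity.
    public char[] Buffer => _characters;
}

public static class WrapperDemo
{
    // True when a copy of the struct still points at the same heap array.
    public static bool CopySharesBuffer()
    {
        var original = new StringStruct("zebra");
        StringStruct copy = original; // copies the struct, i.e. only the reference
        return object.ReferenceEquals(original.Buffer, copy.Buffer);
    }

    // True when two boxings of one value yield the same object (they do not).
    public static bool BoxesAreSameObject()
    {
        var value = new StringStruct("dog");
        object first = value;  // boxing allocates a new heap object
        object second = value; // and so does this
        return object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        Console.WriteLine(CopySharesBuffer());   // True
        Console.WriteLine(BoxesAreSameObject()); // False
    }
}
```

The character data lives on the heap either way; the struct only changes where the reference to it is stored.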

Dan Tao
  • Well, there’s at least one immediate advantage: one indirection less when accessing the string’s content. – Konrad Rudolph Nov 04 '10 at 16:33
  • @KonradRudolph: to get to the content you still need to cross the reference to the character array (arrays are always reference types). – Richard Nov 04 '10 at 16:52
  • @KonradRudolph: Strings as implemented in .NET only need one indirection. String is one of the types that cannot be implemented as-is in .NET code because the runtime special-cases it. The array of chars within a string is stored directly in the String object (which is variable length). – Richard Nov 04 '10 at 18:00
  • @Konrad: I think Richard's right; in fact if you look at the source code for `System.String` in Reflector you see no `char[]` or `char*` member (I've always found this a bit mysterious, actually). It would also seem this is the reason you can fix a `char*` pointer on a `string` directly... although I'm not exactly sure about that. – Dan Tao Nov 04 '10 at 18:21
  • @Richard That optimization actually explains a lot. Very informative. – Craig Gidney Nov 04 '10 at 21:11
3

No. Value types in .NET must have a size known at compile time. The size of a string is often determined only at runtime, so a string cannot be modeled as a value type.

Additionally, a value type in .NET can have only one size. Put more simply, different instances of the same value type cannot have different sizes. This means that you'd need to represent strings of different lengths as different types. For example, "dog" and "zebra" would be different, incompatible types.

Note

It seems like this question can be interpreted in two ways:

  1. Make string a value type with no alternate storage
  2. Make string a value type and allow for alternate storage in an array

My answer is for scenario #1. It doesn't seem like scenario #2 holds a lot of value because it just replaces a reference type with a value type that has an embedded reference type.
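As a rough illustration of scenario #2, the sketch below uses a hypothetical `CharArrayString` struct whose only field is a `char[]` reference. It assumes `System.Runtime.CompilerServices.Unsafe` is available (it ships with .NET Core 3.0 and later; on .NET Framework it comes from a separate NuGet package). The struct has one fixed size no matter how long the text is, because the characters live in a separate heap array.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical scenario #2 type: the struct holds only a reference to its characters.
public readonly struct CharArrayString
{
    private readonly char[] _buffer;

    public CharArrayString(string text)
    {
        _buffer = text.ToCharArray();
    }

    // default(CharArrayString) has a null buffer, reported here as length 0.
    public int Length => _buffer == null ? 0 : _buffer.Length;
}

public static class SizeDemo
{
    // Size of the struct type itself, independent of any instance's contents.
    public static int StructSize() => Unsafe.SizeOf<CharArrayString>();

    public static void Main()
    {
        var dog = new CharArrayString("dog");
        var zebra = new CharArrayString("zebra");

        Console.WriteLine(dog.Length);   // 3
        Console.WriteLine(zebra.Length); // 5

        // Both values occupy the same number of bytes: a single reference.
        Console.WriteLine(StructSize() == IntPtr.Size);
    }
}
```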

JaredPar
  • uh, the size of the `String` type is very well determined at compile time! You’re confusing it with the length of the *string*. The `String` struct (or indeed class) only needs to hold an integer for the length and a pointer/reference to the actual character buffer (heap-allocated). – Konrad Rudolph Nov 04 '10 at 16:24
  • @Konrad I'm interpreting the OP's question as eliminating the need for the separate storage of the string and instead having the storage being completely within the `string` type. To do otherwise is to gain very little as it's just switching a ref type to a value type holding a ref type. – JaredPar Nov 04 '10 at 16:27
  • I interpreted the question differently and I think it’s very meaningful, in light of the fact that strings in .NET *have* value semantics for the most part, since they’re immutable. I think asking “why aren’t they structs, then” is an *obvious* question. – Konrad Rudolph Nov 04 '10 at 16:30
  • I intended the second meaning, because I am aware you need a fixed size type. – Craig Gidney Nov 04 '10 at 21:09
2

This would indeed be a valid implementation.

Very naively, it could look like this:

struct String {
    readonly char[] _buffer;
    // Methods etc. …
}

There is one peculiarity when compared to the string class (apart from the fact that it cannot be null): a zero-sized string is not null-terminated! As far as I remember, .NET strings are null-terminated to facilitate interaction with legacy C APIs (WinAPI).

There is one point where a string class has an advantage: interning can be implemented more easily. String.Intern is a sort of builder function that, given the same string value, always returns the same string instance. That way, a comparison of two interned strings a and b can be sped up considerably: it’s now sufficient to test their addresses.

But of course, a similar kind of string interning could be implemented for string structs, by comparing whether their character buffer shares the same address.
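A small sketch of both ideas follows. `string.Intern` and `object.ReferenceEquals` are real framework calls; `PooledString` and its dictionary-backed pool are hypothetical and written only for this illustration (keying the pool by a regular `string` is a shortcut to keep the example short).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical struct string whose buffers are shared through a pool.
public readonly struct PooledString
{
    private static readonly Dictionary<string, char[]> Pool =
        new Dictionary<string, char[]>();

    private readonly char[] _buffer;

    private PooledString(char[] buffer)
    {
        _buffer = buffer;
    }

    // Returns a value whose buffer is shared by every equal interned value.
    public static PooledString Intern(string text)
    {
        lock (Pool)
        {
            if (!Pool.TryGetValue(text, out char[] buffer))
            {
                buffer = text.ToCharArray();
                Pool[text] = buffer;
            }
            return new PooledString(buffer);
        }
    }

    // Cheap equality for interned values: compare buffer identity only.
    public bool SameBufferAs(PooledString other) =>
        object.ReferenceEquals(_buffer, other._buffer);
}

public static class InternDemo
{
    // Built at run time so the two strings start out as distinct objects.
    private static string Build() => new string(new[] { 'z', 'e', 'b', 'r', 'a' });

    public static bool ClassInternSharesInstance() =>
        object.ReferenceEquals(string.Intern(Build()), string.Intern(Build()));

    public static void Main()
    {
        Console.WriteLine(object.ReferenceEquals(Build(), Build())); // False
        Console.WriteLine(ClassInternSharesInstance());              // True
        Console.WriteLine(
            PooledString.Intern("dog").SameBufferAs(PooledString.Intern("dog"))); // True
    }
}
```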

Konrad Rudolph
  • I agree with you, but I also agree with Jared that this buys you very little. Basically what you get is a non-nullable string; but what you lose is a reference type which does not need to be boxed (and I suspect strings are passed around as objects quite a bit) as well as a type that behaves as expected with `ReferenceEquals`. – Dan Tao Nov 04 '10 at 16:38
0

No. Structs of any given type always have the same length. Different instances of a string do not.

Flynn1179
  • A char array reference has a fixed size. – Craig Gidney Nov 04 '10 at 21:13
  • Then your struct isn't a string, it contains a reference to one, which eliminates any benefits of using a value type. – Flynn1179 Nov 05 '10 at 09:31
  • The main advantage is that structs are non-nullable, which is great because you usually want non-nullability. The expose-null-as-"" logic can be entirely contained in the String struct. The secondary advantage, which is apparently made unnecessary by special optimizations for String, is removing one level of indirection. You have a pointer instead of a pointer to a pointer. – Craig Gidney Nov 08 '10 at 14:39
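The expose-null-as-"" idea from the last comment could look roughly like the sketch below. `NonNullString` is a hypothetical name, and for brevity the struct wraps a regular `string` rather than a `char[]`. The field can still be null internally (for example in `default(NonNullString)`), but callers only ever see the empty string.

```csharp
using System;

// Hypothetical non-nullable wrapper; the name NonNullString is illustrative.
public readonly struct NonNullString
{
    private readonly string _value; // null for default(NonNullString) or a null argument

    public NonNullString(string value)
    {
        _value = value;
    }

    // Callers never observe null: a missing value reads as "".
    public string Value => _value ?? string.Empty;

    public int Length => Value.Length;

    public override string ToString() => Value;
}

public static class NonNullDemo
{
    public static void Main()
    {
        NonNullString unset = default(NonNullString);
        var fromNull = new NonNullString(null);

        Console.WriteLine(unset.Length);         // 0
        Console.WriteLine(fromNull.Value == ""); // True
    }
}
```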