21

After reading "What's the rationale for null terminated strings?" and some similar questions, I found that in C#/.NET strings are, internally, both length-prefixed and null-terminated, like the BSTR data type.

What is the reason strings are both length-prefixed and null-terminated instead of, e.g., only length-prefixed?

Guru Stron
prostynick
  • 2
    Probably only @Eric Lippert is going to be able to answer this one. There are good reasons for doing one or the other (and trade-offs as well). I'm as surprised as you that C# does **both**. – Yuck Jun 09 '11 at 13:32

5 Answers

21

Length prefixed so that computing length is O(1).

Null-terminated so that marshaling to unmanaged code is blazing fast (unmanaged code typically expects null-terminated strings).
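To make the two points concrete, here is a minimal C++ sketch of a buffer that carries both a stored length and a trailing zero. It is an illustration only, not the actual CLR object layout: `PrefixedString`, `make_prefixed`, `managed_length`, `native_length`, and `call_native` are hypothetical names, and `native_length` merely stands in for an unmanaged function that expects null-terminated UTF-16.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical layout: a stored length plus UTF-16 code units followed by
// one extra zero code unit. This is NOT the real CLR string layout.
struct PrefixedString {
    std::size_t length;           // code units, excluding the terminator
    std::vector<char16_t> chars;  // length + 1 elements; the last is always 0
};

PrefixedString make_prefixed(const std::u16string& text) {
    PrefixedString s;
    s.length = text.size();
    s.chars.assign(text.begin(), text.end());
    s.chars.push_back(u'\0');     // terminator kept for native callers
    return s;
}

// O(1): read the stored length, no scan.
std::size_t managed_length(const PrefixedString& s) {
    return s.length;
}

// Stand-in for an unmanaged function that only understands
// null-terminated UTF-16 and has to scan for the terminator: O(n).
std::size_t native_length(const char16_t* p) {
    std::size_t n = 0;
    while (p[n] != u'\0') {
        ++n;
    }
    return n;
}

// The buffer is handed over as-is; no copy is needed to append a terminator.
std::size_t call_native(const PrefixedString& s) {
    assert(!s.chars.empty() && s.chars.back() == u'\0');
    return native_length(s.chars.data());
}
```

Calling `managed_length` never touches the character data, while `call_native` passes the same buffer straight to the scanning function. Without the stored terminator, the second call would first require allocating and copying into a new, terminated buffer.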

jason
12

Here is an excerpt from Jon Skeet's Blog Post about strings:

Although strings aren't null-terminated as far as the API is concerned, the character array is null-terminated, as this means it can be passed directly to unmanaged functions without any copying being involved, assuming the inter-op specifies that the string should be marshalled as Unicode.

bluish
Xaisoft
4

Most likely, to ensure easy interoperability with COM.

Daniel Hilgarth
3

While the length field makes it easy for the framework to determine the length of a string (and lets a string contain characters with a zero value), there's an awful lot of code that the framework (or user programs) must deal with that expects null-terminated strings.

Like the Win32 API, for example.

So it's convenient to keep a null terminator at the end of the string data, because it's likely to be needed quite often anyway.

Note that C++'s std::string class is implemented the same way (in MSVC anyway). For the same reason, I'm sure (c_str() is often used to pass a std::string to something that wants a C-style string).
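A small sketch of the `std::string` behaviour described here, assuming a C++11 or later compiler (where `c_str()` is guaranteed to return a null-terminated array). The helper names `stored_length`, `c_api_length`, and `with_embedded_zero` are made up for the example; `std::strlen` plays the role of a C-style API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length as std::string tracks it: stored, so embedded zeros are counted.
std::size_t stored_length(const std::string& s) {
    return s.size();
}

// Length as a C-style API sees it: scan from c_str() to the first zero byte.
std::size_t c_api_length(const std::string& s) {
    // Since C++11 the character after the last element is always '\0'.
    assert(s.c_str()[s.size()] == '\0');
    return std::strlen(s.c_str());
}

// Build a 5-character string with a zero in the middle: 'a','b',0,'c','d'.
std::string with_embedded_zero() {
    return std::string("ab\0cd", 5);
}
```

For the string built by `with_embedded_zero()`, `stored_length` reports 5 while `c_api_length` reports 2. The length field is what lets the zero value live inside the string, and the terminator is only there so that C-style consumers have something to stop at.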

Michael Burr
1

Best guess: with a length prefix, finding the length takes constant time, O(1), whereas traversing the string to find a terminator takes O(n).

leppie
  • That's the reasoning behind prefixing the string with the length. That's not a reason for additionally using a termination character – Daniel Hilgarth Jun 09 '11 at 13:35
  • 1
    @Daniel Hilgarth: And why I did not duplicate the other answers. The question asks the reasoning from both sides. – leppie Jun 09 '11 at 13:36
  • 1
    Sorry, I don't understand your comment - come again? The question asks what the reasoning is for using **both together**, not what the reasoning is for one or the other on its own – Daniel Hilgarth Jun 09 '11 at 13:37
  • You're right, but I think the question asks why **both** are used concurrently. Really only one or the other is required to determine string length. – Yuck Jun 09 '11 at 13:38
  • Yes, I am wondering why both are used together and concurrently, and not only one of them (specifically - length-prefixed). – prostynick Jun 09 '11 at 13:44