There is an old article talking about some string internals in .NET/C#. One of the interesting tidbits:

m_stringLength
This is the logical length of the string, the one returned by String.Length. Because a number of high bits are used for additional flags to enhance performance, the maximum length of the string is constrained to a limit much smaller than UInt32.Max for 32bit systems. Some of these flags indicate the string contains simple characters such as plain ASCII and will not required invoking complex UNICODE algorithms for sorting and comparison tests.

I know that `BinaryReader` reads strings as length-prefixed, with the length stored as a 7-bit encoded integer. Does that mean the extra space is used for the aforementioned string flag (0 - ASCII, 1 - wide)?

Is this relevant for Mono from version 2.0 onwards? I'm writing a simple custom wrapper around a string to make it mutable. Although that string is not going to be used in sorting or comparisons (for now), I was wondering whether I should allocate the new string pre-filled with ASCII or Unicode chars (i.e. if I know or can assume the content) so that the flag is set by default.

Izukai
  • That article is *very* old; it talks about .NET 1 specifically. I'm very doubtful this information is accurate for .NET 2 onwards. At the very least it's easy to verify that [it doesn't apply to current versions of the runtime](https://github.com/dotnet/runtime/blob/master/src/coreclr/src/vm/object.h): `m_stringLength` is just the length and nothing more. Even if these flags existed, it wouldn't make any sense to fill a string "pre-emptively", since strings are immutable. And `BinaryReader` does *not* read strings specially, that applies only to `Read7BitEncodedInt`. – Jeroen Mostert Sep 23 '20 at 08:39
  • Fair point. Not sure what the linked code was intended to showcase, though. – Izukai Sep 23 '20 at 11:37
  • I linked it because `m_stringLength` is not treated separately there, and in fact it's assigned and read with no regard for possible high bits that should be masked off before any length operations. However, my comment was also wrong in that there *are* parts of the runtime where string representation is twiddled with ([`sstring.h`](https://github.com/dotnet/runtime/blob/master/src/coreclr/src/inc/sstring.h)). But here `m_flags` is a separate field, not stuffed in the length, and an `SString` does not 1-1 correspond to a managed string. This might be the evolution of what was one thing in v1. – Jeroen Mostert Sep 23 '20 at 12:15
  • Hmm, I see. Thanks, that makes my life easier. Looks like they no longer do the fast ASCII compare, and even in the code it is hardcoded to false. – Izukai Sep 23 '20 at 12:25
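
On the `BinaryReader` part of the question: the length prefix that `BinaryWriter.Write(string)` emits and `BinaryReader.ReadString` consumes is a serialization detail, unrelated to `m_stringLength`. The high bit of each prefix byte only signals that another prefix byte follows, and the value is the number of encoded bytes rather than `String.Length`. A minimal sketch to check this, assuming the default UTF-8 encoding; `LengthPrefixDemo`, `Serialize` and `DecodePrefix` are names made up for this illustration:

```csharp
using System;
using System.IO;

public static class LengthPrefixDemo
{
    // Write a string with BinaryWriter (default encoding: UTF-8) and return the raw bytes.
    public static byte[] Serialize(string value)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new BinaryWriter(stream))
            {
                writer.Write(value);
            }
            // ToArray still works after the stream has been closed by the writer.
            return stream.ToArray();
        }
    }

    // Decode the prefix by hand. The low 7 bits of each byte carry payload;
    // the high bit only means "another prefix byte follows".
    public static int DecodePrefix(byte[] data, out int prefixSize)
    {
        int result = 0;
        int shift = 0;
        prefixSize = 0;
        while (true)
        {
            byte b = data[prefixSize++];
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0)
            {
                return result;
            }
            shift += 7;
        }
    }

    public static void Main()
    {
        int prefixSize;

        // 200 ASCII chars -> 200 UTF-8 bytes -> the prefix needs two bytes (0xC8 0x01).
        byte[] ascii = Serialize(new string('a', 200));
        int asciiBytes = DecodePrefix(ascii, out prefixSize);
        Console.WriteLine("{0} prefix byte(s), payload of {1} bytes", prefixSize, asciiBytes);
        // prints "2 prefix byte(s), payload of 200 bytes"

        // Three non-ASCII chars: String.Length is 3, but the prefix counts encoded bytes.
        byte[] wide = Serialize("\u00E9\u00E9\u00E9");
        int wideBytes = DecodePrefix(wide, out prefixSize);
        Console.WriteLine("{0} prefix byte(s), payload of {1} bytes", prefixSize, wideBytes);
        // prints "1 prefix byte(s), payload of 6 bytes"
    }
}
```

Nothing in that prefix distinguishes ASCII from wide content, so there is no flag to set in advance by choosing the initial fill characters.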

0 Answers