
A comment buried in some C++ code in the SSCLI claims, referring to the unmanaged internal implementation of the String.Chars property:

This method is not actually used. JIT will generate code for indexer method on string class.

So...what magical code is this? I understand the whole point of jitters is that they produce different code in different situations. But at the very least, for a modern x64 Windows 7+ platform, how might the/a jitter accomplish this? Or is that truly secret sauce?

Additional details

A while ago I was looking for the fastest way to iterate through individual characters in a string in C#. It turned out the fastest way without resorting to unsafe code or duplicating the contents (via ToCharArray()) was the built-in string indexer, which is actually a call to the String.Chars property. Right in my original question I asked if anyone had insight into how the indexer actually worked, but despite bumps from both Skeet and Lippert, I didn't get any responses on that. So I decided to dig into it myself:

Stop 1: mscorlib

By examining mscorlib.dll with ildasm, we can see that String::get_Chars(int32 index) is just an internalcall pointer (plus an attribute):

.method public hidebysig specialname instance char 
        get_Chars(int32 index) cil managed internalcall
{
  .custom instance void System.Security.SecuritySafeCriticalAttribute::.ctor() = ( 01 00 00 00 ) 
} // end of method String::get_Chars

As noted in the documentation for the MethodImplOptions enumeration, "An internal call is a call to a method that is implemented within the common language runtime itself." Both a 2004 MSDN Magazine article and an SO post indicate that the mapping of internalcall names to unmanaged implementations can be found in ecall.cpp within the Shared Source CLI.

Stop 2: ecall.cpp

Searching an online copy of ecall.cpp reveals that get_Chars is implemented by COMString::GetCharAt:

FCIntrinsic("get_Chars", COMString::GetCharAt, CORINFO_INTRINSIC_StringGetChar)
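As a rough illustration of what such a registration could mean, here is a minimal sketch of a name-to-implementation table in which an entry may also carry an intrinsic tag. It is not the actual structure of ecall.cpp: the types `IntrinsicId`, `ECallEntry`, `NativeEntry` and the functions `get_char_at_fallback`, `find_ecall` and `jit_may_inline` are all hypothetical names invented for this example.

```cpp
#include <cstring>

// Hypothetical intrinsic identifiers; only the idea of a "string get char"
// tag comes from the FCIntrinsic line quoted above.
enum class IntrinsicId { None, StringGetChar };

// Hypothetical signature for a native fallback implementation.
using NativeEntry = char16_t (*)(const char16_t* chars, int length, int index);

// Stand-in for the native implementation (COMString::GetCharAt in the source).
// Assumption: returns 0 when out of range; the real method throws instead.
char16_t get_char_at_fallback(const char16_t* chars, int length, int index) {
    return (index >= 0 && index < length) ? chars[index] : u'\0';
}

// One row of the hypothetical table: managed name, native entry point, tag.
struct ECallEntry {
    const char* managed_name;
    NativeEntry native;
    IntrinsicId intrinsic;
};

const ECallEntry kStringECalls[] = {
    {"get_Chars", &get_char_at_fallback, IntrinsicId::StringGetChar},
};

// Linear lookup by managed method name; returns nullptr when not registered.
const ECallEntry* find_ecall(const char* name) {
    for (const ECallEntry& entry : kStringECalls) {
        if (std::strcmp(entry.managed_name, name) == 0) {
            return &entry;
        }
    }
    return nullptr;
}

// A JIT consulting the table could emit inline code for tagged entries and
// fall back to calling entry->native for everything else.
bool jit_may_inline(const char* name) {
    const ECallEntry* entry = find_ecall(name);
    return entry != nullptr && entry->intrinsic != IntrinsicId::None;
}
```

Under these assumptions, `jit_may_inline("get_Chars")` is true, so the native entry point would be registered but never actually called, which would be consistent with the "not actually used" comment.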

Stop 3: comstring.cpp

comstring.cpp does indeed contain an implementation of GetCharAt, starting at line 1219. Except, it's preceded by this comment:

/*==================================GETCHARAT===================================
**Returns the character at position index.  Thows IndexOutOfRangeException as
**appropriate.
**This method is not actually used. JIT will generate code for indexer method on string class.
**
==============================================================================*/
Joshua Honig
    FCIntrinsic is the key. That means "jitter may replace method call with inline machine code". And yes, that's smarts built into the jitter directly. Speed is the point of this. The sscli20 jitter doesn't have it. – Hans Passant Jul 16 '12 at 14:09

1 Answer


First of all, see Hans Passant's comment for the critical bit.

In early .NET (CLR 1 and 2), the CLR had considerable special support for the String and StringBuilder types. In fact, the two types worked so closely together that StringBuilder.ToString did not copy the actual characters anywhere, and the string indexer still fetched the characters from that same memory location, using special jitter support. I assume that jitter support for String.Chars was originally necessary to avoid passing the index integer via the stack, but the jitter seems to have improved since then.

.NET 4 comes with a different implementation of StringBuilder (ropes) that is no longer tied to how String is handled. (It has to copy during ToString, but has much faster appends.) After these changes,

  • The StringBuilder indexer is dramatically slowed down, to O(log n) on large strings. See here. It is never inlined, not even on short strings.
  • String indexer still uses (unpublished) special jitter support. I would expect this one to be basically inlined away into a shift, addition and a memory fetch, or something even faster that the nearest loop would allow.
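To make "a shift, addition and a memory fetch" concrete, here is a minimal C++ sketch of what an inlined indexer could boil down to. It is only a model: `ManagedStringModel` and `get_char_at` are hypothetical names, the real object layout and the code the jitter emits are not published, and `std::out_of_range` merely stands in for IndexOutOfRangeException.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical model of a managed string: a length plus UTF-16 code units.
// Assumption: a real string stores its characters inline after the length;
// a pointer is used here only to keep the sketch simple.
struct ManagedStringModel {
    std::int32_t length;
    const char16_t* chars;
};

// What the inlined indexer might reduce to: one bounds check, one scaled load.
inline char16_t get_char_at(const ManagedStringModel& s, std::int32_t index) {
    // A single unsigned comparison rejects both negative and too-large indexes,
    // because a negative index becomes a huge unsigned value.
    if (static_cast<std::uint32_t>(index) >= static_cast<std::uint32_t>(s.length)) {
        throw std::out_of_range("index");
    }
    // The address is base + (index << 1): the shift, the addition and the fetch.
    return s.chars[index];
}
```

Inside a loop of the form `for (i = 0; i < s.length; i++)`, an optimizer could in principle prove the bounds check redundant and drop it, which is one way to read "something even faster that the nearest loop would allow".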
Jirka Hanika