3

We're developing a performance-sensitive text serialization class, and we'd like to avoid converting value-types into reference-types wherever possible.

The String.Insert method appears to require you to provide a string parameter, and does not have an overload allowing a single character to be passed in as a value-type.

We're running into this scenario quite frequently, so I want to make sure there isn't another way to accomplish this without converting the character into its own string and then passing it to String.Insert.

We've considered treating the parent string as a basic array, and inserting a single character from that angle - but this doesn't seem to work either (unless we're doing something wrong).
The major problem with this approach is that it appears to require us to use the String.ToCharArray method, which produces a copy of the string as a separate reference object - which is what we're trying to avoid in the first place.

Nicholas Carey
Giffyguy
  • 8
    Have you tried `StringBuilder`? – Eser Aug 12 '15 at 17:46
  • @Eser Will that accomplish what I'm trying to do performance-wise? I just feel like that might be a little overkill for inserting a single character. – Giffyguy Aug 12 '15 at 17:47
  • 1
    @Giffyguy `StringBuilder` is made for performance. You should read the doc about it. – Pierre-Luc Pineault Aug 12 '15 at 17:48
  • @Pierre-LucPineault Good to know, thanks. I'll look into it. – Giffyguy Aug 12 '15 at 17:49
  • "We're running into this scenario quite frequently, .." but *does it even matter*? That is, what do the performance numbers say about how much - if at all - this affects performance? I would imagine the Insert time would entirely dominate any 'performance' issue. – user2864740 Aug 12 '15 at 17:49
  • @user2864740 Yes, it does matter. Yes, the performance numbers indicate such. I'm really tired of debating this with people on SO, so I'm not going to. – Giffyguy Aug 12 '15 at 17:53
  • @Giffyguy I'm curious as to these numbers. It's not a debate. Calling String.Insert will likely be *much* more expensive than creating a new string from a single character - if there are some numbers, even a small inkling, then I'll be able to learn something if my aforementioned belief is not true. – user2864740 Aug 12 '15 at 17:54

3 Answers

4

> which produces a copy of the string as a separate reference object - which is what we're trying to avoid in the first place.

There is no way of modifying a string without creating a new one; even Replace returns a new string rather than changing the original. Insert has to produce a string that is one character longer, and a string whose memory is already allocated can't be resized. That's why all string methods return a string and don't modify the original.
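
To make that concrete, here is a minimal sketch (the class name and sample text are made up for illustration) showing that String.Insert leaves the original untouched and hands back a separate object:

```csharp
using System;

public static class ImmutabilityDemo
{
    public static void Main()
    {
        string original = "seralize";

        // Insert does not touch 'original'; it allocates and returns a new string.
        string inserted = original.Insert(3, "i");

        Console.WriteLine(original);                            // seralize
        Console.WriteLine(inserted);                            // serialize
        Console.WriteLine(ReferenceEquals(original, inserted)); // False
    }
}
```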

Philippe Paré
  • 3
    Strings are immutable, so there is no way to modify them at all. – juharr Aug 12 '15 at 18:01
  • 1
    Yeah, strings are not treated as a reference type in C# – Nyerguds Aug 12 '15 at 18:10
  • So the key is to use a mutable object, such as StringBuilder. – Giffyguy Aug 12 '15 at 18:29
  • 5
    @Nyerguds Strings *are* a reference type in C#, but they're an immutable reference type. If this weren't the case, you wouldn't be able to assign `null` to a string variable. – Kyle Aug 12 '15 at 18:39
  • 1
    @Kyle A valid nitpick, but it's somewhat irrelevant to the question. They still can't be _modified_ as reference types. – Nyerguds Aug 12 '15 at 19:29
  • 1
    @Nyerguds Entirely relevant. Your wording was *incorrect*. Being a reference type has no bearing on the *(im)mutability* of said type. – user2864740 Aug 13 '15 at 06:49
  • 1
    I never said they weren't reference types. My wording was that they are "not _treated_ as reference types". The main characteristic of reference types is that, if used as function parameter or assigned to a new variable, they are passed on _by reference_ instead of copied. This is not true for immutable types, hence, they are indeed "not treated as reference types." – Nyerguds Aug 13 '15 at 07:35
  • 1
    Actually, you *can* modify a string via unsafe code. And since string literals are interned by default, that is likely to cause some...ah...*interesting* bugs. Not recommended. – Nicholas Carey Aug 13 '15 at 18:21
1

It probably doesn't get much simpler than this:

// Needs "using System.Text;" and, being an extension method, must live in a static class.
public static string InsertChar( this string s , char c , int i )
{

  // create a buffer of the desired length
  int len = s.Length + 1 ;
  StringBuilder sb = new StringBuilder( len ) ;
  sb.Length = len ;

  int j = 0 ; // pointer to sb
  int k = 0 ; // pointer to s

  // copy the prefix to the buffer
  while ( k < i )
  {
    sb[j++] = s[k++] ;
  }

  // copy the desired char to the buffer
  sb[j++] = c ;

  // copy the suffix to the buffer
  while ( k < s.Length )
  {
    sb[j++] = s[k++] ;
  }

  // stringify it
  return sb.ToString();
}

or maybe this

public static string InsertChar( this string s , char c , int i )
{
  StringBuilder sb = new StringBuilder( s.Length+1 ) ;
  // Append returns the StringBuilder, so convert explicitly to get a string back
  return sb.Append( s , 0 , i ).Append( c ).Append( s , i , s.Length-i ).ToString() ;
}

You can probably make it faster by using unsafe code like this (so as to avoid the comparisons for range checks). Note that it has to be compiled with the /unsafe option:

unsafe public static string InsertChar( this string s , char c , int i )
{
  if ( s == null ) throw new ArgumentNullException("s");
  if ( i < 0 || i > s.Length ) throw new ArgumentOutOfRangeException("i");

  char[] buf = new char[s.Length+1];

  fixed ( char *src = s )
  fixed ( char *tgt = buf )
  {
    int j = 0 ; // offset in source
    int k = 0 ; // offset in target

    while ( j < i )
    {
      tgt[k++] = src[j++];
    }

    tgt[k++] = c ;

    while ( j < s.Length )
    {
      tgt[k++] = src[j++] ;
    }

  }

  return new string( buf ) ;
}

And if you know the strings are relatively short, you could speed things up a little more by using stackalloc to allocate the working buffer on the stack instead of on the heap.
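
A rough sketch of that stackalloc idea follows. It is not the unsafe pointer version above: it assumes a runtime with Span<char> (.NET Core 2.1 or later, C# 7.2 or later), which lets stackalloc be used from safe code. The names StackAllocInsertSketch, InsertCharStackalloc and the 256-char threshold are my own assumptions, not anything prescribed by the framework.

```csharp
using System;

public static class StackAllocInsertSketch
{
    // Assumed cut-off: longer strings fall back to a heap buffer
    // so the stack is never asked for a large block.
    private const int MaxStackChars = 256;

    public static string InsertCharStackalloc(this string s, char c, int i)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        if (i < 0 || i > s.Length) throw new ArgumentOutOfRangeException(nameof(i));

        int len = s.Length + 1;

        // Working buffer: on the stack for short strings, on the heap otherwise.
        Span<char> buf = len <= MaxStackChars ? stackalloc char[len] : new char[len];

        s.AsSpan(0, i).CopyTo(buf);            // prefix
        buf[i] = c;                            // the inserted char
        s.AsSpan(i).CopyTo(buf.Slice(i + 1));  // suffix

        // The only heap allocation on the short-string path is the result itself.
        return new string(buf);
    }

    public static void Main()
    {
        Console.WriteLine("helo".InsertCharStackalloc('l', 2)); // hello
    }
}
```

Whether this actually beats String.Insert is something you'd have to measure; the sketch only shows the shape of the approach.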

Nicholas Carey
0

StringBuilder appears to be the standard solution.
It wraps a mutable character buffer, which you can manipulate repeatedly without allocating a new string for every operation (the buffer only has to grow when its capacity is exceeded).
Then, when you are done manipulating the StringBuilder object, you can convert it into a standard string object with ToString, which allocates memory for the final string once.

This still allocates memory for the string twice: once for the StringBuilder, and again for the final string object.
But this is the best you can do with the limitations of the platform.

At least memory allocation is no longer dependent on how many iterations you go through in the serialization process.
That was the main priority, and StringBuilder addresses that problem nicely.
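
For anyone landing here later, this is roughly what that looks like in practice. StringBuilder.Insert has an overload that takes a char directly, so the call site never has to turn the character into a string. The helper name GroupDigits and its digit-grouping task are just a made-up example, not part of our serializer:

```csharp
using System;
using System.Text;

public static class StringBuilderInsertSketch
{
    // Hypothetical helper: inserts 'separator' before every group of three digits,
    // counting from the right.
    public static string GroupDigits(string digits, char separator)
    {
        // Reserve room for the digits plus the separators up front,
        // so the repeated inserts don't force the buffer to grow.
        var sb = new StringBuilder(digits, digits.Length + digits.Length / 3);

        for (int i = digits.Length - 3; i > 0; i -= 3)
        {
            sb.Insert(i, separator); // the char overload of Insert
        }

        // One final allocation for the resulting string.
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(GroupDigits("1234567", ',')); // 1,234,567
    }
}
```

However many inserts the loop performs, the only string allocated at the end is the one returned by ToString.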

<rant>
Passing strings around by-reference (or by-const-reference) was the only method that made any sense in C++, from a performance and functionality standpoint.
So the fact that .NET made strings into immutable reference-types that are passed around by-value just seems so backwards to me as a C++ developer.
They're already reference types, right?
Why can't we just pass around the reference, like any other object? Geez! :)

My advice to Microsoft:
If your string objects don't support basic string operations, so you have to build a "hack" object StringBuilder, encapsulating a standard char array that works like a real string object, to provide the extra features, that's a pretty clear sign that your managed string objects are terrible, and need to be corrected themselves.
</rant>

Giffyguy
  • 1
    You can probably guess what I'm going to say :-) If what you're doing in your overall project is 90% elsewhere, maybe you don't care. If you're spending more than 20-30% of time concatenating characters, then maybe it's worth your while to use a special tool. It's easy to tell which camp you are in: [*link*](http://stackoverflow.com/a/378024/23771). – Mike Dunlavey Aug 13 '15 at 21:19