32

I am browsing the source code of the Open Source SignalR project, and I see this diff code which is entitled "Don't use StringBuilder or foreach in this hot code path" :

-           public static string MakeCursor(IEnumerable<Cursor> cursors)
+           public static string MakeCursor(IList<Cursor> cursors)
            { 
-               var sb = new StringBuilder();
-               bool first = true;
-               foreach (var c in cursors)
+               var result = "";
+               for (int i = 0; i < cursors.Count; i++)
                {
-                   if (!first)
+                   if (i > 0)
                    {
-                       sb.Append('|');
+                       result += '|';
                    }
-                   sb.Append(Escape(c.Key));
-                   sb.Append(',');
-                   sb.Append(c.Id);
-                   first = false;
+                   result += Escape(cursors[i].Key);
+                   result += ',';
+                   result += cursors[i].Id;
                }
-               return sb.ToString();
+               return result;
            }

I understand why foreach could be less efficient sometimes, and why it is replaced by for.

However, I learned and experienced that the StringBuilder is the most efficient way to concatenate strings. So I am wondering why the author decided to replace it with standard concatenation.

What's wrong in here and in general about using StringBuilder ?

Larry
  • 17,605
  • 9
  • 77
  • 106
  • 8
    What makes you think that `foreach` is less efficient than `for`? It depends on the implementation. And yes, using repeated string concatenation looks like a terrible idea here - particularly if there are lots of cursors. – Jon Skeet Sep 11 '12 at 14:44
  • 1
    Strongly agree with Jon. I'd be surprised if the StringBuilder version would perform worse than the concatenation version. Depending on the count of the loop iterations it might help preventing some "hidden" implicit object initializations if the StringBuilder is initialized with a bigger starting capacity. Have you measured/ profiled both versions to compare them yet? – Jens H Sep 11 '12 at 14:52
  • 4
    You should ask it to [@dfowler](http://stackoverflow.com/users/45091/dfowler) :) – Paolo Moretti Sep 11 '12 at 14:56
  • Yes, I would be interresting @dfowler to join the discussion :) – Larry Sep 11 '12 at 14:59
  • 1
    You really seem to be asking why this change was made--which seems like a question directly to the author of the change. (and/or why the comment doesn't detail *why* the change was made as it's not obvious). – Peter Ritchie Sep 11 '12 at 17:54
  • 3
    I made the code change and yah it made a huge difference in number of allocations (GetEnumerator()) calls vs not. Imagine this code is millions of times per second. The number of enumerators allocated is ridiculous and can be avoided. Check out the new version: https://github.com/SignalR/SignalR/blob/master/SignalR/MessageBus/MessageBus.cs#L546 – davidfowl Sep 11 '12 at 21:23
  • 2
    Follow up https://gist.github.com/3703926 – davidfowl Sep 12 '12 at 03:10

5 Answers5

30

I made the code change and yes it made a huge difference in number of allocations (GetEnumerator()) calls vs not. Imagine this code is millions of times per second. The number of enumerators allocated is ridiculous and can be avoided.

edit: We now invert control in order to avoid any allocations (writing to the writer directly): https://github.com/SignalR/SignalR/blob/2.0.2/src/Microsoft.AspNet.SignalR.Core/Messaging/Cursor.cs#L36

davidfowl
  • 37,120
  • 7
  • 93
  • 103
  • David, what about not doing foreach but still using StringBuilder? – Anthony Mills Sep 11 '12 at 21:52
  • Thank you very much for taking the time to build a prototype to show us proof of the merits of these changes. If I understand correctly, a StringBuilder is more suited to handling long strings, rather than treating the short one behind the other in a long loop. – Larry Sep 12 '12 at 05:45
  • You should not link to master but a rather a specific commit in the project history -_-" – Cetin Sert Sep 21 '12 at 03:52
  • 1
    If performance is of interest, I would think that the right approach would be to read `.Count` once, and if it's equal to one return `cursors[0].Key + ',' + cursors[0].id`; if two, return `cursors[0].Key + ',' + cursors[0].id + '|' + cursors[1].Key + ',' + cursors[1].id` (note: there's no loop there, so only one string allocation). If more than two, I'd probably use a `StringBuilder`, probably putting `|` after each item and decrementing the length at the end before a final `ToString()`; optimal performance would probably be to allocate an array of size `Count*4`, and... – supercat Apr 06 '14 at 21:31
  • ...pupulate it with "", cursors[0].Key, ",", cursors[0].id, "|", cursors[1].key, ",", cursors[1].id, etc. (use a loop to populate everything, including a "|" for the first item, replace the first item with an empty string, and then do a single join. One allocation for the array, and zero temporary strings. – supercat Apr 06 '14 at 21:34
  • Sounds like an alternative implementation that should be measured. Add a comment to the gist and profile it! – davidfowl Apr 07 '14 at 07:19
3

Just hope the guy who changed this actually measured the difference.

  • There is overhead in instantiating a new stringbuilder every time. This also puts pressure on memory/garbage collection.
  • The compiler can generate 'stringbuilderlike' code for simple concatenations
  • The FOR might actually be slower because it might require bounds checking which is not done with foreach loops as the compilers 'knows' they are within bounds.
IvoTops
  • 3,463
  • 17
  • 18
  • 2
    Knowing the project and following its progress on JabbR with David Fowler I'm almost positive this is exactly what prompted the change. He was spending a lot of time measuring with CLRProfiler and the like and addressing all the garbage that was being created. Hopefully he'll weigh in on the thread with specifics. – Drew Marsh Sep 11 '12 at 21:19
2

It depends on the number of Cursors provided to the function.

Most comparisons between the two apporaches seems to favour StringBuilder over string concatination when concatinating 4-10 strings. I would most likely favour StringBuilder if I didn't have explicit reasons not too (e.g. performance comparison of the two approaches for my problem/application). I would consider to pre-allocate a buffer in the StringBuilder to avoid (many) reallocations.

See String concatenation vs String Builder. Performance and Concatenating with StringBuilders vs. Strings for some discussion about the subject.

Community
  • 1
  • 1
larsmoa
  • 12,604
  • 8
  • 62
  • 85
  • 1
    Thanks for the informations. This related SO question is also very interresting : http://stackoverflow.com/questions/550702/at-what-point-does-using-a-stringbuilder-become-insignificant-or-an-overhead – Larry Sep 11 '12 at 19:48
1

How many concatenations are you doing? If numerous, use StringBuilder. If just a few, then the overhead of creating the StringBuilder will outweigh any advantage.

Dave Doknjas
  • 6,394
  • 1
  • 15
  • 28
  • Note that in this case, we only need to get to two cursors to end up with 7 string concatenations, which is pretty nasty. – Jon Skeet Sep 11 '12 at 14:44
  • No necessarily - with an unknown size, you may get multiple reallocations of the internal buffer. That _can_ be expensive. – Oded Sep 11 '12 at 14:45
  • Wouldn't then the correct approach be to look at the amount of cursors and do one path (String) for very few and the other for many (StringBuilder)? – Matt Kerr Sep 11 '12 at 21:29
1

I will put my money on

           StringBuilder sb = new StringBuilder();
           bool first = true;
           foreach (Cursor c in cursors)
           {
                if (first)
                {
                   first = false;  // only assign it once
                }
                else
                {
                    sb.Append('|');
                }
                sb.Append(Escape(c.Key) + ',' + c.Id);
            }
            return sb.ToString();

But I would put my money and the update from dfowler. Check out the link in his answer.

paparazzo
  • 44,497
  • 23
  • 105
  • 176