85

I have a string in which I need to replace markers with values from a dictionary. It has to be as efficient as possible. Doing a loop with a string.replace is just going to consume memory (strings are immutable, remember). Would StringBuilder.Replace() be any better since this is was designed to work with string manipulations?

I was hoping to avoid the expense of RegEx, but if that is going to be a more efficient then so be it.

Note: I don't care about code complexity, only how fast it runs and the memory it consumes.

Average stats: 255-1024 characters in length, 15-30 keys in the dictionary.

Dustin Davis
  • 14,482
  • 13
  • 63
  • 119

9 Answers9

84

Using RedGate Profiler using the following code

class Program
    {
        static string data = "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz";
        static Dictionary<string, string> values;

        static void Main(string[] args)
        {
            Console.WriteLine("Data length: " + data.Length);
            values = new Dictionary<string, string>()
            {
                { "ab", "aa" },
                { "jk", "jj" },
                { "lm", "ll" },
                { "yz", "zz" },
                { "ef", "ff" },
                { "st", "uu" },
                { "op", "pp" },
                { "x", "y" }
            };

            StringReplace(data);
            StringBuilderReplace1(data);
            StringBuilderReplace2(new StringBuilder(data, data.Length * 2));

            Console.ReadKey();
        }

        private static void StringReplace(string data)
        {
            foreach(string k in values.Keys)
            {
                data = data.Replace(k, values[k]);
            }
        }

        private static void StringBuilderReplace1(string data)
        {
            StringBuilder sb = new StringBuilder(data, data.Length * 2);
            foreach (string k in values.Keys)
            {
                sb.Replace(k, values[k]);
            }
        }

        private static void StringBuilderReplace2(StringBuilder data)
        {
            foreach (string k in values.Keys)
            {
                data.Replace(k, values[k]);
            }
        }
    }
  • String.Replace = 5.843ms
  • StringBuilder.Replace #1 = 4.059ms
  • Stringbuilder.Replace #2 = 0.461ms

String length = 1456

stringbuilder #1 creates the stringbuilder in the method while #2 does not so the performance difference will end up being the same most likely since you're just moving that work out of the method. If you start with a stringbuilder instead of a string then #2 might be the way to go instead.

As far as memory, using RedGateMemory profiler, there is nothing to worry about until you get into MANY replace operations in which stringbuilder is going to win overall.

Fenton
  • 241,084
  • 71
  • 387
  • 401
Dustin Davis
  • 14,482
  • 13
  • 63
  • 119
  • If the compiler were to optimize this could, would it change the line of StringBuilderReplace1(data); to StringBuilderReplace2(new StringBuilder(data, data.Length * 2));? Just curious. I understand the difference, was just curious if you knew. – pqsk Apr 16 '14 at 15:35
  • 1
    I don't understand why SB method 2 is so much faster - the JIT should optimize both SB #1 and SB #2 so they're the same at runtime. – Dai Jan 14 '17 at 04:31
  • @Dai keep in mind this was in 2011. Things may have changed since then. – Dustin Davis Jan 14 '17 at 18:32
  • 8
    @Dai - (late response) - as stated in the answer, the profiler only measures the elapsed time of the actual function. Since the stringbuilder declaration in Replace#2 is outside the function the construction time is not included in the elapsed time. – Stirrblig May 09 '17 at 13:21
  • 3
    So you're saying 89% of the replacement time for StringBuilderReplace1 is just initializing the StringBuilder instance? And about only 11% (0.461 of 4.059) of the time is spent doing key replacements, as isolated in StringBuilderReplace2? If so... I'd allocate a StringBuilder buffer and limit to some degree of parallelism for batched processing using existing instances. Now, the question is... what does StringBuilder.ToString add to the time? Because, to be fair, your target output is a string after all, and only the first method actually produces a string. – Triynko Oct 18 '19 at 04:00
9

This may be of help: https://learn.microsoft.com/en-us/archive/blogs/debuggingtoolbox/comparing-regex-replace-string-replace-and-stringbuilder-replace-which-has-better-performance

The short answer appears to be that String.Replace is faster, although it may have a larger impact on your memory footprint / garbage collection overhead.

matt
  • 125
  • 12
RMD
  • 3,421
  • 7
  • 39
  • 85
  • interesting. Based on their test, string.replace was better. I was thinking that due to the small size of the strings string.replace would be better considering any overhead of creating a string builder – Dustin Davis Jun 29 '11 at 17:15
  • Working link to the article: https://learn.microsoft.com/en-us/archive/blogs/debuggingtoolbox/comparing-regex-replace-string-replace-and-stringbuilder-replace-which-has-better-performance – Paul N May 19 '21 at 13:38
6

Would stringbuilder.replace be any better [than String.Replace]

Yes, a lot better. And if you can estimate an upper bound for the new string (it looks like you can) then it will probably be fast enough.

When you create it like:

  var sb = new StringBuilder(inputString, pessimisticEstimate);

then the StringBuilder will not have to re-allocate its buffer.

H H
  • 263,252
  • 30
  • 330
  • 514
6

Yes, StringBuilder will give you both gain in speed and memory (basically because it won't create an instance of a string each time you will make a manipulation with it - StringBuilder always operates with the same object). Here is an MSDN link with some details.

Andrei
  • 55,890
  • 9
  • 87
  • 108
  • but is it worth the overhead of creating the string builder? – Dustin Davis Jun 29 '11 at 17:16
  • 1
    @Dustin: with 15-30 replacements it probably is. – H H Jun 29 '11 at 17:30
  • @Henk - there are 15-30 searches, not necessarily that many replacements. The expected average number of markers per string is significant. – Joe Jun 29 '11 at 18:59
  • @Joe: there will be 15-30 scans (except maybe the Regex). – H H Jun 29 '11 at 20:40
  • @Henk, yes, but the performance of the 15-30 scans will be similar for String and StringBuilder. Any difference in performance that justifies the overhead of creating the string builder will have to come from replacements, not scans. – Joe Jun 30 '11 at 06:19
  • @Joe: No, with String.Replace a new buffer has to be made each time. The Builder will have to copy a lot less. – H H Jun 30 '11 at 07:37
2

The problem with @DustinDavis' answer is that it recursively operates on the same string. Unless you're planning on doing a back-and-forth type of manipulation, you really should have separate objects for each manipulation case in this kind of test.

I decided to create my own test because I found some conflicting answers all over the Web, and I wanted to be completely sure. The program I am working on deals with a lot of text (files with tens of thousands of lines in some cases).

So here's a quick method you can copy and paste and see for yourself which is faster. You may have to create your own text file to test, but you can easily copy and paste text from anywhere and make a large enough file for yourself:

using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Windows;

void StringReplace_vs_StringBuilderReplace( string file, string word1, string word2 )
{
    using( FileStream fileStream = new FileStream( file, FileMode.Open, FileAccess.Read ) )
    using( StreamReader streamReader = new StreamReader( fileStream, Encoding.UTF8 ) )
    {
        string text = streamReader.ReadToEnd(),
               @string = text;
        StringBuilder @StringBuilder = new StringBuilder( text );
        int iterations = 10000;

        Stopwatch watch1 = new Stopwatch.StartNew();
        for( int i = 0; i < iterations; i++ )
            if( i % 2 == 0 ) @string = @string.Replace( word1, word2 );
            else @string = @string.Replace( word2, word1 );
        watch1.Stop();
        double stringMilliseconds = watch1.ElapsedMilliseconds;

        Stopwatch watch2 = new Stopwatch.StartNew();
        for( int i = 0; i < iterations; i++ )
            if( i % 2 == 0 ) @StringBuilder = @StringBuilder .Replace( word1, word2 );
            else @StringBuilder = @StringBuilder .Replace( word2, word1 );
        watch2.Stop();
        double StringBuilderMilliseconds = watch1.ElapsedMilliseconds;

        MessageBox.Show( string.Format( "string.Replace: {0}\nStringBuilder.Replace: {1}",
                                        stringMilliseconds, StringBuilderMilliseconds ) );
    }
}

I got that string.Replace() was faster by about 20% every time swapping out 8-10 letter words. Try it for yourself if you want your own empirical evidence.

Meloviz
  • 571
  • 6
  • 16
1

My two cents here, I just wrote couple of lines of code to test how each method performs and, as expected, result is "it depends".

For longer strings Regex seems to be performing better, for shorter strings, String.Replace it is. I can see that usage of StringBuilder.Replace is not very useful, and if wrongly used, it could be lethal in GC perspective (I tried to share one instance of StringBuilder).

Check my StringReplaceTests GitHub repo.

Zdeněk
  • 929
  • 1
  • 8
  • 25
1

Converting data from a String to a StringBuilder and back will take some time. If one is only performing a single replace operation, this time may not be recouped by the efficiency improvements inherent in StringBuilder. On the other hand, if one converts a string to a StringBuilder, then performs many Replace operations on it, and converts it back at the end, the StringBuilder approach is apt to be faster.

supercat
  • 77,689
  • 9
  • 166
  • 211
1

Rather than running 15-30 replace operations on the entire string, it might be more efficient to use something like a trie data structure to hold your dictionary. Then you can loop through your input string once to do all your searching/replacing.

Matt Bridges
  • 48,277
  • 7
  • 47
  • 61
1

It will depend a lot on how many of the markers are present in a given string on average.

Performance of searching for a key is likely to be similar between StringBuilder and String, but StringBuilder will win if you have to replace many markers in a single string.

If you only expect one or two markers per string on average, and your dictionary is small, I would just go for the String.Replace.

If there are many markers, you might want to define a custom syntax to identify markers - e.g. enclosing in braces with a suitable escaping rule for a literal brace. You can then implement a parsing algorithm that iterates through the characters of the string once, recognizing and replacing each marker that it finds. Or use a regex.

Joe
  • 122,218
  • 32
  • 205
  • 338
  • +1 for regex - note that if this is done the actual replacement can use a `MatchEvaluator` to actually do the dictionary lookup. – Random832 Jun 29 '11 at 20:45