1

I have a line of one million letters A. In the loop, I replace the first occurrence of the letter A with "" until the letters end.

Before replacing, I must determine whether there is an occurrence in the string or not, and and I do it this way:

var inx = a.IndexOf(b, StringComparison.Ordinal);
if (inx > -1)
{
     RemoveAndInsertString(ref a, ref b, ref c, inx);
}

I have tried:

private void RemoveAndInsertString(ref string sourceString, ref string rowToRemove, ref string rowToInsert, int inx)
{
    Regex rgx = new Regex(rowToRemove);
    sourceString = rgx.Replace(sourceString, rowToInsert, 1);
}

and

private void RemoveAndInsertString(ref string sourceString, ref string rowToRemove, ref string rowToInsert, int inx)
{
    if (!String.IsNullOrEmpty(rowToInsert))
    {
        StringBuilder sb = new StringBuilder(sourceString.Length + rowToInsert.Length);
        sb.Append(sourceString.Substring(0, inx));
        sb.Append(rowToInsert);
        sb.Append(sourceString.Substring(inx + rowToRemove.Length, sourceString.Length - (inx + rowToRemove.Length)));
        sourceString = sb.ToString();
    }
}

but it's so slow! I made a test, for a string length of 1,000,000: 0.5 seconds for 10,000 replacements and 52 seconds for 100,000 replacements. If the string contains 1,000,000 characters to be replaced - I can't wait for it!

How can I speed up the replacement?

I need make an remark: it's not a loop by ONE string. The situation looks like this: I have a 100000 rows. Rows can have up to 1000000 characters. I should make a replace of substring, if it's exist in the string, or skip this string and make some actions if no substring in the string. And, the lines used in the program do not support Unicode, if you need it. Example with A - it's a bad example, i agree, but it's a test, that testing performance of my program (it's not my test)

Vasiliy Terkin
  • 165
  • 2
  • 15
  • when you are doing replacement one by one, you allocate new string with new content. you shouldn't be surprised to see this slow, because string with 1 million length is copied in memory again with a minor change. and you do it million times. consider doing all of replacement in one go using Regex.Match – M.kazem Akhgary Sep 27 '17 at 11:54
  • Can you post the whole code with a loop? – Antonín Lejsek Sep 27 '17 at 11:55
  • 2
    Possible duplicate of [Memory Efficiency and Performance of String.Replace .NET Framework](https://stackoverflow.com/questions/399798/memory-efficiency-and-performance-of-string-replace-net-framework) – MatSnow Sep 27 '17 at 11:55
  • 2
    Also your use of StringBuilder is completely wrong, StringBuilders improve performance when you do all operations on them without creating any string in process, and finally after all process is finished, create string from StringBuilder. currently you are still allocating new string at every single replacement. – M.kazem Akhgary Sep 27 '17 at 11:56
  • As you are not changing them, you don't need `ref` for your `ref string rowToRemove` and `ref string rowToInsert` parameters. You *do* need it for `sourceString`, although I would use a return value for that. – Hans Kesting Sep 27 '17 at 12:00
  • Is the Regex.Replace really necessary? Also, is the replacement the same length as the original? – Manfred Radlwimmer Sep 27 '17 at 12:00
  • @ManfredRadlwimmer no, replacement can have any length – Vasiliy Terkin Sep 27 '17 at 12:02
  • Regex.Replace is not a very efficient method. What pattern are you using? It's likely it can be replaced with something else that takes less time. – Manfred Radlwimmer Sep 27 '17 at 12:03
  • @ManfredRadlwimmer Regex.Replace is very efficient, its just matter of using it in right place. – M.kazem Akhgary Sep 27 '17 at 12:07
  • @M.kazemAkhgary Which is why I asked for the pattern. If it's something simple a manual approach should be faster. – Manfred Radlwimmer Sep 27 '17 at 12:08
  • 1
    @destroyer25t Could you add a [mcve] of what you are doing? It seems like you have us stabbing in the dark at the moment. As far as I can tell you want to remove every `A` and all subsequent letters. Could you be a bit more specific about what you are trying to do? – Manfred Radlwimmer Sep 27 '17 at 12:17
  • 2
    I agree with Manfred. It's unclear what the requirements are here: `I have a line of one million letters A` - ok, so a string consisting of a million 'A's.. `In the loop, I replace the first occurrence of the letter A with "" until the letters end.` - So according to your requirement, you can just replace the original string with "", which will take almost no time. I'm pretty sure that's not what you mean, so please put a bit more effort into describing what you want in a succinct and accurate manner. – Matthew Watson Sep 27 '17 at 12:18
  • @destroyer25t Unfortunately you latest edit still doesn't completely explain what it is you are replacing with what. What are the conditions? Example input/output? – Manfred Radlwimmer Sep 27 '17 at 12:47
  • Program - is a very simple command interpretator, that working with strings. One of the program commands - command REPLACE, and it replaces a substring in a string. The function takes four arguments, which it takes from the top of the stack. Let's call them a, b, c, ret. Function is trying to replace the most left occurrence of substring b by substring c in string a. If the replacement is successful, all 4 items from the stack are deleted. And the result of the replacement is put on top of the stack. – Vasiliy Terkin Sep 27 '17 at 12:58
  • If the replacement did not work, the function of four lines, taken by the arguments, leaves only a and goes on the label ret. – Vasiliy Terkin Sep 27 '17 at 12:58
  • In the input file of test, for this interpretator, i have string with a length of 1000000 symbols (only A characters). In the test program compiled for the interpreter, from this line, using the COMMAND command, the character A is replaced by the symbol "", until there are no characters left in the line. Then the program will end. – Vasiliy Terkin Sep 27 '17 at 13:04
  • @destroyer25t It that so you can replace recursively or would a simple "replace all instances once" (aka `String.Replace`) also do the job? I don't quite get why you need to replace one at a time - that seems to be your bottleneck. – Manfred Radlwimmer Sep 27 '17 at 13:15
  • I can do it only by iterations, ostensibly real functioning. – Vasiliy Terkin Sep 27 '17 at 13:55
  • The problem is not directly related to your "replace first occurrence" method. By your own metrics, you can perform this method 10,000 times in 0.5 seconds, but increase string length x10, and run time goes up x100 - it isn't linear. It suggests to me that the repeated allocation/deallocation of huge numbers of unique strings is the root of your issue (as stated by @m-kazem-akhgary) - probably compounded because your larger test means larger allocations in the heap means the heap needs to be 'managed' more by the GC. If this is a real requirement, think about some other data structure. – Brett Sep 27 '17 at 14:34

0 Answers0