2

How can I collapse contiguous repetitions of a substring within a string in C#? For example, the string

"<p>The&nbsp;&nbsp;&nbsp;quick&nbsp;&nbsp;&nbsp;fox</p>"

will be converted to

"<p>The&nbsp;quick&nbsp;fox</p>"

rajeemcariazo

3 Answers

3

Use the regex below:

@"(.+)\1+"

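(.+) captures a group of one or more characters, and \1+ then matches one or more immediate repetitions of that same captured text. Note that this applies to any repeated run, not only &nbsp;, so a doubled letter such as "aa" is collapsed to "a" as well.

Then replace the match with $1, the text captured by the group.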


string result = Regex.Replace(str, @"(.+)\1+", "$1");
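
As a quick check of the pattern above, here is a minimal sketch. The names `RepeatCollapser`, `CollapseAny`, and `CollapseOnly` are made up for illustration and are not part of the answer. Because `(.+)` matches any run of characters, the generic pattern also collapses other doubled text, such as the `ss` in `pass`. Restricting the group to a single escaped substring with `Regex.Escape` is one way to avoid that, assuming only one known substring needs collapsing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RepeatCollapser
{
    // Collapses every immediately repeated run of characters
    // (the generic pattern from this answer).
    public static string CollapseAny(string input) =>
        Regex.Replace(input, @"(.+)\1+", "$1");

    // Hypothetical variant: collapses repeats of one specific substring only.
    // Regex.Escape makes sure the substring is treated as literal text.
    public static string CollapseOnly(string input, string subString) =>
        Regex.Replace(input, "(" + Regex.Escape(subString) + @")\1+", "$1");

    public static void Main()
    {
        var str = "<p>The&nbsp;&nbsp;&nbsp;quick&nbsp;&nbsp;&nbsp;fox</p>";
        Console.WriteLine(CollapseAny(str));

        // The generic pattern also shortens "ss"; the restricted one does not.
        Console.WriteLine(CollapseAny("pass&nbsp;&nbsp;word"));
        Console.WriteLine(CollapseOnly("pass&nbsp;&nbsp;word", "&nbsp;"));
    }
}
```

If the input may legitimately contain doubled characters, the restricted variant is probably the safer choice.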
Avinash Raj
2

Maybe this simple one is enough:

(&nbsp;){2,}

and replace with $1 (the &nbsp; captured by the first parenthesized group).

See test at regex101


To check whether a substring is followed by itself, you can also use a lookahead:

(?:(&nbsp;)(?=\1))+

and replace with an empty string. See test at regex101.com
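
For use from C#, both patterns can be passed to `Regex.Replace`. The following is a minimal sketch; the class name `NbspCollapser` and its method names are assumptions chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NbspCollapser
{
    // Variant 1: match two or more consecutive "&nbsp;" and keep a single one
    // by replacing the whole run with the captured group.
    public static string WithQuantifier(string input) =>
        Regex.Replace(input, @"(&nbsp;){2,}", "$1");

    // Variant 2: match every "&nbsp;" that is directly followed by another one
    // and delete it, which leaves only the last "&nbsp;" of each run.
    public static string WithLookahead(string input) =>
        Regex.Replace(input, @"(?:(&nbsp;)(?=\1))+", "");

    public static void Main()
    {
        var s = "<p>The&nbsp;&nbsp;&nbsp;quick&nbsp;&nbsp;&nbsp;fox</p>";
        Console.WriteLine(WithQuantifier(s));
        Console.WriteLine(WithLookahead(s));
    }
}
```

Both variants only touch runs of &nbsp;, so other repeated text in the input is left unchanged.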

Jonny 5
2

Let's call the original string s and the substring subString:

    var s = "<p>The&nbsp;&nbsp;&nbsp;quick&nbsp;&nbsp;&nbsp;fox</p>";
    var subString = "&nbsp;";

I'd prefer this over a regex because it is much more readable:

    var subStringTwice = subString + subString;

    while (s.Contains(subStringTwice))
    {
        s = s.Replace(subStringTwice, subString);
    }

Another possible solution with better performance:

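    var elements = s.Split(new[] { subString }, StringSplitOptions.RemoveEmptyEntries);
    // Keep s unchanged so the start/end checks below still see the original input
    var result = string.Join(subString, elements);
    // This part is only needed when subString can appear at the start or the end of s
    if (result != "")
    {
        if (s.StartsWith(subString)) result = subString + result;
        if (s.EndsWith(subString)) result = result + subString;
    }
    else if (s != "")
    {
        // s consisted only of repetitions of subString
        result = subString;
    }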
schnaader
  • To my regex-loving eyes, this is only more readable if you are **not familiar with regex**. Algorithmically speaking, this could also be much more expensive, but that is barely worth mentioning. – Gusdor Feb 05 '15 at 13:26
  • Yup... repeated string replace is expensive, and the split method fails if the substring appears at the start or end of the input. – Rawling Feb 05 '15 at 15:40