
I have a string like this:

string s = "<p>Hello world, hello world</p>";
string[] terms = new string[] {"hello", "world"};

I want to run a replacement on this string so that each term is matched case-insensitively and wrapped in a span tag with a sequentially numbered id, like so:

<p>
    <span id="m_1">Hello</span> 
    <span id="m_2">world</span>, 
    <span id="m_3">hello</span> 
    <span id="m_4">world</span>
</p>

I tried doing it like this.

int match = 1;
s = Regex.Replace(s,
    String.Join("|", terms.OrderByDescending(t => t.Length)
        .Select(Regex.Escape)),
    String.Format("<span id=\"m_{0}\">$&</span>", match++),
    RegexOptions.IgnoreCase);

The output is something like this:

<p>
    <span id="m_1">Hello</span> 
    <span id="m_1">world</span>, 
    <span id="m_1">hello</span> 
    <span id="m_1">world</span>
</p>

All the ids are the same (m_1) because match++ is not evaluated for each match, but only once for the whole Regex.Replace call. How do I get around this?
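The cause can be made concrete by hoisting the replacement argument into its own variable. The sketch below (variable names are illustrative; it assumes C# 9 or later for top-level statements) behaves the same as the call above: String.Format produces one fixed string before Regex.Replace starts, so the counter is incremented only once.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "<p>Hello world, hello world</p>";
string pattern = "hello|world";
int match = 1;

// String.Format runs once, before Regex.Replace is called, so Replace
// only ever receives the fixed string "<span id=\"m_1\">$&</span>".
string replacement = String.Format("<span id=\"m_{0}\">$&</span>", match++);
string output = Regex.Replace(input, pattern, replacement, RegexOptions.IgnoreCase);

Console.WriteLine(replacement); // <span id="m_1">$&</span>
Console.WriteLine(output);      // every span gets id="m_1"
Console.WriteLine(match);       // 2: the counter was incremented exactly once
```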

user3685285
  • May be easier to parse the html and iterate the span nodes, take a look: http://stackoverflow.com/questions/6063203/parsing-html-with-c-net – ferflores Apr 17 '17 at 16:53
  • Does it have to be Regex? Looks like a loop with compare would be a simpler and more readable approach. – Thane Plummer Apr 17 '17 at 16:54
  • @ferflores I am parsing it, but the input has no span nodes. That is the desired output and the actual output. The input is that string up there. – user3685285 Apr 17 '17 at 16:57

1 Answer


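All you need to do is pass a match evaluator instead of a replacement string, i.e. the lambda m => String.Format("<span id=\"m_{0}\">{1}</span>", match++, m.Value). A replacement string is built once before Regex.Replace runs, whereas the evaluator is called once for every match, so match++ yields a new number each time: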

using System;
using System.Linq;
using System.Text.RegularExpressions;

string s1 = "<p>Hello world, hello world</p>";
string[] terms = new string[] {"hello", "world"};
var match = 1;
s1 = Regex.Replace(s1,
        String.Join("|", terms.OrderByDescending(s => s.Length)
            .Select(Regex.Escape)),
    // The lambda converts to a MatchEvaluator, which Regex.Replace invokes once per match
    m => String.Format("<span id=\"m_{0}\">{1}</span>", match++, m.Value),
    RegexOptions.IgnoreCase);
Console.Write(s1);
// => <p><span id="m_1">Hello</span> <span id="m_2">world</span>, <span id="m_3">hello</span> <span id="m_4">world</span></p>

See the C# demo
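Regarding the comment that a plain loop might be simpler: the evaluator overload is equivalent to walking the matches yourself and appending the text between them. A minimal sketch of that manual version, assuming C# 9 or later for top-level statements; NumberTerms is a hypothetical helper name, not part of any library:

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: wraps each case-insensitive occurrence of a term in a
// numbered <span> by iterating the matches instead of using Regex.Replace.
string NumberTerms(string input, string[] terms)
{
    // Longer terms first, so a longer alternative (e.g. "hello world") would win over "hello".
    string pattern = String.Join("|", terms.OrderByDescending(t => t.Length).Select(Regex.Escape));
    var sb = new StringBuilder();
    int last = 0; // end index of the previous match in input
    int id = 1;   // running span number
    foreach (Match m in Regex.Matches(input, pattern, RegexOptions.IgnoreCase))
    {
        sb.Append(input, last, m.Index - last); // text between the previous match and this one
        sb.Append("<span id=\"m_").Append(id++).Append("\">")
          .Append(m.Value).Append("</span>");
        last = m.Index + m.Length;
    }
    sb.Append(input, last, input.Length - last); // trailing text after the last match
    return sb.ToString();
}

string result = NumberTerms("<p>Hello world, hello world</p>", new[] { "hello", "world" });
Console.WriteLine(result);
```

The evaluator-based Regex.Replace is shorter; the loop only makes the per-match step explicit.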

Wiktor Stribiżew