2

I need to write some code that performs an HTML highlight on specific keywords in a string.

I have a comma-separated list of strings, and I would like to do a search and replace on another string for each entry in the list. What is the most efficient way of doing it?

I'm currently doing it with a split, then a foreach and a Regex.Match. For example:

string wordsToCheck = "this,the,and";
string[] listArray = wordsToCheck.Split(',');
string contentToReplace = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.";

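foreach (string word in listArray)
{
    // Only rewrite the content when the word (followed by whitespace) actually occurs.
    if (Regex.Match(contentToReplace, word + "\\s+", RegexOptions.IgnoreCase).Success)
    {
        // Assign instead of returning, so that every word in the list is processed.
        contentToReplace = Regex.Replace(contentToReplace, word + "\\s+", String.Format("<span style=\"background-color:yellow;\">{0}</span> ", word), RegexOptions.IgnoreCase);
    }
}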

I'm not sure this is the most efficient way, because the list of words to check for could get long, and the code above could be part of a loop that searches and replaces a bunch of content.

Kris B
  • see http://stackoverflow.com/questions/711753/a-better-way-to-replace-many-strings-obfuscation-in-c – Arsen Mkrtchyan Jul 26 '09 at 18:43
  • I ended up using this code: Regex.Replace(contentToReplace, wordsToCheck + "\\s+", "$1 ", RegexOptions.Singleline | RegexOptions.IgnoreCase); – Kris B Jul 26 '09 at 19:49

3 Answers

1

Don't do that if wordsToCheck can be modified by a user, since each word goes into the regex pattern unescaped!

Your approach works perfectly without Regexes. Just do a normal String.Replace.

If the input is safe, you can also use one regex for all keywords, e.g.

return Regex.Replace(contentToReplace, "(this|the|and)", "<span style=\"background-color:yellow;\">$1</span>", RegexOptions.IgnoreCase);

where "this|the|and" is simply wordsToCheck with the commas replaced by pipes ("|").

BTW, you might want to take the list of keywords directly as a regex instead of as a comma-separated list. This will give you more flexibility.
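As a rough sketch of the single-regex idea, assuming the keyword list is trusted (nothing is escaped here), the comma-to-pipe conversion could look like this. The class and method names are hypothetical and only serve the illustration.

```csharp
using System.Text.RegularExpressions;

public static class PipeHighlighter
{
    // Hypothetical helper: turns "this,the,and" into the pattern "(this|the|and)".
    // Assumes wordsToCheck is trusted input; regex metacharacters are NOT escaped.
    public static string Highlight(string content, string wordsToCheck)
    {
        string pattern = "(" + wordsToCheck.Replace(",", "|") + ")";

        // $1 re-inserts the text that actually matched, so its casing is preserved.
        return Regex.Replace(
            content,
            pattern,
            "<span style=\"background-color:yellow;\">$1</span>",
            RegexOptions.IgnoreCase);
    }
}
```

Note that this pattern has no word boundaries, so "the" would also be highlighted inside a longer word such as "other". Wrapping the group in \b on both sides is one way to restrict it to whole words.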

vog
0

You could search for "(this|the|and)" and call Regex.Replace once with a match evaluator: a method that takes the match and returns a replacement string.

You can build the match pattern by taking your string array and calling Regex.Escape on every element, then join it with String.Join using | as a separator.
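A minimal sketch of that approach might look like the following. The class and method names are made up for illustration, and the word array is assumed to be non-empty (an empty array would produce a pattern that matches the empty string everywhere).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordHighlighter
{
    // Hypothetical helper: highlights every occurrence of any word in one pass.
    public static string Highlight(string content, string[] words)
    {
        // Escape each keyword so regex metacharacters are matched literally,
        // then join the escaped keywords into one alternation.
        string pattern = "(" + String.Join("|", words.Select(w => Regex.Escape(w))) + ")";

        // WrapMatch is the match evaluator; it is called once per match.
        return Regex.Replace(content, pattern, WrapMatch, RegexOptions.IgnoreCase);
    }

    private static string WrapMatch(Match match)
    {
        // match.Value is the text as it appears in the content, casing included.
        return "<span style=\"background-color:yellow;\">" + match.Value + "</span>";
    }
}
```

Because of Regex.Escape, a keyword such as "a+b" is matched as the literal text "a+b" rather than being interpreted as a pattern, which also makes user-supplied keywords safe to use.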

sisve
0

As for your performance concerns: other users suggested using a single regex, and they are right. For even better performance (theoretically), you could use the RegexOptions.Compiled flag, especially since your regex is unlikely to change. For more information you may read this.
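For illustration, a compiled regex is typically built once and then reused, roughly as sketched below. The class name and the fixed keyword pattern are assumptions for the example; whether compilation actually pays off depends on how often the regex is used, so it is worth measuring.

```csharp
using System.Text.RegularExpressions;

public static class CompiledHighlighter
{
    // Built once and reused across calls. RegexOptions.Compiled makes
    // construction slower in exchange for faster matching afterwards.
    private static readonly Regex Keywords = new Regex(
        "(this|the|and)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string Highlight(string content)
    {
        // $1 re-inserts the matched keyword inside the highlight span.
        return Keywords.Replace(content, "<span style=\"background-color:yellow;\">$1</span>");
    }
}
```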

Marcin Deptuła