I need to highlight search terms in a block of text.

My initial thought was to loop through the search terms, but is there an easier way?

Here is what I'm thinking using a loop...

public string HighlightText(string inputText)
{
    string[] sessionPhrases = (string[])Session["KeywordPhrase"];
    string description = inputText;

    foreach (string field in sessionPhrases)
    {
        Regex expression = new Regex(field, RegexOptions.IgnoreCase);
        description = expression.Replace(description, 
                                         new MatchEvaluator(ReplaceKeywords));
    }
    return description;
}

public string ReplaceKeywords(Match m)
{
    return "<span style='color:red;'>" + m.Value + "</span>";
}
Benjol
user713813
  • What do you mean by easier? That code looks pretty easy to me. Do you mean more efficient, shorter, more useful.... ? – jhsowter Feb 15 '12 at 05:41
  • yes, more efficient. is there a regex replace function that handles array replacement already? – user713813 Feb 15 '12 at 05:50
  • It is very important to use Regex.Escape on your field. Otherwise you could get "regex injection", granted, not as bad as sql injection but not good nonetheless. – jessehouwing Feb 15 '12 at 08:44

2 Answers

You could replace the loop with something like:

string[] phrases = ...
var re = String.Join("|", phrases.Select(s => Regex.Escape(s)).ToArray());
// Regex.Replace takes the input first, then the pattern
text = Regex.Replace(text, re, new MatchEvaluator(SomeFunction), RegexOptions.IgnoreCase);
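
For reference, here is a minimal, self-contained sketch of the single-pass idea above. The class and method names (`Highlighter`, `Highlight`) are hypothetical, and two details are assumptions not stated in the answer: empty phrases are skipped, and longer phrases are tried first so that "cat food" is preferred over "cat" when both start at the same position.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class Highlighter
{
    // Hypothetical helper: wraps every occurrence of any phrase in a red <span>.
    public static string Highlight(string text, string[] phrases)
    {
        // Escape each phrase so characters like '.', '(' or '*' are matched literally.
        // Assumption: empty phrases are dropped, and longer phrases are tried first.
        string[] escaped = (phrases ?? new string[0])
            .Where(p => !string.IsNullOrEmpty(p))
            .OrderByDescending(p => p.Length)
            .Select(p => Regex.Escape(p))
            .ToArray();

        // An empty alternation would match the empty string at every position.
        if (escaped.Length == 0) return text;

        string pattern = string.Join("|", escaped);

        // Argument order: input first, then pattern, then the evaluator.
        return Regex.Replace(text, pattern,
            m => "<span style='color:red;'>" + m.Value + "</span>",
            RegexOptions.IgnoreCase);
    }
}
```

Calling `Highlighter.Highlight("A cat and a CAT food", new[] { "cat", "cat food" })` should wrap `cat` and `CAT food` in spans, keeping the original casing of each match.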
Qtax
Extending on Qtax's answer:

phrases = ...

// Use Regex.Escape so that ., (, * and other special characters don't break the search
string re = String.Join("|", phrases.Select(s => Regex.Escape(s)).ToArray());

// Wrap the alternation in \b ... \b to match whole words only, not partial words
re = @"\b(?:" + re + @")\b";

// Use a simple replacement pattern instead of a MatchEvaluator ($0 is the whole match)
string replacement = "<span style='color:red;'>$0</span>";
// Regex.Replace takes the input first, then the pattern
text = Regex.Replace(text, re, replacement, RegexOptions.IgnoreCase);

Note that if the text you're highlighting is already HTML, it's not a good idea to run a Regex replace over the whole content, because it will also match inside tags. You might end up with:

<<span style='color:red;'>script</span>> 

if someone is searching for the term script.

To prevent that from happening, you could use the HTML Agility Pack in combination with Regex.

You might also want to check out this post which deals with a very similar issue.

jessehouwing
  • You seem to be assuming the "keywords" always start and end with word characters. I'd get a ruling on that before suggesting the use of `\b`. – Alan Moore Feb 15 '12 at 10:04
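
To illustrate the concern in the comment above: with a keyword such as `C#`, the pattern `\bC\#\b` does not match in "C# and", because there is no word boundary between `#` and the following space. One possible workaround, sketched here as an assumption rather than something proposed in the thread, is to replace `\b` with the lookarounds `(?<!\w)` and `(?!\w)`, which only require that no word character touches the match. The class name `WholeTermHighlighter` is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WholeTermHighlighter
{
    // Hypothetical helper; assumes phrases is non-null and contains no empty strings.
    public static string Highlight(string text, string[] phrases)
    {
        string alternation = string.Join("|", phrases.Select(p => Regex.Escape(p)));

        // (?<!\w) : the match must not be preceded by a word character.
        // (?!\w)  : the match must not be followed by a word character.
        // Unlike \b, this also works when a phrase starts or ends with punctuation.
        string pattern = @"(?<!\w)(?:" + alternation + @")(?!\w)";

        // $0 inserts the whole match, preserving its original casing.
        return Regex.Replace(text, pattern,
            "<span style='color:red;'>$0</span>",
            RegexOptions.IgnoreCase);
    }
}
```

With this sketch, `WholeTermHighlighter.Highlight("I like C# and C#x", new[] { "C#" })` highlights the first `C#` but leaves `C#x` alone, since the `x` is a word character directly after the term.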