Highlight search terms in a string

Question

I have written a function which works with my sites search functionality. When the user searches a word, I perform a replace on the returned search content to take any word that the user entered into the search, and wrap it in span tags with a custom class which will basically bold the word on the page. After overcoming my first road block of having to incorporate case-insensitive replacements, I'm now stuck in another predicament. The word that is replaced on the page is being replaced with the users provided case on the page which looks funny because the content returned is a lot of legal text and acronyms. If a user were to search "rpC 178", the "RPC 178" in the content is displayed as bold and the same case "rpC 178". My first thought was to split the content by "space" and keep a temporary copy of the replaced word before it's replaced in order to preserve it's current case but some of these content blocks can be upwards of 4000 words so that seems inefficient. Am I going about this the wrong way?

Here is my current code:

public static String HighlightWords(String content, String className, String searchTerms)
{
    string[] terms = new string[] { };
    if (!string.IsNullOrWhiteSpace(searchTerms))
    {
        terms = searchTerms.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
    if (terms == null || terms.Length == 0)
    {
        return content;
    }

    var optimizedContent = new StringBuilder(content);
    var startHtml = string.Format("<span class=\"{0}\">", className);
    var endHtml = "</span>";
    string result = string.Empty;
    foreach (var term in terms)
    {
        result = Regex.Replace(optimizedContent.ToString(), term, string.Format("{0}" + term + "{1}", startHtml, endHtml), RegexOptions.Compiled | RegexOptions.IgnoreCase);
    }

    return result;
}

Why do you use a `StringBuilder` at all? Currently it's less efficient than using `content` directly. — Tim Schmelter, Jan 29 '16 at 15:02
Why you initilize result in foreach and then return it? It can't work — erikscandola, Jan 29 '16 at 15:03
@TimSchmelter I was using `string.Replace` earlier and learned on SO that it's more efficient to use `StringBuilder` since strings are immutable — user3267755, Jan 29 '16 at 15:03
@user3267755: but `String.Replace` is a string method and will create a new string. I don't see where the `StringBuilder` comes into play. `StringBuilder.ToString` will also create a new String everytime. That's what makes your redundant `StringBuilder` less efficient. — Tim Schmelter, Jan 29 '16 at 15:04
@TimSchmelter Apparently you can call `Replace` on a `StringBuilder` without assigning to a new variable which is what I was doing previously. The `StringBuilder` isn't critical anymore..I think you're getting hung up on the wrong details presented in the underlying issue — user3267755, Jan 29 '16 at 15:07
_"that seems inefficient. Am I going about this the wrong way"_ - this question seems to be about _efficiency_ of working code which might be better asked on Code Review . Stack Exchange . com — , Jan 29 '16 at 15:20
Oh thanks Zak. Sometimes I would link to a site like that but within seconds I would get told by a foreign moderator that the question didn't belong there either :P — , Jan 29 '16 at 15:23
@Micky When in doubt, don't recommend. Otherwise you're risking the OP posting another question which gets closed. Users don't like their question getting closed twice. And we'll see your comments anyway, we're watching you. — Mast, Jan 29 '16 at 15:25
As a moderator on Code Review, I am not sure if this question is a good fit for CR. It is unclear whether or not the current code works as it is meant to or not. If it is working and it is inefficient, then we can help. If it is not yet implemented even in an inefficient way, then it is off-topic for Code Review. — Simon Forsberg, Jan 29 '16 at 15:27
@Mast See, that's what I mean. How you guys do that? hehe. I think I'll turn my computer off now and look for that spy camera :) — , Jan 29 '16 at 15:27

NineBerry · Accepted Answer · 2016-01-29T15:50:18.067

You can use the other overload of the Regex.Replace method that accepts a MatchEvaluator delegate. Here you pass a method that gets the actual text found as a parameter and can dynamically build the string to use as a replacement.

Sample:

    string output = Regex.Replace(input, term, 
        match => startHtml + match.Value + endHtml, 
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

Note that the notation with the => symbol may not work with older versions of C#. In this case you have to use the longer form:

    string output = Regex.Replace(input, term, new MatchEvaluator(match => 
         {
             return startHtml + match.Value + endHtml;
         }), 
         RegexOptions.Compiled | RegexOptions.IgnoreCase);

So you can also improve your code because you do not need a foreach loop over all the specified search terms. Just build a regular expression that contains all the terms to look for and then use that for searching.

Remember to use Regex.Escape() to escape the data entered by the user before using it for searching with the Regex class, so that everything works as expected when the user enters characters that have a special meaning in regular expressions.

This was exactly what I was looking for, thank you! Works like a charm. — user3267755, Jan 29 '16 at 15:59
You also have to consider that you will get problems when the search terms appear in the HTML syntax of the content. For example, when the user searches for "div" you will create invalid HTML. For a discussion of this, see for example http://stackoverflow.com/questions/23745739/highlight-words-in-html-using-regex-in-c-sharp — NineBerry, Jan 29 '16 at 16:01

Highlight search terms in a string

1 Answers1