1

We have an HTML string, and we need to find the first occurrence of a search phrase and highlight it in the HTML text.

Let us say we need to search for "American Government", which can appear in any of the following forms.

E.g.

American Government

<span>American</Span> <Span>Government</span>

<span>American Government</span>

<span>American </span> Government

We need REGEX which will search HTML to find such combinations, so that all of the forms above come back as matches.

Without removing the tags, we should still be able to find the keyword and wrap each match in an additional tag.

We need a regex that finds such word combinations in HTML.

Nightfirecat
Jigar
  • (I'm editing and adding C#, but it could be VB.NET . I'm not sure... If you prefer re-edit it) – xanatos Sep 05 '11 at 19:40
  • @xanatos: Why do you assume that it is .NET specific at all? – Tim Schmelter Sep 05 '11 at 19:44
  • I suggest you read [this](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). If the format of the HTML is variable (can be any valid HTML), RegEx is not a good solution. – Oded Sep 05 '11 at 19:44
  • @Tim Because the title is "ASP.NET RegEX". Now... Technically it could be Javascript, but (I hope) he wouldn't have specified ASP.NET Regex. Now. If Jigar tells me I was wrong (perhaps he is using an XSLT), I'll be the first to apologize. – xanatos Sep 05 '11 at 19:45
  • @xanatos: Ah ok, i've read the whole question (before the edits) but left out the title. Therefore the `.NET`-Tag exists. – Tim Schmelter Sep 05 '11 at 20:00
  • What is the definition of "text"? What if there were linebreaks (`
    `)?
    – NullUserException Sep 05 '11 at 20:03
  • What if it was `American Government` – xanatos Sep 05 '11 at 20:05

4 Answers

4

We need REGEX which will search HTML

Oh no, believe me you don't need that. You need an HTML parser such as Html Agility Pack.

Community
Darin Dimitrov
  • +1 for your answer, but the link to "don't need that" makes me want to donate all my rep to you. Sheer poetic beauty. We need more answers like that! – Bernhard Hofmann Sep 05 '11 at 20:41
0

That's a tricky one. I guess something like this?

(<[^>]*>)?(\s)?(American)(\s)?(\</[^>]*>)?(\s)?(<[^>]*>)?(\s)?(Government)(\s)?(</[^>]*>)?
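
As a quick check, the pattern can be run against the four sample strings from the question. This is only a minimal sketch in C#, assuming .NET as the question title suggests; the names `SpanTolerantDemo`, `FindPhrase` and `Samples` are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class SpanTolerantDemo
{
    // The pattern from this answer; the redundant backslash before "</" is omitted.
    const string Pattern =
        @"(<[^>]*>)?(\s)?(American)(\s)?(</[^>]*>)?(\s)?(<[^>]*>)?(\s)?(Government)(\s)?(</[^>]*>)?";

    // Returns the text of the first match; Value is the empty string when nothing matches.
    public static string FindPhrase(string html)
    {
        return Regex.Match(html, Pattern).Value;
    }

    // The four sample strings from the question.
    public static readonly string[] Samples =
    {
        "American Government",
        "<span>American</Span> <Span>Government</span>",
        "<span>American Government</span>",
        "<span>American </span> Government",
    };
}
```

Each sample should come back whole, because the pattern also consumes the tags around the words. It allows at most one closing tag and one opening tag between the two words, so deeper nesting or other text in between will not match.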
TheCodeKing
0

I'm not quite sure what you're trying to match. This regex will return American in capture group 1 and Government in capture group 2 (group 0 is the whole match).

(?ixs)(American)(?:(?!Government).)*(Government)
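
To see how the captures are read back, here is a small C# sketch, again assuming .NET. `GroupDemo` and `Words` are illustrative names, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupDemo
{
    // (?i) ignores case, (?x) ignores whitespace in the pattern,
    // (?s) lets "." match newlines as well.
    const string Pattern = @"(?ixs)(American)(?:(?!Government).)*(Government)";

    // Returns the two captured words, or an empty array when the phrase is absent.
    public static string[] Words(string html)
    {
        Match match = Regex.Match(html, Pattern);
        if (!match.Success) return new string[0];
        return new[] { match.Groups[1].Value, match.Groups[2].Value };
    }
}
```

Note that the middle part accepts any text between the two words, not only tags, so a string such as "American football and Government" matches as well.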
clarkb86
0

You'll need to reformat your search term into a pattern.

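// Requires: using System; using System.Text.RegularExpressions;
string HighlightSearchTerm( string source, string term )
{
    Regex regex;
    string[] values;
    string pattern;
    // Escape each word so regex metacharacters in the term are matched literally.
    values = Array.ConvertAll( term.Split( ' ' ), v => Regex.Escape( v ) );
    if ( values.Length > 1 )
    {
        // Group 1: the whole term as plain text.
        // Groups 2..n: the words, alternating with whatever separates them.
        pattern = String.Format(
            "({0})|({1})",
            String.Join( " ", values ),
            String.Join( @")(?=\s*<[^>]+>\s*)(.+?)(", values ) );
    }
    else
    {
        pattern = "(" + values[0] + ")";
    }
    regex = new Regex( pattern );
    return regex.Replace( source, AddTags );
}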

And then your MatchEvaluator will need to compensate for variable length groups.

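string AddTags( Match match )
{
    string result;

    // Group 1 matched: the whole term appeared as uninterrupted text.
    if ( match.Groups[1].Length > 0 )
    {
        return "<newtag>" + match.Groups[1] + "</newtag>";
    }
    // Otherwise even-numbered groups hold the words and odd-numbered groups
    // hold the text between them; wrap only the words.
    result = "";
    for ( int index = 2; index < match.Groups.Count; index+=2 )
    {
        // Past the last word, Groups[index + 1] is an empty, unsuccessful group.
        result += "<newtag>" + match.Groups[index] + "</newtag>" +
            match.Groups[index + 1];
    }
    return result;
}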

Input validation and optimization are left as an exercise for the reader. This also won't handle odd scenarios like `A<span>merican Government</span>`.
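
The question asks for the first occurrence only, and `(.+?)` in the pattern above accepts any text between the words. A tighter variant is sketched below. It is an alternative under stated assumptions, not the code from this answer: words may be separated only by whitespace and tags, matching is case-insensitive, `newtag` is the same placeholder element as above, and `Highlighter`, `HighlightFirst` and `WrapWords` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class Highlighter
{
    // One or more whitespace characters and/or tags between two words.
    const string Separator = @"((?:\s|<[^>]+>)+)";

    public static string HighlightFirst(string source, string term)
    {
        // Escape each word so that regex metacharacters are matched literally.
        string[] words = term
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => Regex.Escape(w))
            .ToArray();
        if (words.Length == 0) return source;

        // Odd-numbered groups capture the words, even-numbered groups the separators.
        string pattern = "(" + string.Join(")" + Separator + "(", words) + ")";
        var regex = new Regex(pattern, RegexOptions.IgnoreCase);

        // The count argument of 1 limits the replacement to the first match.
        return regex.Replace(source, WrapWords, 1);
    }

    static string WrapWords(Match match)
    {
        string result = "";
        for (int i = 1; i < match.Groups.Count; i++)
        {
            string value = match.Groups[i].Value;
            // Wrap the words and copy the separators through unchanged.
            result += (i % 2 == 1) ? "<newtag>" + value + "</newtag>" : value;
        }
        return result;
    }
}
```

Each word is wrapped on its own so that the added tags never straddle an existing tag. Like the code above, this sketch does not handle a tag inside a word, and it can match text inside attribute values, which is one reason the comments recommend a parser for arbitrary HTML.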

Paul Walls