The following code does not work. I am trying to retrieve the TR strings from an HTML table. Is there an issue with this code, or is there another solution available?

public static List<string> GetTR(string Tr)
{
    List<string> trContents = new List<string>();

    string regexTR = @"<(tr|TR)[^<]+>((\s*?.*?)*?)<\/(tr|TR)>";

    MatchCollection tr_Matches = Regex.Matches(Tr, regexTR, RegexOptions.Singleline);
    foreach (Match match in tr_Matches)
    {
        trContents.Add(match.Value);
    }

    return trContents;
}

A sample input string is given below:

"<TR><TD noWrap align=left>abcd</TD><TD noWrap align=left>SPORT</TD><TD align=left>5AT</TD></TR>"
abatishchev
Kannan
    Required reading: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 - or in summary: **don't use regex to parse HTML** – Marc Gravell Jan 28 '11 at 15:23

4 Answers

Parsing HTML with regular expressions is asking for trouble.

Do the job properly using something like HTML Agility Pack.
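As a rough illustration of this approach, here is a minimal sketch of the question's `GetTR` rewritten on top of HTML Agility Pack. It assumes the `HtmlAgilityPack` NuGet package is referenced; the class name `HtmlRowExtractor` is hypothetical. It returns the outer HTML of every `tr` element, regardless of attributes, case, or line breaks inside the row.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: NuGet package "HtmlAgilityPack" is referenced

// Hypothetical helper class, not part of the original question or answer.
public static class HtmlRowExtractor
{
    // Returns the outer HTML of every <tr> element found in the markup.
    public static List<string> GetTR(string html)
    {
        var rows = new List<string>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // HAP exposes element names in lower case, so "//tr" also finds <TR>.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection trNodes = doc.DocumentNode.SelectNodes("//tr");
        if (trNodes == null)
        {
            return rows;
        }

        foreach (HtmlNode tr in trNodes)
        {
            rows.Add(tr.OuterHtml);
        }

        return rows;
    }
}
```

Because the parser builds a real node tree, attributes such as `noWrap align=left`, mixed-case tags, and multiple rows need no special handling in the pattern.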

carla
LukeH

I think this regular expression would be more appropriate:

<(tr|TR)[^>]*>.*<\/\1>
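Plugged into the question's method, this pattern matches the sample input. Two details are worth knowing: `\1` requires the closing tag to use the same case as the opening one, and `.*` is greedy, so with `RegexOptions.Singleline` and several rows a single match runs from the first `<tr>` to the last `</tr>`. The sketch below runs the pattern as given next to a lazy `.*?` variant, which is an adjustment of ours and not part of the answer; the class and constant names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper comparing the answer's pattern with a lazy variant.
public static class TrRegex
{
    // Pattern from the answer: greedy ".*" between the opening and closing tag.
    public const string Greedy = @"<(tr|TR)[^>]*>.*<\/\1>";

    // Assumed variant: lazy ".*?" stops at the first matching closing tag,
    // so each row becomes its own match.
    public const string Lazy = @"<(tr|TR)[^>]*>.*?<\/\1>";

    public static List<string> GetTR(string html, string pattern)
    {
        var rows = new List<string>();

        // Singleline lets "." also match newlines inside a row.
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.Singleline))
        {
            rows.Add(m.Value);
        }

        return rows;
    }
}
```

On the single-row sample both patterns return one match; on a fragment with two rows the greedy one returns one combined match, while the lazy one returns two.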
ChaosPandion

This regex matches your input string:

<(tr|TR)+>((\s*?.*?)*?)<\/(tr|TR)>

I removed "[^<]"; I'm not sure why you need that. Also, try using a non-greedy match.
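The reason the question's pattern finds nothing lies in its opening-tag part: `[^<]+` requires at least one character between `tr` and `>`, so a bare `<TR>` never matches, while a tag with attributes such as `<tr class="odd">` does. Removing `[^<]` fixes the sample, but `<(tr|TR)+>` then rejects any `tr` tag that has attributes (the `+` only repeats the tag name). `[^>]*` accepts both cases. The sketch below compares the three opening-tag fragments in isolation; the class and constant names and the `<tr class="odd">` input are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class; only the opening-tag part of each pattern is tested.
public static class OpenTagDemo
{
    // From the question: "[^<]+" needs one or more characters before ">".
    public const string QuestionOpenTag = @"<(tr|TR)[^<]+>";

    // From the answer above: no room for attributes at all.
    public const string NoAttributesOpenTag = @"<(tr|TR)+>";

    // "[^>]*" allows zero or more characters and stops at the first ">".
    public const string RelaxedOpenTag = @"<(tr|TR)[^>]*>";

    // Returns the first match, or null when the pattern does not match.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }
}
```

With the question's sample, `QuestionOpenTag` yields null while the other two yield `<TR>`; with `<tr class="odd">`, only `NoAttributesOpenTag` fails.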

However, it is better to go with something like HTML Agility Pack (if you want to keep your sanity). :)

Mrchief
(<(tr|TR)[^<]*>)(.+)(<\/(tr|TR)[^<]*>)
R. Martinho Fernandes
Senad Meškin