<table class="listProvision" class="itable">
    <tr>
        <td class="whatever">some infos</td>
        <td>some more infos</td>
        <td>13908402</td>
        <td>hello world</td>
    </tr>
    <tr>
        <td class="whatever">some infos</td>
        <td>some more infos</td>
        <td id="num">13908402</td>
        <td>hello world</td>
    </tr>
</table>

Given the sample HTML above, how can I extract every <tr>...</tr> inside the table with class listProvision?

I tried: <table.*?listProvision.*?>(?:.*?<tr.*?>(.*?)</tr>)+.*?</table>, but I can't figure out what's wrong. The HTML fed to this regex will never be more complicated than the sample, so don't worry about that.

Nahydrin
  • 3000 rep here and you even consider parsing HTML with a regex -- how have you not seen [this](http://stackoverflow.com/a/1732454/4068)? – Austin Salonen Jan 22 '13 at 23:17
  • Regular expressions are not the right approach; continue at your [peril](http://stackoverflow.com/a/1732454/67392). – Richard Jan 22 '13 at 23:17
  • Have you looked at using [HTMLAgilityPack](http://www.codeplex.com/htmlagilitypack)? Also look at some of the suggestions in [Parsing HTML](http://stackoverflow.com/questions/56107/what-is-the-best-way-to-parse-html-in-c). – MethodMan Jan 22 '13 at 23:21

2 Answers


Here is a sample showing how to parse an HTML string with Html Agility Pack:

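using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html); // html holds the markup as a string

// Select every <tr> that is a direct child of the listProvision table.
// SelectNodes returns null (not an empty collection) when nothing matches.
var rows = doc.DocumentNode
              .SelectNodes("//table[@class='listProvision']/tr");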

You can then read the HtmlNode.InnerHtml property of each row to get everything between its <tr>...</tr> tags.
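
To make the row-by-row step concrete, here is a minimal sketch that wraps the query in a helper and collects each row's InnerHtml. The class name ProvisionTable and the method ReadRows are made up for illustration, and the sketch assumes the Html Agility Pack NuGet package is referenced and that the table's class attribute is exactly listProvision.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of Html Agility Pack.
public static class ProvisionTable
{
    // Returns the trimmed inner HTML of every <tr> that is a direct child
    // of a table whose class attribute is exactly "listProvision".
    public static List<string> ReadRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = doc.DocumentNode
                      .SelectNodes("//table[@class='listProvision']/tr");

        var result = new List<string>();
        if (rows == null) // SelectNodes yields null when nothing matches
            return result;

        foreach (HtmlNode row in rows)
            result.Add(row.InnerHtml.Trim());

        return result;
    }
}
```

Calling ProvisionTable.ReadRows(html) gives one string per row; if you need the individual cells, row.SelectNodes("td") can be applied to each row in the same way.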

Sergey Berezovskiy

1) Use RegexOptions.Singleline so that the dot also matches newlines. (Your regex already works; I got it to match with nothing more than the single-line flag.)

2) Access match.Groups["yourNamedCaptureGroup"].Captures to get each capture. For an unnamed group, use its index instead, e.g. match.Groups[1].Captures.
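
The two points above can be sketched as follows, using the regex from the question unchanged. The class name RowRegex and the method ReadRows are invented for this example; because the question's group is unnamed, the sketch reads it by index (Groups[1]) rather than by name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the regex from the question.
public static class RowRegex
{
    public const string Pattern =
        @"<table.*?listProvision.*?>(?:.*?<tr.*?>(.*?)</tr>)+.*?</table>";

    // Returns the trimmed content of every <tr> captured by group 1.
    public static List<string> ReadRows(string html)
    {
        var result = new List<string>();

        // Singleline lets '.' match '\n', so the pattern can span lines.
        Match match = Regex.Match(html, Pattern, RegexOptions.Singleline);
        if (!match.Success)
            return result;

        // Group 1 sits inside a repeated group, so .NET keeps one
        // Capture per repetition instead of only the last one.
        foreach (Capture capture in match.Groups[1].Captures)
            result.Add(capture.Value.Trim());

        return result;
    }
}
```

Reading Groups[1].Value alone would return only the last row, which is why the Captures collection matters here. This remains a sketch for simple, predictable markup like the sample; nested tables or a stray </tr> inside a cell would break it.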

Community
Scott Weaver