
I want to parse HTML with a regular expression. Here is the HTML source code:

<table border="1">
    <tr>
        <td>row 1, cell 1</td>
        <td>row 1, cell 2</td>
    </tr>
    <tr>
        <td>row 2, cell 1</td>
        <td>row 2, cell 2</td>
    </tr>
</table>

I want to collect the innerHTML of each TD tag into a collection.

PS: I don't know how many TD tags will exist in the HTML source. I think this can be done with regex grouping. Please explain grouping and your code in your answer.
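Here is a minimal sketch of the grouping approach the question asks about. It is written in Python, which is an assumption: the question names no language, although the Html Agility Pack suggestions point to .NET. Parentheses in a pattern create a capture group. Each match remembers the text its group matched, and `re.findall` returns that group's text for every match, so you do not need to know the number of TD tags in advance. The names `HTML`, `TD_PATTERN`, and `td_inner_html` are hypothetical.

```python
import re

HTML = """<table border="1">
    <tr>
        <td>row 1, cell 1</td>
        <td>row 1, cell 2</td>
    </tr>
    <tr>
        <td>row 2, cell 1</td>
        <td>row 2, cell 2</td>
    </tr>
</table>"""

# <td[^>]*>  matches the opening tag, with or without attributes.
# (.*?)      is capture group 1: the cell's inner HTML. The lazy "?" makes it
#            stop at the first </td> rather than the last one in the string.
# </td>      matches the closing tag.
# IGNORECASE also accepts <TD>; DOTALL lets "." cross line breaks inside a cell.
TD_PATTERN = re.compile(r"<td[^>]*>(.*?)</td>", re.IGNORECASE | re.DOTALL)


def td_inner_html(html: str) -> list[str]:
    # With exactly one group in the pattern, findall returns group 1's text
    # for every non-overlapping match, however many cells there are.
    return TD_PATTERN.findall(html)


print(td_inner_html(HTML))
# prints ['row 1, cell 1', 'row 1, cell 2', 'row 2, cell 1', 'row 2, cell 2']
```

This works only for flat, predictable markup like the snippet above. A nested table, a `</td>` inside a comment or attribute value, or an omitted `</td>` (which HTML permits) will make the pattern return wrong results. That is the concern raised in the comments and the answer.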

Thanks in advance.

uzay95
  • Probably easier to parse as XML rather than using regex (somebody will no doubt post a link to the "DON'T USE REGEX TO PARSE HTML" answer) – Richard Dalton Nov 15 '11 at 11:21
  • You really want to think twice, no, three times, about parsing HTML with a regex. See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454. – Jeremy McGee Nov 15 '11 at 11:22
  • ...and a good follow-up might be that you should look into using [Html Agility Pack](http://htmlagilitypack.codeplex.com/) instead (also available via NuGet). Here is an example: [HTMLAgilityPack parse in the InnerHTML](http://stackoverflow.com/questions/1346144/htmlagilitypack-parse-in-the-innerhtml). – Fredrik Mörk Nov 15 '11 at 11:24
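As the first comment suggests, the snippet in the question happens to be well-formed XML, so an XML parser can read it. Below is a minimal Python sketch using the standard-library `xml.etree.ElementTree`. Python is an assumption, since the question names no language, and the function name `td_inner_html_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET

HTML = """<table border="1">
    <tr>
        <td>row 1, cell 1</td>
        <td>row 1, cell 2</td>
    </tr>
    <tr>
        <td>row 2, cell 1</td>
        <td>row 2, cell 2</td>
    </tr>
</table>"""


def td_inner_html_xml(markup: str) -> list[str]:
    root = ET.fromstring(markup)  # raises ET.ParseError if markup is not well-formed XML
    cells = []
    # iter("td") walks the whole tree, so the number of cells need not be known.
    for td in root.iter("td"):
        # Inner HTML = the cell's leading text plus each child element serialized.
        # ET.tostring includes a child's tail text, so text between children is kept.
        inner = (td.text or "") + "".join(
            ET.tostring(child, encoding="unicode") for child in td
        )
        cells.append(inner)
    return cells


print(td_inner_html_xml(HTML))
```

Real-world HTML is often not well-formed XML. It may contain unclosed tags or entities such as `&nbsp;` that XML does not define, and in those cases `ET.fromstring` raises `ET.ParseError`. This is why a dedicated HTML parser is usually the safer choice.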

1 Answer


Regex is a search tool and is not suitable for parsing HTML (or any programming language, for that matter). If you ever want to parse HTML, the HTML Agility Pack is probably the way to go.
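The HTML Agility Pack is a .NET library, so it cannot be shown here directly. As a stand-in, here is a minimal sketch using Python's standard-library `html.parser.HTMLParser`, which tokenizes HTML instead of pattern-matching it. The sketch collects the text of each cell rather than its full inner HTML. It treats the next `<td>`, or a closing `</tr>` or `</table>`, as ending an open cell, and it does not handle nested tables. The class name `TdTextCollector` is hypothetical.

```python
from html.parser import HTMLParser

HTML = """<table border="1">
    <tr>
        <td>row 1, cell 1</td>
        <td>row 1, cell 2</td>
    </tr>
    <tr>
        <td>row 2, cell 1</td>
        <td>row 2, cell 2</td>
    </tr>
</table>"""


class TdTextCollector(HTMLParser):
    """Collects the text content of each <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._current = None  # list of text chunks while inside a cell, else None

    def _close_cell(self):
        if self._current is not None:
            # strip() drops the whitespace left when an end tag is omitted.
            self.cells.append("".join(self._current).strip())
            self._current = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase, so <TD> matches too.
        if tag == "td":
            self._close_cell()  # a new <td> implicitly ends an unclosed one
            self._current = []

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._close_cell()

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)


parser = TdTextCollector()
parser.feed(HTML)
parser.close()
print(parser.cells)
```

Unlike the regex approach, this copes with the optional `</td>` end tag. For example, feeding `<table><tr><td>a<td>b</tr></table>` yields `['a', 'b']`.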

GeirGrusom