How do I get content from a table using its ID with a regex?

Question

I need to sift through an HTML string to extract the content I need. Specifically, I need to loop through the rows of a table that has an ID. How do I do this with a regex?

See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Manu, Jan 18 '10 at 10:04

score 1 · Accepted Answer · answered Jan 18 '10 at 09:55

Regular expressions cannot be used to parse HTML; HTML is not regular. Use a proper HTML parser library.
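As a rough illustration of the parser approach, here is a minimal sketch in Python using only the standard library's html.parser module. The class name TableRowExtractor, the helper rows_of_table, and the sample markup (including the id "fruit") are made up for this example and are not from the question. It ignores nested tables, and a dedicated HTML parser library for your platform will likely handle malformed markup better.

```python
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collect the cell text of every row in the table with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0        # 0 = outside the target table, 1 = directly inside it
        self.in_cell = False
        self.rows = []        # one list of cell strings per <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth:
                self.depth += 1                      # nested table, skipped
            elif dict(attrs).get("id") == self.table_id:
                self.depth = 1                       # entered the target table
        elif self.depth == 1:
            if tag == "tr":
                self.rows.append([])
            elif tag in ("td", "th") and self.rows:
                self.rows[-1].append("")
                self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif tag in ("td", "th") and self.depth == 1:
            self.in_cell = False

    def handle_data(self, data):
        # Only text directly inside a cell of the target table is kept.
        if self.in_cell and self.depth == 1:
            self.rows[-1][-1] += data


def rows_of_table(html, table_id):
    """Return the rows of the table with the given id as lists of cell text."""
    parser = TableRowExtractor(table_id)
    parser.feed(html)
    parser.close()
    return [[cell.strip() for cell in row] for row in parser.rows]


# Hypothetical sample markup: two tables, only one of which has the wanted id.
sample = """
<table id="other"><tr><td>x</td></tr></table>
<table id="fruit" class="grid">
  <tr><td>1</td><td>Apple</td></tr>
  <tr><td>2</td><td>Ball</td></tr>
</table>
"""

for row in rows_of_table(sample, "fruit"):
    print(row)
```

Because the parser tracks tags rather than raw text, extra attributes, line breaks, and a second table on the page do not affect which rows are returned.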

Ignacio Vazquez-Abrams

Do you have any suggestions for this? I use ASP.NET with C#. – Dejan.S Jan 18 '10 at 09:57

score 1 · Answer 2 · answered Jan 18 '10 at 10:06

It depends on how regular the HTML text is. For example, given this table:

<table>
  <tr><td>1</td><td>Apple</td></tr>
  <tr><td>2</td><td>Ball</td></tr>
  <tr><td>3</td><td>Cookie</td></tr>
</table>

The following regular expression finds the IDs in the first column:

(?<=<tr><td>).*?(?=</td>)
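To show what "how regular the HTML is" means in practice, here is a small Python sketch that runs this pattern against the table above and against a slightly different version of it. The variable names and the second sample (an added class attribute and a line break) are assumptions made for illustration.

```python
import re

# The lookbehind/lookahead pattern from this answer, unchanged.
FIRST_CELL = re.compile(r"(?<=<tr><td>).*?(?=</td>)")

regular_html = """<table>
  <tr><td>1</td><td>Apple</td></tr>
  <tr><td>2</td><td>Ball</td></tr>
  <tr><td>3</td><td>Cookie</td></tr>
</table>"""

# Same kind of data, but with an attribute and a line break
# that the literal "<tr><td>" lookbehind does not expect.
irregular_html = """<table>
  <tr><td class="id">1</td><td>Apple</td></tr>
  <tr>
    <td>2</td><td>Ball</td></tr>
</table>"""

first_cells_regular = FIRST_CELL.findall(regular_html)
first_cells_irregular = FIRST_CELL.findall(irregular_html)

print(first_cells_regular)    # ['1', '2', '3']
print(first_cells_irregular)  # []
```

The pattern works only while every row starts with the exact text `<tr><td>`. Any attribute or whitespace between the tags makes it silently return nothing.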

score 0 · Answer 3 · edited Oct 20 '12 at 07:06

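Try this (VB.NET):

Imports System.Text
Imports System.Text.RegularExpressions

' Singleline lets "." match line breaks, so a table can span several lines.
Dim html As String = contentText
Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline
' The lazy ".*?" stops at the first </table> instead of the last one in the string.
Dim tableRegex As New Regex("<table[^>]*>(.*?)</table>", options)
Dim sb As New StringBuilder()
For Each m As Match In tableRegex.Matches(html)
    ' Append each matched table, one per line.
    sb.Append(m.Value & vbLf)
Next
TextBox.Text = sb.ToString()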
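One detail worth checking with this kind of pattern is the quantifier. When "." is allowed to match line breaks, a greedy `(.*)` runs to the last `</table>` in the input, while a lazy `(.*?)` stops at the first. The Python sketch below illustrates the difference and adds a hypothetical helper, table_body_by_id, that restricts the match to a table with a given id. The sample string and the helper are assumptions for illustration, the helper expects a quoted id attribute, and it still breaks on nested tables.

```python
import re

# Hypothetical input with two tables on one page.
html = (
    '<table id="a"><tr><td>first</td></tr></table>'
    '<p>between</p>'
    '<table id="b"><tr><td>second</td></tr></table>'
)

# re.DOTALL plays the role of RegexOptions.Singleline in .NET.
flags = re.IGNORECASE | re.DOTALL

# Greedy: one match that swallows everything up to the last </table>.
greedy = re.findall(r"<table[^>]*>(.*)</table>", html, flags)

# Lazy: one match per table.
lazy = re.findall(r"<table[^>]*>(.*?)</table>", html, flags)


def table_body_by_id(html, table_id):
    """Return the inner HTML of the first table whose quoted id equals table_id."""
    pattern = (
        r"<table\b[^>]*\sid\s*=\s*([\"'])"   # id attribute with its opening quote
        + re.escape(table_id)
        + r"\1[^>]*>(?P<body>.*?)</table>"    # same closing quote, then lazy body
    )
    match = re.search(pattern, html, flags)
    return match.group("body") if match else None


print(len(greedy), len(lazy))
print(table_body_by_id(html, "b"))
```

The rows of the selected table could then be pulled out of the returned body with a second lazy pattern, with the same caveats about irregular markup.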

score 0 · Answer 4 · answered Jan 18 '10 at 13:19

If you run the page through an HTML parser like BeautifulSoup, you can prettify it so that this kind of regex has a chance. But if you are parsing the HTML anyway...

Charles Stewart
