-3

Imagine this table:

<table cellpadding="0" border="0">
<tr class="someclass">

<td>blah blah THISISIMPORTANT blah blah</td>

</tr>
</table>

I want to select only the tables that have a TD whose innerHTML contains 'THISISIMPORTANT'.

This must be done with regular expressions in C#.

This is what I have tried:

<table\s*.*?\s*>\s*.*?\s*<td\s*.*?\s*>\s*.*?\s*</td>\s*.*?\s*</table>
Ashkan Mobayen Khiabani

2 Answers

2
/<table[^>]*>(?:.(?!<\/table>))*<td[^>]*>(?:.(?!<\/td>))*THISISIMPORTANT.*?<\/td>.*?<\/table>/

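That's close. As long as no one uses a ">" inside a tag (for example, in an attribute value), it will work. In C#, drop the surrounding slashes and pass RegexOptions.Singleline so that "." also matches newlines. But you really should find a better way to do this than regex.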
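For what it's worth, here is a minimal C# sketch of the same idea, using only the .NET Framework's own `System.Text.RegularExpressions`. The class name `TableFinder` and the method `FindTables` are hypothetical names made up for this illustration. It assumes tables are not nested and that no attribute value contains a ">", so treat it as a starting point rather than a robust parser.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class TableFinder
{
    // Returns the full markup of each <table> that has a <td> containing
    // the marker text. Assumes non-nested tables and no ">" inside attributes.
    public static List<string> FindTables(string html, string marker)
    {
        string pattern =
            @"<table\b[^>]*>" +            // opening table tag
            @"(?:(?!</?table\b).)*?" +     // anything, without crossing a table boundary
            @"<td\b[^>]*>" +               // opening td tag
            @"(?:(?!</?td\b).)*?" +        // cell content, without leaving this cell
            "(?-i:" + Regex.Escape(marker) + ")" + // the marker, matched case-sensitively
            @"(?:(?!</table\b).)*" +       // rest of the table
            @"</table\s*>";                // closing table tag

        // Singleline lets "." match newlines; IgnoreCase covers <TABLE>, <Td>, etc.
        RegexOptions options = RegexOptions.Singleline | RegexOptions.IgnoreCase;

        var results = new List<string>();
        foreach (Match match in Regex.Matches(html, pattern, options))
        {
            results.Add(match.Value);
        }
        return results;
    }
}
```

Calling `TableFinder.FindTables(pageHtml, "THISISIMPORTANT")` (where `pageHtml` is whatever string holds your page) should give you one entry per matching table. The negative lookaheads are what stop a match from spilling over from a table without the marker into a later table that has it.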

FrankieTheKneeMan
0

Why not just use the HTML Agility Pack? It parses HTML really well and even supports LINQ, so it should be trivial to implement what you want with it.

You could parse the text you want with a regular expression, but then you'll need to assume the HTML will always be perfectly formed and in the same specific format, which will make the code difficult to maintain.

EDIT: I found another question that is nearly identical to yours, with a code sample showing how to use the HTML Agility Pack to implement a solution: regex to get value of inside a particular TD in HTML
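To give a rough idea of what the Agility Pack route looks like, here is a minimal sketch using LINQ. It is not the code from the linked question. It assumes the HtmlAgilityPack assembly is referenced in your project, and the class name `AgilityTableFinder` and the method `FindTables` are hypothetical names chosen for this example.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack assembly is referenced

// Hypothetical helper, not part of the library.
public static class AgilityTableFinder
{
    // Returns the outer HTML of every <table> that has at least one
    // descendant <td> whose inner HTML contains the marker text.
    public static List<string> FindTables(string html, string marker)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant of malformed markup

        return doc.DocumentNode
            .Descendants("table")
            .Where(table => table.Descendants("td")
                                 .Any(td => td.InnerHtml.Contains(marker)))
            .Select(table => table.OuterHtml)
            .ToList();
    }
}
```

Note that with nested tables this returns the outer table as well as the inner one, because the cell is a descendant of both. Filter further if you only want the innermost table.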

Jack P.
  • I'm developing an ASP.NET page and I can't use any DLL files (other than the .NET Framework itself); the web host forbids the use of DLL files. – Ashkan Mobayen Khiabani Sep 01 '12 at 00:28
  • 1
    @AshkanMobayenKhiabani Find a better web host. Seriously. If you can't drop a DLL in the `/bin` dir of your ASP.NET site, your web host is *useless*. – Andrew Barber Sep 01 '12 at 06:05