Regex, substring htmlstring

Question

I have an HTML string, `ResultsString`, that I need to parse:

         <table id="Table1">
            <tr>
              <td width="50%">
                 Result: <span style="font-weight:bold; color:GREEN;"></span>
               </td>
               <td width="50%">
                  ID: <span style="font-weight:bold;">790043</span>
               </td>
           </table>
         <table id="Table2">
            <tr>
              <td class="name">
                Status:
             </td>
             <td class="value">
                None
             </td>
             </tr>

        </table>
<br /><br />
<a href="#" onclick="$('#vvvv').toggle();return false;" /></a>
<br />
<div id="pp1" style="displa
</div>

How would I extract (substring) only the markup inside the two table tags, so that my resulting HTML string would be:

   <table id="Table1">
            <tr>
              <td width="50%">
                 Result: <span style="font-weight:bold; color:GREEN;"></span>
               </td>
               <td width="50%">
                  ID: <span style="font-weight:bold;">790043</span>
               </td>
           </table>
         <table id="Table2">
            <tr>
              <td class="name">
                Status:
             </td>
             <td class="value">
                None
             </td>
             </tr>

        </table>

Please suggest an approach.

Thank you.

Forget Regex and use the [HTML Agility Pack](http://htmlagilitypack.codeplex.com/) — Matt Burland, Apr 02 '13 at 17:11
[Do not use regex with html](http://stackoverflow.com/a/1732454/580951). Use a html parser instead. — Dustin Kingen, Apr 02 '13 at 17:11

score 0 · Answer 1 · answered Apr 02 '13 at 17:15


You want to transform an HTML file? That's an XSLT job.

joce


Julián Urbano · Accepted Answer · 2013-04-02T17:41:50.663

0

As suggested, you should use an HTML parser such as the HTML Agility Pack. Otherwise, you may run into problems if you have nested structures, etc.
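A minimal sketch of the parser route, assuming the HtmlAgilityPack NuGet package is installed. The class and method names (`TableParserSketch`, `ExtractTables`) are made up for illustration, and the XPath keeps only top-level tables so a nested table is not returned twice:

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class TableParserSketch
{
    // Hypothetical helper: returns the markup of every top-level <table>.
    public static string ExtractTables(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var tables = doc.DocumentNode.SelectNodes("//table[not(ancestor::table)]");
        if (tables == null)
        {
            return string.Empty;
        }

        // OuterHtml includes the <table> element itself plus its content.
        return string.Join(Environment.NewLine, tables.Select(t => t.OuterHtml));
    }

    public static void Main()
    {
        // Shortened stand-in for ResultsString from the question.
        string resultsString =
            "<table id=\"Table1\"><tr><td>ID: 790043</td></tr></table>" +
            "<table id=\"Table2\"><tr><td>None</td></tr></table>" +
            "<br /><div id=\"pp1\"></div>";

        Console.WriteLine(ExtractTables(resultsString));
    }
}
```

Because the parser builds a tree, this keeps working if a table ever contains another table, which is where the regex below gets into trouble.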

For this simple case though, you can use this regular expression:

string html = Regex.Match(ResultsString,
                          @"<table.+<\/table>",
                          RegexOptions.Singleline).Value;

But again, only if your input string is as simple as you showed us!
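For completeness, here is a hedged sketch of that regex wrapped in a runnable program, plus a lazy variant that returns each table separately. The names `TableRegexSketch`, `ExtractSpan` and `ExtractEach` are illustrative, and the input is a shortened stand-in for `ResultsString`. Both patterns assume there are no nested tables:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TableRegexSketch
{
    // Greedy: one span from the first "<table" to the last "</table>".
    // Singleline lets '.' match newlines as well.
    public static string ExtractSpan(string html) =>
        Regex.Match(html, @"<table\b.+</table>",
                    RegexOptions.Singleline | RegexOptions.IgnoreCase).Value;

    // Lazy: each match stops at the nearest "</table>", giving one entry per table.
    public static string[] ExtractEach(string html) =>
        Regex.Matches(html, @"<table\b.*?</table>",
                      RegexOptions.Singleline | RegexOptions.IgnoreCase)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        string resultsString =
            "<table id=\"Table1\"><tr><td>ID: 790043</td></table>\n" +
            "<table id=\"Table2\"><tr><td>None</td></tr></table>\n" +
            "<br /><div id=\"pp1\"></div>";

        Console.WriteLine(ExtractSpan(resultsString));        // both tables as one string
        Console.WriteLine(ExtractEach(resultsString).Length); // 2
    }
}
```

The greedy version also swallows anything that sits between the first and the last table, so it only gives the desired result when the tables are adjacent, as they are in the question.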

edited Apr 02 '13 at 17:41

answered Apr 02 '13 at 17:15


Please do not tell beginners to use Regex to parse HTML; it is never appropriate. If the HTML truly is as simple as claimed, then `String.Substring` is adequate. If this is not adequate, then neither is a Regex. – Dour High Arch Apr 03 '13 at 20:02
a) I do explicitly recommend to use a parser instead. b) so when it really is that simple, `Substring` is ok and `Regex` is not? Give me a break – Julián Urbano Apr 03 '13 at 21:07
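For reference, the `String.Substring` approach mentioned in the comments could look roughly like this. It is only a sketch: `TableSubstringSketch` and `ExtractTables` are hypothetical names, and it makes the same assumption as the regex, namely that everything between the first `<table` and the last `</table>` is wanted:

```csharp
using System;

public static class TableSubstringSketch
{
    // Hypothetical helper: cuts from the first "<table" to the end of the last "</table>".
    public static string ExtractTables(string html)
    {
        const string closeTag = "</table>";

        int start = html.IndexOf("<table", StringComparison.OrdinalIgnoreCase);
        int lastClose = html.LastIndexOf(closeTag, StringComparison.OrdinalIgnoreCase);

        // No opening tag, or no closing tag after it: nothing to extract.
        if (start < 0 || lastClose < start)
        {
            return string.Empty;
        }

        return html.Substring(start, lastClose + closeTag.Length - start);
    }

    public static void Main()
    {
        // Shortened stand-in for ResultsString from the question.
        string resultsString =
            "<p>intro</p><table id=\"Table1\"><tr><td>ID: 790043</td></table>" +
            "<table id=\"Table2\"><tr><td>None</td></tr></table><br />";

        Console.WriteLine(ExtractTables(resultsString));
    }
}
```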
