I have the following HTML; there are more TDs, but I have included only a few below. I want to get the value of the TD whose id is "hdNumber" using C# code, and I want to use a regular expression. Sometimes, when the HTML is generated by Windows Live (for email), it may be rendered without quotation marks before and after the id value. I want to get only the number 8332.

<table>
<tr>
    <TD style="COLOR: #666" vAlign=top>
         Good<TD>
       <TD id="hdNumber"
       style="BACKGROUND: white; COLOR: white; DISPLAY: none">8332
    </TD> 
</tr>
</table>
Sandip
  • obligatory [link](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Jonesopolis May 11 '15 at 14:08

2 Answers

Don't use regex to parse HTML. You can use HtmlAgilityPack:

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlString);
var hdNumber = doc.GetElementbyId("hdNumber");
if(hdNumber != null)
{
    string number = hdNumber.InnerText.Trim('\r', '\n', ' ', '"');  // 8332
}

I have used Trim('\r', '\n', ' ', '"') to remove possible leading and trailing spaces, newline characters and quotes as desired.

Tim Schmelter
> I want to use regular expression.

If you don't want to use a DOM parser (which is the recommended approach), you can use the following pattern with the s modifier (DOTALL; `RegexOptions.Singleline` in .NET):

<TD\s*id\s*=\s*"?hdNumber"?.*?>(.*?)</TD>

Then extract the number from the first capture group ($1).
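
As a rough sketch of how this pattern might be applied from C#: the class name `HdNumberExtractor` and the method `TryExtract` are hypothetical names chosen for illustration, not part of any library. The sketch assumes the whole HTML is available as a single string and that the target cell contains no nested `TD`. It uses `RegexOptions.Singleline` so that `.` also matches the line break inside the opening tag, and `RegexOptions.IgnoreCase` so that `<td>` and `<TD>` are both accepted.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class HdNumberExtractor
{
    // The pattern from this answer as a verbatim string; inside @"..." each
    // literal double quote has to be doubled ("").
    private const string Pattern =
        @"<TD\s*id\s*=\s*""?hdNumber""?.*?>(.*?)</TD>";

    // Returns true and sets 'number' to the trimmed cell text when a cell
    // with id hdNumber (quoted or unquoted) is found; otherwise returns
    // false and sets 'number' to an empty string.
    public static bool TryExtract(string html, out string number)
    {
        Match match = Regex.Match(
            html,
            Pattern,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Group 1 is the text between the opening tag and </TD>; trim the
        // surrounding whitespace, line breaks and stray quotes.
        number = match.Success
            ? match.Groups[1].Value.Trim('\r', '\n', ' ', '"')
            : string.Empty;

        return match.Success;
    }
}
```

Calling `HdNumberExtractor.TryExtract(htmlString, out string number)` on the HTML from the question should leave `number` holding the cell text without the trailing line break. Without `RegexOptions.Singleline` the pattern cannot match that HTML, because the opening tag spans two lines.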

karthik manchala
  • Thanks karthik, but when I used that in my C# code below, it did not work: `const string pattern = @"(.*?)"; Match match in Regex.Matches(text, pattern, RegexOptions.IgnoreCase)` – Sandip May 12 '15 at 05:15
  • @Sandip Yeah, that's because you assumed there will be a space after `id=\s`. Make it `\s*` and it will work. :) – karthik manchala May 12 '15 at 06:01