
I'm getting the value below while scraping, and I only need the 28 from this string:

<td colspan="2" class="invalid">
  28 Errors, 3 warning(s)


</td>

My code:

string strurl = "http://validator.w3.org/check?uri=" + url;
StreamReader stream = objm.URLServerRequest(strurl);
string myResponse = stream.ReadToEnd();
MatchCollection AltTag = Regex.Matches(myResponse, @"(?si)<td\b[^<]*?>(.*?)</td>", RegexOptions.IgnoreCase | RegexOptions.Multiline);
John Saunders

1 Answer


To get the contents of a TD node, you need to match td in the regex and capture what sits between the opening and closing tags:

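using System.Linq;
using System.Text.RegularExpressions;
var myResponse = "<td class=\"att\">Text</td><td class=\"att\">Text2</td>";
var TdTag = Regex.Matches(myResponse, @"(?si)<td\b[^<]*?>(.*?)</td>");
// Group 1 holds the text between <td ...> and </td> for each match
var results = TdTag.Cast<Match>().Select(m => m.Groups[1].Value).ToList();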

This returns the contents of every TD as a List<string>.
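In the question's input the cell text is wrapped in newlines and spaces, so the raw group value includes them. A minimal sketch, assuming .NET with LINQ, that trims each captured value (TdContents and Extract are hypothetical names, not part of any library):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TdContents
{
    // Returns the inner text of every <td ...>...</td> in the input,
    // with leading and trailing whitespace (including newlines) removed.
    public static string[] Extract(string html) =>
        Regex.Matches(html, @"(?si)<td\b[^<]*?>(.*?)</td>")
             .Cast<Match>()
             .Select(m => m.Groups[1].Value.Trim())
             .ToArray();
}
```

With the question's markup this yields the single string "28 Errors, 3 warning(s)", from which the leading number can then be taken.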


EDIT:

To capture the 28 from the updated input, use a modified regex that skips the whitespace after the opening tag and captures only the leading digits:

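using System;
using System.Text.RegularExpressions;
var myResponse = @"<td colspan=""2"" class=""invalid"">
28 Errors, 3 warning(s)


</td>";
// \s* skips whitespace after the opening tag; (\d+) captures the leading number
var TdTag = Regex.Matches(myResponse, @"(?s)<td[^<]*>\s*(\d+)[^<]*</td>");
var result = TdTag[0].Groups[1].Value;
Console.WriteLine(result);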

Output of the demo program:

28
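If you also want the warning count, or want the numbers as integers rather than strings, named groups can pick up both values in one match. This is a sketch that assumes the validator keeps rendering the summary as a td with class="invalid" containing "N Errors, M warning(s)"; ValidatorSummary and Parse are hypothetical names, and a page without that cell makes Parse return null:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ValidatorSummary
{
    // Hypothetical helper: reads "<N> Errors, <M> warning(s)" from the
    // <td ... class="invalid"> cell. Returns null when that cell is absent.
    public static (int Errors, int Warnings)? Parse(string html)
    {
        var m = Regex.Match(
            html,
            @"<td[^>]*class=""invalid""[^>]*>\s*(?<errors>\d+)\s+Errors?,\s*(?<warnings>\d+)\s+warning",
            RegexOptions.IgnoreCase);
        if (!m.Success) return null;
        return (int.Parse(m.Groups["errors"].Value), int.Parse(m.Groups["warnings"].Value));
    }
}
```

Returning a nullable tuple keeps "no summary found" distinct from a real count of zero.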
Wiktor Stribiżew