I need to extract the content of the tr and td elements of the 2nd table inside a div on a page at an external URL. I can't use HtmlAgilityPack.
The markup looks something like this:
<div class="class1" id="content-main">
<table width="90%">
<tbody>
<tr><td class="table_left_corner"> </td><td class="table_head">table1 </td><td class="table_right_corner"> </td></tr>
</tbody>
</table>
<table width="90%">
<tbody>
<tr><td class="table_left_corner"> </td><td class="table_head">table2</td><td class="table_right_corner"> </td></tr>
</tbody>
</table>
<table width="90%">
<tbody>
<tr><td class="table_left_corner"> </td><td class="table_head">table3 </td><td class="table_right_corner"> </td></tr>
</tbody>
</table>
</div>
So I want to use some Regex functions to return the content of a table.
using (WebClient client = new WebClient())
{
    string htmlcode = client.DownloadString("http://www.example.com");
    string r = @"<div.*?id=""content-main"".*?>.*</div>";
    Match match2 = Regex.Match(htmlcode, r);
    string a = match2.Groups[1].Value;
}
I have tried several different regex expressions, but all of them failed. How can I get the content of the 2nd table?
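For reference, two things in the snippet above would make it return an empty string: the pattern has no capture group, so Groups[1] is always empty, and without RegexOptions.Singleline the `.` does not match line breaks. A minimal sketch of a regex-only approach follows. It assumes the tables are not nested and that the first tables after the div's opening tag belong to that div; the names TableExtractor, GetTableHtml and GetCellTexts are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TableExtractor
{
    // Singleline lets '.' match newlines; IgnoreCase tolerates <TABLE>, <Td>, etc.
    private const RegexOptions Options = RegexOptions.Singleline | RegexOptions.IgnoreCase;

    // Returns the inner HTML of the n-th (1-based) table found after the opening
    // tag of the div with the given id, or null if it cannot be found.
    // Assumption: tables are not nested and the n-th table sits inside that div.
    public static string GetTableHtml(string html, string divId, int tableNumber)
    {
        if (html == null || tableNumber < 1) return null;

        Match div = Regex.Match(
            html,
            @"<div\b[^>]*\bid\s*=\s*""" + Regex.Escape(divId) + @"""[^>]*>",
            Options);
        if (!div.Success) return null;

        // Start scanning for tables right after the div's opening tag.
        Regex tableRegex = new Regex(@"<table\b[^>]*>(.*?)</table>", Options);
        MatchCollection tables = tableRegex.Matches(html, div.Index + div.Length);

        return tables.Count >= tableNumber
            ? tables[tableNumber - 1].Groups[1].Value
            : null;
    }

    // Returns the trimmed inner text of every <td> in the given table HTML.
    public static List<string> GetCellTexts(string tableHtml)
    {
        var cells = new List<string>();
        foreach (Match td in Regex.Matches(tableHtml, @"<td\b[^>]*>(.*?)</td>", Options))
        {
            cells.Add(td.Groups[1].Value.Trim());
        }
        return cells;
    }
}
```

With the downloaded page in htmlcode, TableExtractor.GetTableHtml(htmlcode, "content-main", 2) would give the inner HTML of the 2nd table, and GetCellTexts on that result would give its cell contents. Regex is fragile on real-world HTML, so this is only likely to hold up while the page keeps the simple structure shown above.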
Edit 2: Using HtmlAgilityPack
var web = new HtmlWeb();
var document = web.Load("http://www.example.com/");
var page = document.DocumentNode;
string outerHTML = page.SelectNodes("//table")[5].OuterHtml;
Match match1;
match1 = Regex.Match(outerHTML, @"<a [^>]+>(.*?)<\/a>");
while (match1.Success)
{
    string NAme = match1.Groups[1].Value;
    var webloc = new HtmlWeb();
    dynamic documentloc = null;
    documentloc = webloc.Load(urlAddress + NAme.Replace(" ", "-").ToLower());
    dynamic pageloc = documentloc.DocumentNode;
    string outerHTMLloc = pageloc.SelectNodes("//table")[5].OuterHtml;
    match1 = match1.NextMatch();
}
The first iteration of the loop runs successfully, but on the second iteration an error is thrown at the "outerHTMLloc" line.
Error: "An unhandled exception of type 'System.StackOverflowException' occurred in HtmlAgilityPack.DLL"
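One possible workaround, offered as a guess rather than a diagnosis: if the overflow comes from the parser recursing through a very deeply nested or malformed page, running the load on a thread with a larger stack may get past it. The sketch below uses only the standard Thread(ThreadStart, int maxStackSize) constructor. The helper name BigStack.Run and the 16 MB default are assumptions chosen for illustration.

```csharp
using System;
using System.Threading;

public static class BigStack
{
    // Runs 'work' on a dedicated thread with the requested stack size and
    // returns its result. A StackOverflowException cannot be caught, so a
    // larger stack only raises the limit; it does not make deep recursion safe.
    public static T Run<T>(Func<T> work, int stackSizeBytes = 16 * 1024 * 1024)
    {
        T result = default(T);
        Exception error = null;

        var thread = new Thread(() =>
        {
            try { result = work(); }
            catch (Exception ex) { error = ex; } // carry ordinary exceptions back to the caller
        }, stackSizeBytes);

        thread.Start();
        thread.Join(); // block until the work has finished

        if (error != null)
        {
            throw new InvalidOperationException("Work failed on the large-stack thread.", error);
        }
        return result;
    }
}
```

In the loop above, the body that loads the page and reads the table could be wrapped as BigStack.Run(() => { ...; return outerHTMLloc; }). If the error persists even with a much larger stack, the cause is more likely unbounded recursion than a deep document, and the page fetched on the second iteration is worth inspecting.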