How can I parse a complete HTML page in C#?
A small example:
<html>
  <head></head>
  <body>
    <div class="wrapper">
      <div class="row">
        <div>Value1</div>
        <div>Value2</div>
      </div>
      <div class="row">
        <div>Value1</div>
        <div>Value2</div>
      </div>
      <div class="row">
        <div>Value1</div>
        <div>Value2</div>
      </div>
      <div class="row">
        <div>Value1</div>
        <div>Value2</div>
      </div>
    </div>
  </body>
</html>
I cannot rely on the page's class names to identify the containers, because they vary.
I want to save the values from each row.
My current code:
WebBrowser wb = (WebBrowser)sender;
var doc = wb.Document as HTMLDocument;
IHTMLElementCollection nodes = doc.getElementsByTagName("div");
foreach (IHTMLElement elem in nodes)
{
    var div = (HTMLDivElement)elem;
    if (div.className != null && div.className.Contains("t_row"))
    {
        // BREAKPOINT
        var inner = div.document as HTMLDocument;
        IHTMLElementCollection innerNode = inner.getElementsByTagName("div");
        log(div.innerText);
    }
}
Everything works fine up to the breakpoint, but I don't know how to continue from there.
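
One possible way to continue from the breakpoint, as a rough sketch rather than a definitive solution: `div.document` returns the document that owns the element, so calling `getElementsByTagName` on it walks every div on the page again instead of only the ones inside the row. The `children` collection of the element is scoped to that element. The helper below assumes the same `mshtml` interop types used in the code above. `RowReader`, `ReadRows` and the `rowMarker` parameter are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using mshtml; // same COM interop assembly as the code in the question

static class RowReader
{
    // Hypothetical helper: for every <div> whose class contains rowMarker,
    // collect the text of its direct child <div> elements.
    public static List<List<string>> ReadRows(HTMLDocument doc, string rowMarker)
    {
        var rows = new List<List<string>>();

        foreach (IHTMLElement elem in doc.getElementsByTagName("div"))
        {
            if (elem.className == null || !elem.className.Contains(rowMarker))
                continue;

            var values = new List<string>();

            // children holds only the direct children of this element,
            // unlike elem.document, which is the whole page.
            var children = (IHTMLElementCollection)elem.children;
            foreach (IHTMLElement child in children)
            {
                if (string.Equals(child.tagName, "div", StringComparison.OrdinalIgnoreCase))
                    values.Add(child.innerText);
            }

            rows.Add(values);
        }

        return rows;
    }
}
```

It could be called as `var rows = RowReader.ReadRows(doc, "t_row");`, where `"t_row"` is the marker already used in the code above. If no part of the class name is stable, the class check would have to be replaced by a structural test, for example on the number of child divs, which is an assumption about the real page.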