For example, I have this link:
<a href="/Forums2008/forumPage.aspx?forumId=393" title="מזג האוויר">מזג האוויר</a>
What I want to extract is, first, forumId=393; then just the number 393; then the link; and last, the link text. In this case the text is Hebrew (it means "the weather"), so it displays a bit messily here. The name should be:
מזג האוויר
I could use either IndexOf and Substring or HtmlAgilityPack. I prefer HtmlAgilityPack to get all the values, but maybe a regex is also a good way.
In the end I should get these four strings:
forumId=393
393
מזג האוויר
/Forums2008/forumPage.aspx?forumId=393
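Since a regex was mentioned as an option, here is a minimal sketch of that route. ParseForumLinks is a hypothetical helper name, not part of any library. It assumes the links look like the sample: a double-quoted href containing forumId= followed by digits, and link text with no nested tags. A regex is not a general HTML parser, so for arbitrary markup an HTML parser such as HtmlAgilityPack is the more robust choice.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: finds every <a href="...forumId=NNN...">text</a> in an HTML string
// and returns the four values for each link. Assumes double-quoted href attributes and
// link text without nested tags, as in the sample; links without forumId= are skipped.
static List<(string Pair, string Id, string Name, string Href)> ParseForumLinks(string html)
{
    var pattern = new Regex(
        @"<a\s[^>]*?href=""(?<href>[^""]*?(?<pair>forumId=(?<id>\d+))[^""]*)""[^>]*>(?<name>[^<]*)</a>",
        RegexOptions.IgnoreCase);

    var results = new List<(string Pair, string Id, string Name, string Href)>();
    foreach (Match m in pattern.Matches(html))
    {
        results.Add((
            m.Groups["pair"].Value,                          // e.g. forumId=393
            m.Groups["id"].Value,                            // e.g. 393
            WebUtility.HtmlDecode(m.Groups["name"].Value),   // link text, HTML entities decoded
            WebUtility.HtmlDecode(m.Groups["href"].Value))); // e.g. /Forums2008/forumPage.aspx?forumId=393
    }
    return results;
}

// Usage on the sample link from the question
string sample = "<a href=\"/Forums2008/forumPage.aspx?forumId=393\" title=\"מזג האוויר\">מזג האוויר</a>";
foreach (var link in ParseForumLinks(sample))
{
    Console.WriteLine(link.Pair);
    Console.WriteLine(link.Id);
    Console.WriteLine(link.Name);
    Console.WriteLine(link.Href);
}
```

The same function can be called on the whole downloaded page as one string (for example the result of File.ReadAllText), since Matches returns every forum link it finds.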
Here is what I've tried so far; it's not even close to my goal. The first attempt uses HtmlAgilityPack; the second downloads the HTML, saves it as a file, and then reads it line by line. For the second one I thought of using IndexOf and Substring, but I'm not sure how:
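using System.IO;
using System.Net;
using HtmlAgilityPack;

// Attempt 1: load the page with HtmlAgilityPack (Qhw is assumed to be an HtmlWeb)
HtmlAgilityPack.HtmlDocument doc =
    Qhw.Load("http://www.tapuz.co.il/forums/forumslistnew.asp");
parseIds(doc);

// Attempt 2: download the page to a file, then scan it line by line
using (WebClient webclient = new WebClient())
{
    webclient.DownloadFile("http://www.tapuz.co.il/forums/forumslistnew.asp",
        @"c:\testhtml\mainforums.html");
}
string[] lines = File.ReadAllLines(@"c:\testhtml\mainforums.html");
foreach (string line in lines)
{
    // Keep each distinct line that contains a forum link
    if (line.Contains("href") && line.Contains("forumId=") && !wholeids.Contains(line))
    {
        string tg1 = "href=\""; // the marker I wanted to search for with IndexOf
        wholeids.Add(line);
    }
}

// Collect the text of every <a href> (this gives the names, not the ids).
// SelectNodes returns null when nothing matches, so check before looping.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        idsnumbers.Add(link.InnerText);
    }
}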
idsnumbers is a global List<string> variable.
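For the line-by-line attempt, a minimal IndexOf/Substring sketch could look like the following. TryParseLine is a hypothetical helper name. It assumes the same layout as the sample (a double-quoted href, forumId= followed by digits, no '>' inside attribute values, plain link text) and only looks at the first link on each line.

```csharp
using System;

// Hypothetical helper: extracts the four values from one line using only IndexOf/Substring.
// Returns false when the line has no <a ... href="...forumId=NNN...">text</a>.
// Only the first href on the line is examined.
static bool TryParseLine(string text, out string pair, out string id, out string name, out string href)
{
    pair = id = name = href = "";

    // 1) The href value: between href=" and the next double quote
    const string hrefMarker = "href=\"";
    int hrefStart = text.IndexOf(hrefMarker, StringComparison.OrdinalIgnoreCase);
    if (hrefStart < 0) return false;
    hrefStart += hrefMarker.Length;
    int hrefEnd = text.IndexOf('"', hrefStart);
    if (hrefEnd < 0) return false;
    href = text.Substring(hrefStart, hrefEnd - hrefStart);

    // 2) The id: the run of digits right after forumId= inside the href
    const string key = "forumId=";
    int keyStart = href.IndexOf(key, StringComparison.Ordinal);
    if (keyStart < 0) return false;
    int digitsStart = keyStart + key.Length;
    int digitsEnd = digitsStart;
    while (digitsEnd < href.Length && char.IsDigit(href[digitsEnd])) digitsEnd++;
    if (digitsEnd == digitsStart) return false;
    id = href.Substring(digitsStart, digitsEnd - digitsStart);
    pair = key + id;

    // 3) The link text: after the '>' that closes the opening tag, up to the next </a>
    int textStart = text.IndexOf('>', hrefEnd);
    int textEnd = textStart < 0 ? -1 : text.IndexOf("</a>", textStart, StringComparison.OrdinalIgnoreCase);
    if (textEnd < 0) return false;
    name = text.Substring(textStart + 1, textEnd - textStart - 1);
    return true;
}

// Usage on the sample link; in the file loop this would replace the body of the if block
string sampleLine = "<a href=\"/Forums2008/forumPage.aspx?forumId=393\" title=\"מזג האוויר\">מזג האוויר</a>";
if (TryParseLine(sampleLine, out var forumPair, out var forumId, out var forumName, out var forumHref))
{
    Console.WriteLine(forumPair);
    Console.WriteLine(forumId);
    Console.WriteLine(forumName);
    Console.WriteLine(forumHref);
}
```

This avoids any regex but breaks more easily if the markup deviates from the sample, which is one reason an HTML parser is usually preferred.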