How can I escape quotes from a string?

Question

For example, I have this link:

<a href="/Forums2008/forumPage.aspx?forumId=393" title="מזג האוויר">מזג האוויר</a>

What I want to parse is first the forumId=393, then only the 393, then the link, and last the name. The name is in Hebrew in this case, so it's a bit messy. Here the name should be:

מזג האוויר

I can use either IndexOf and Substring or HtmlAgilityPack. I prefer HtmlAgilityPack to get all the values, though regex may also be a good way.

In the end I should get these four strings:

forumId=393
393
מזג האוויר
/Forums2008/forumPage.aspx?forumId=393

What I have tried so far is not even close to my goal. One attempt uses HtmlAgilityPack. The other downloads the HTML, saves it as a file and then parses it. I thought of using IndexOf and Substring for that, but I'm not sure how:

HtmlAgilityPack.HtmlDocument doc =
                        Qhw.Load("http://www.tapuz.co.il/forums/forumslistnew.asp");
parseIds(doc);

WebClient webclient = new WebClient();
webclient.DownloadFile("http://www.tapuz.co.il/forums/forumslistnew.asp",
                        @"c:\testhtml\mainforums.html");
webclient.Dispose();

string[] lines = File.ReadAllLines(@"c:\testhtml\mainforums.html");
foreach(string line in lines)
{
    if (line.Contains("href") && line.Contains("forumId=") && !wholeids.Contains(line))
    {
        string tg1 = "href=\""; // the inner quote must be escaped with a backslash
        wholeids.Add(line);
    }
}
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{   
    idsnumbers.Add(link.InnerText);
}

idsnumbers is a global List variable.

You should mention what about your pattern is constant. e.g. is there always `forumPage.aspx?forumId=`? — Rotem, Oct 06 '15 at 09:29
Rotem yes each line like this is the same format the only thing that change is the name and the id number. — Daniel van wolf, Oct 06 '15 at 09:30
Good suggestion for HtmlAgilityPack. Don't use Regex to parse HTML. I should include the obligatory link: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — Russ Clarke, Oct 06 '15 at 09:31
@Russ Every time that link is posted I read it all over again :) — Rotem, Oct 06 '15 at 09:34
@TimBiegeleisen aren't you [a day late](https://www.google.co.uk/search?q=Simchas+Torah&ie=utf-8&oe=utf-8&gws_rd=cr&ei=QZYTVr7MEMXxasnnrtAJ) — Jamiec, Oct 06 '15 at 09:37
@Jamiec Here in Singapore (in the diaspora) today is actually Simchat-Torah. We are now in the final hours of this chag. Holidays in Israel are only 1 day :-) — Tim Biegeleisen, Oct 06 '15 at 09:38
What on earth does this question have to do with "escape quotes from a string"? — kjbartel, Oct 06 '15 at 09:39

Tim Schmelter · Accepted Answer · 2015-10-06T09:50:22.270

I would use HtmlAgilityPack, Uri.TryCreate and ParseQueryString:

string html = @"<a href=""/Forums2008/forumPage.aspx?forumId=393"" title=""מזג האוויר"">מזג האוויר</a>";
var htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(html);
var anchor = htmlDoc.DocumentNode.Descendants("a").FirstOrDefault();
if(anchor != null)
{
    string name = anchor.InnerText;
    string href = anchor.Attributes["href"].Value;
    Uri uri;
    if(Uri.TryCreate(href, UriKind.RelativeOrAbsolute, out uri))
    {
        var queryString = href.Substring(href.IndexOf('?')).Split('#')[0]; // because of relative uri
        var queryKeyValues = System.Web.HttpUtility.ParseQueryString(queryString);
        string forumId = queryKeyValues["forumId"];
    }
}
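
A side note on the html literal in the snippet above, since the question title asks about escaping quotes. The sketch below compares the two usual C# spellings of a string that contains a double quote. The class and field names are only illustrative and do not come from the question.

```csharp
using System;

static class QuoteEscapingDemo
{
    // Regular literal: escape each double quote with a backslash.
    public const string Regular = "href=\"";

    // Verbatim literal (@"..."): double the quote character instead.
    public const string Verbatim = @"href=""";

    static void Main()
    {
        Console.WriteLine(Regular);             // href="
        Console.WriteLine(Regular == Verbatim); // True
    }
}
```

Both forms produce the same six characters. The verbatim form is what the html variable uses, which is why every quote inside it is doubled.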

You could also create a fake absolute uri to avoid the string methods:

if(Uri.TryCreate(href, UriKind.RelativeOrAbsolute, out uri))
{
    if(!uri.IsAbsoluteUri)
        uri = new Uri(new Uri("http://www.google.com/"), uri);
    var queryKeyValues = System.Web.HttpUtility.ParseQueryString(uri.Query);
    string forumId = queryKeyValues["forumId"];
}
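
To get the four strings listed in the question, the pieces can be combined roughly as follows. This is a minimal sketch that assumes the href and the link text have already been read with HtmlAgilityPack as shown earlier, so it needs no extra package. It uses the string-based query extraction from the first snippet, with a guard for links that have no query string. GetForumId and ForumLink are hypothetical names introduced only for this example.

```csharp
using System;
using System.Web; // HttpUtility; on .NET Framework this needs a reference to System.Web.dll

static class ForumLink
{
    // Hypothetical helper (not part of the answer): returns the forumId value
    // from an href, or null if there is no query string or no forumId key.
    public static string GetForumId(string href)
    {
        int q = href.IndexOf('?');
        if (q < 0) return null;                          // no query string at all
        string query = href.Substring(q).Split('#')[0];  // drop any #fragment
        return HttpUtility.ParseQueryString(query)["forumId"];
    }

    static void Main()
    {
        // In the real code these come from anchor.Attributes["href"].Value
        // and anchor.InnerText; they are hard-coded here for illustration.
        string href = "/Forums2008/forumPage.aspx?forumId=393";
        string name = "מזג האוויר";

        string id = GetForumId(href);
        if (id != null)
        {
            Console.WriteLine("forumId=" + id); // forumId=393
            Console.WriteLine(id);              // 393
            Console.WriteLine(name);
            Console.WriteLine(href);
        }
    }
}
```

Returning null instead of throwing lets a loop over all anchors on the page simply skip links that are not forum links.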
