I have an HTML page that contains some anchors, and I want to collect them into an array. Some of those anchors are unwanted, so I don't want to select every anchor, only some of them. The HTML of an anchor looks like this:

<a xmlns="" href="exp3dbasics-c-ExpDesktop-ActionBar.htm#exp3dbasics-c-ExpDesktop-ActionBar">Action Bar</a>

The C# code should look something like this:

protected string[] GetHref(string html)
{
    Regex regex = new Regex("<anchor>([^<]+)</anchor>", RegexOptions.IgnoreCase | RegexOptions.Multiline);

    Match match = regex.Match(html);

    if (match.Success)
    {
        ............
    }

    return ...;
}
Xiufeng Chen
  • I think you should use HtmlAgilityPack or another C# HTML parser. Any other kind of answer will get downvoted (I know from experience). Have a look at [my answer](http://stackoverflow.com/a/32371771/3832970) and amend it slightly (remove `&& attribute.Value.Contains(href_text)`) to get what you want. Or here is [another similar answer](http://stackoverflow.com/a/30461038/3832970). – Wiktor Stribiżew Sep 11 '15 at 14:48
  • @stribizhev Thank you, I like your answer. I'll give it a try. – Xiufeng Chen Sep 11 '15 at 15:30
  • `var nodes = hap.DocumentNode.SelectNodes("//a[@href]");` – Xiufeng Chen Sep 11 '15 at 16:18
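
The parser-based approach from the comments could be sketched roughly as follows. This is only a minimal sketch, assuming the third-party HtmlAgilityPack NuGet package is installed. The class name `AnchorCollector`, the method name `GetHrefs`, and the optional `keep` filter are hypothetical names introduced for illustration; the question does not say which anchors count as unwanted, so the filter is left to the caller.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class AnchorCollector
{
    // Returns the href value of every <a href="..."> element in the markup.
    // `keep` is a hypothetical filter: the question does not define which
    // anchors are unwanted, so the caller decides. When it is null, every
    // href is returned.
    public static string[] GetHrefs(string html, Func<string, bool> keep = null)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//a[@href]");
        var hrefs = new List<string>();
        if (nodes == null)
        {
            return hrefs.ToArray();
        }

        foreach (var node in nodes)
        {
            string href = node.GetAttributeValue("href", string.Empty);
            if (keep == null || keep(href))
            {
                hrefs.Add(href);
            }
        }

        return hrefs.ToArray();
    }
}
```

A call such as `AnchorCollector.GetHrefs(html, href => href.Contains("#"))` would keep only links that contain a fragment; that predicate is just an example of a filter, not a rule taken from the question.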

0 Answers