I need to extract only the link (the href value) from this HTML:

"<span class=""name""><a href=Details.aspx?entityID=1&hash=20&searchFunctionID=53b&type=Advanced&nameSet=Entities&q=a&textSearchType=ExactPhrase&orgTypes=01%2c02%2c03%2c04%2c05%2c06%2c07%2c08%2c09%2c10%2c11%2c12%2c13%2c14%2c15%2c16%2c90%2c96%2c98%2c99> GOOGLE CORPORATION  </a> </span>  <br /> <span class=typeDescription>  09  -  Analytics Company </span>"

The output I need is

Details.aspx?entityID=1&hash=20&searchFunctionID=53b&type=Advanced&nameSet=Entities&q=a&textSearchType=ExactPhrase&orgTypes=01%2c02%2c03%2c04%2c05%2c06%2c07%2c08%2c09%2c10%2c11%2c12%2c13%2c14%2c15%2c16%2c90%2c96%2c98%2c99

I used

string sPattern ="[<a href=](.*?(99))";
MatchCollection mcMatches = Regex.Matches(input,sPattern);
foreach (Match m in mcMatches)
{
   Console.WriteLine(m.Value);
}

This does not give me the right output. Can anyone point me in the right direction?

Soner Gönül
user2726975
  • Please don't parse HTML with RegEx. – germi Dec 16 '13 at 14:43
  • Parsing HTML with Regex? [Bad idea!](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) Instead, why not use a proper HTML parser, like [Html Agility Pack](http://htmlagilitypack.codeplex.com/)? – MBender Dec 16 '13 at 14:43

2 Answers


As suggested above, parsing HTML with Regex is not a very good idea. I recommend using HtmlAgilityPack (you can get it from NuGet):

// Requires: using HtmlAgilityPack;
HtmlDocument hdoc = new HtmlDocument();
hdoc.LoadHtml(@"<span class=""name""><a href=Details.aspx?entityID=1&hash=20&searchFunctionID=53b&type=Advanced&nameSet=Entities&q=a&textSearchType=ExactPhrase&orgTypes=01%2c02%2c03%2c04%2c05%2c06%2c07%2c08%2c09%2c10%2c11%2c12%2c13%2c14%2c15%2c16%2c90%2c96%2c98%2c99> GOOGLE CORPORATION  </a> </span>  <br /> <span class=typeDescription>  09  -  Analytics Company </span>");
var href = hdoc.DocumentNode.SelectSingleNode("//a").Attributes["href"].Value;

This gives you the value of the href attribute.

carla
Sergey Berezovskiy

As Shaamaan said, Regex is not the right way to parse HTML. For the example given, a better Regex would be the following, although there is no guarantee it will always work:

(?:<a href=)([^">]*)
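To show how this pattern could be used from C#, here is a minimal sketch. It reads capture group 1 rather than `m.Value`, because the whole match still includes the leading `<a href=`. The `ExtractHref` helper name and the shortened input string are assumptions made for illustration; they are not part of the original question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Program
{
    // Hypothetical helper: returns the href value, or null if nothing matches.
    public static string ExtractHref(string input)
    {
        // Group 1 captures everything after "<a href=" up to the first '"' or '>'.
        Match m = Regex.Match(input, "(?:<a href=)([^\">]*)");
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Shortened version of the HTML from the question (unquoted href).
        string input = "<span class=\"name\"><a href=Details.aspx?entityID=1&hash=20> GOOGLE CORPORATION  </a> </span>";
        Console.WriteLine(ExtractHref(input));
    }
}
```

Note that the pattern relies on the href being unquoted, as it is in the question. With a quoted attribute such as `<a href="x">`, group 1 would be empty, because the character class stops at the first `"`.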
Scoregraphic