When parsing HTML, consider using a dedicated HTML parser such as HtmlAgilityPack, and apply the regex to the plain text only after you have extracted the node you need.
If you want to debug your own code, here is a fix:
    using System;
    using System.Text.RegularExpressions;

    public class Test
    {
        public static void Main()
        {
            var s = "<a href=\"/url?q=https://www.google.com/&sa=U&ved=0ahUKEwizwPy0yNHSAhXMDpAKHec7DAsQFgh6MA0&usg=AFQjCNEjJILXPMMCNAlz5MN1IIzjpr79tw\">";
            var pattern = @"<a href=""/url\?q=(.*?)&";
            var result = Regex.Match(s, pattern);
            if (result.Success)
                Console.WriteLine(result.Groups[1].Value);
        }
    }
See a DotNetFiddle demo.
Here is an example of how you can extract all <a> href attribute values that start with /url?q= using HtmlAgilityPack. Install it via Solution > Manage NuGet Packages for Solution... and use:
    // Requires: using System; using System.Collections.Generic;
    public List<string> HapGetHrefs(string html)
    {
        var hrefs = new List<string>();
        HtmlAgilityPack.HtmlDocument hap;
        Uri uriResult;
        // Treat the input as a URL only if it is an absolute http or https URI
        if (Uri.TryCreate(html, UriKind.Absolute, out uriResult)
            && (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps))
        {   // html is a URL: download and parse the page
            var web = new HtmlAgilityPack.HtmlWeb();
            hap = web.Load(uriResult.AbsoluteUri);
        }
        else
        {   // html is a string: parse it directly
            hap = new HtmlAgilityPack.HtmlDocument();
            hap.LoadHtml(html);
        }
        // SelectNodes returns null (not an empty collection) when nothing matches
        var nodes = hap.DocumentNode.SelectNodes("//a[starts-with(@href, '/url?q=')]");
        if (nodes != null)
        {
            foreach (var node in nodes)
            {
                // The XPath above guarantees that href is present on each node
                hrefs.Add(node.GetAttributeValue("href", string.Empty));
            }
        }
        return hrefs;
    }
Then all you need is to apply a simpler regex, or a few simple string operations, to each returned value.
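As a rough sketch of that last step, the string-operation route could look like the code below. The class name HrefHelper and the method ExtractTarget are hypothetical names chosen for illustration, and the code assumes that the target URL is everything between /url?q= and the first & in the href value.

```csharp
using System;

public static class HrefHelper
{
    // Hypothetical helper: returns the text between "/url?q=" and the first '&',
    // or null when the href does not start with that prefix.
    public static string ExtractTarget(string href)
    {
        const string prefix = "/url?q=";
        if (href == null || !href.StartsWith(prefix, StringComparison.Ordinal))
            return null;

        var rest = href.Substring(prefix.Length);
        // Cutting at the first '&' also handles values where it is still encoded as "&amp;"
        var end = rest.IndexOf('&');
        var raw = end >= 0 ? rest.Substring(0, end) : rest;

        // Undo percent-encoding (e.g. %3A becomes ':') in the extracted URL
        return Uri.UnescapeDataString(raw);
    }
}
```

Calling HrefHelper.ExtractTarget on each item returned by HapGetHrefs would then yield the target URLs; for the sample href above it returns https://www.google.com/.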