I am using regex for extracting url from string and it's working mostly;
var regex=new Regex("<a [^>]*href=(?:'(?<href>.*?)')|(?:\"(?<href>.*?)\")",RegexOptions.IgnoreCase);
following strings working fine:
"This is Test page <a href='test.aspx'>test page</a>"
"This is Test page <a href='test1.aspx'>test</a> another one <a href='test2.aspx'>test</a>"
"This is Tests\"s page <a href='test1.aspx'>test</a> another one <a href='test2.aspx'>test</a>"
"This is Test page"
"This is Test page\"s without problem"
But some time it's not returning good result. Following code return bad result (string contains 2 double quotes
) -
var inputString="This string create \"problem\" for me";
var regex=new Regex("<a [^>]*href=(?:'(?<href>.*?)')|(?:\"(?<href>.*?)\")",RegexOptions.IgnoreCase);
var urls=regex.Matches(inputString).OfType<Match>().Select(m =>m.Groups["href"].Value);
foreach(var zzzzzzz in urls){
Console.WriteLine(zzzzzzz);
}
Could anyone help me to solve this problem?