Even though I recommend against this approach, you may find this regex helpful:
(?<=href\s*=\s*['"]?)(?>(https?://)?([\da-z\.-]+)\.([a-z\.]{2,6})([/\w\.-]*)*/?)(?<!png|gif|etc)
(Based on the URL regex from "8 Regular Expressions You Should Know".)
Note that this expression does not allow spaces in the URL. This is deliberate: if spaces were allowed, an href value without quotes would run on into the following attribute (for example, "domain.com/resource.txt title").
EXAMPLE:
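using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main( string[] args )
    {
        // Sample markup: quoted, single-quoted and unquoted href values, plus a src attribute.
        string l_input =
            "<a href=\n" +
            " \"HTTPS://example.com/page.html\" title=\"match\" />\n" +
            "<a href='http://site.com/pic.png' title='do not match'> <a href=domain.com/resource.txt title=match>\n" +
            " <script src=scripts.com/script.js>";
        // The lookbehind anchors the match to an href attribute; the trailing
        // negative lookbehind rejects URLs ending in the listed extensions.
        string l_pattern = @"(?<=href\s*=\s*['""]?)(?>(https?://)?([\da-z\.-]+)\.([a-z\.]{2,6})([/\w\.-]*)*/?)(?<!png|gif|etc)";
        foreach ( Match l_match in Regex.Matches( l_input, l_pattern, RegexOptions.IgnoreCase ) )
            Console.WriteLine( "'" + l_match.Value + "'" );
        /*
         * Prints:
         *
         * 'HTTPS://example.com/page.html'
         * 'domain.com/resource.txt'
         *
         */
        Console.ReadKey( true );
    }
}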
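To see the no-spaces limitation in practice, here is a minimal sketch that runs the same pattern against a quoted href whose path contains a space. The class name LinkDemo, the helper ExtractLinks and the sample markup are made up for illustration; only the pattern itself comes from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkDemo
{
    // Same pattern as in the answer above, unchanged.
    const string Pattern =
        @"(?<=href\s*=\s*['""]?)(?>(https?://)?([\da-z\.-]+)\.([a-z\.]{2,6})([/\w\.-]*)*/?)(?<!png|gif|etc)";

    // Hypothetical helper: collects every href value the pattern accepts.
    public static List<string> ExtractLinks( string html )
    {
        var links = new List<string>();
        foreach ( Match match in Regex.Matches( html, Pattern, RegexOptions.IgnoreCase ) )
            links.Add( match.Value );
        return links;
    }

    static void Main()
    {
        // Made-up input: a quoted href with a space inside the path.
        foreach ( string link in ExtractLinks( "<a href=\"example.com/my page.html\">" ) )
            Console.WriteLine( "'" + link + "'" );
        // The match stops at the space, so this prints 'example.com/my'.
    }
}
```

The truncated result is the price of handling unquoted href values: the pattern cannot tell a space inside a quoted URL from the space that separates an unquoted URL from the next attribute.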