
I'm using the following regex (which I found online) to extract the URLs from an HTML page:

        Regex regex = new Regex(@"url\((?<char>['""])?(?<url>.*?)\k<char>?\)");

It works fine for the following HTML:

<div style="background:url(images/logo.png) no-repeat;">UK</div>

However, it returns more than I need when the page contains the following JavaScript. It matches the "url(" at the end of "buildurl" and returns 'destpage':

function buildurl(destpage) 

I tried the following regex to require a colon before "url", but it appears to be invalid:

:url\((?<char>['""])?(?<:url>.*?)\k<char>?\)

Any help would be much appreciated.

saj

2 Answers


To get all the URLs, use the HtmlAgilityPack instead of a regex. Adapted from the example on their site:

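using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// SelectNodes returns null, not an empty collection, when nothing matches
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        string href = link.GetAttributeValue("href", string.Empty); // the link's URL
    }
}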

You can expand on that to obtain the URLs in your style attributes. For example, select the elements that have a style attribute with //*[@style], then extract the url(...) value from each element's style attribute.

keyboardP

Add the colon only at the front:

:url\((?<char>['""])?(?<url>.*?)\k<char>?\)

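The second "url", in (?<url>.*?), is the name of the capture group. Group names can only contain word characters, so (?<:url>...) is not valid, which is why your version fails.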
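A minimal C# sketch for checking the colon fix against the inputs from the question. The class name `CssUrlExtractor`, the helper `ExtractUrls`, and the `NotPartOfIdentifier` pattern are hypothetical illustrations, not part of either answer. `NotPartOfIdentifier` replaces the colon with a negative lookbehind, `(?<![\w-])`, which rejects "url(" when it follows a letter, digit, underscore or hyphen. Unlike `:url\(`, it also accepts a space after the colon, as in `background: url(...)`. Both patterns are simple heuristics, not a CSS parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CssUrlExtractor
{
    // Regex from the question: also matches the "url(" at the end of "buildurl(".
    public static readonly Regex Original =
        new Regex(@"url\((?<char>['""])?(?<url>.*?)\k<char>?\)");

    // Answer's fix: requires a colon immediately before "url(",
    // so "background:url(...)" matches but "buildurl(destpage)" does not.
    public static readonly Regex ColonPrefixed =
        new Regex(@":url\((?<char>['""])?(?<url>.*?)\k<char>?\)");

    // Hypothetical alternative: skip "url(" when it is preceded by a word
    // character or hyphen, so a space after the colon is still allowed.
    public static readonly Regex NotPartOfIdentifier =
        new Regex(@"(?<![\w-])url\((?<char>['""])?(?<url>.*?)\k<char>?\)");

    // Returns the value of the named "url" group for every match in the input.
    public static List<string> ExtractUrls(Regex regex, string input)
    {
        var urls = new List<string>();
        foreach (Match m in regex.Matches(input))
        {
            urls.Add(m.Groups["url"].Value);
        }
        return urls;
    }
}
```

With these inputs, `Original` returns "destpage" for the JavaScript line, both fixed patterns return nothing for it, and only `NotPartOfIdentifier` matches `background: url('a.png')`.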

user2586804