I need a regex to extract all the stylesheets (<link> tags) from an HTML document.

Currently I have preg_match_all('/<link([^>]*?)>/i',..., and that regex extracts the stylesheets, which is fine.

But I need to exclude the stylesheets wrapped in IE's conditional comments, <!--[if IE...]>bla bla<![endif]-->.

Any tips on how to do that?

mario
Andrej

1 Answer


Use DOM and XPath for that instead of a regex:

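// Collect libxml warnings internally; real-world HTML is rarely valid
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTMLFile('http://example.com');
$xpath = new DOMXPath($dom);

// Match only <link> elements directly inside <head>. Anything inside
// <!--[if IE]>...<![endif]--> is a comment node and is never matched.
$stylesheets = $xpath->query('/html/head/link[@rel="stylesheet"]');
foreach ($stylesheets as $stylesheet) {
    echo $dom->saveHTML($stylesheet);
}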

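This prints only the stylesheets in the head element. Stylesheets inside IE conditional comments are excluded because the parser treats the whole comment as a single comment node, so the <link> tags inside it never become elements. If you need to narrow the result further, for instance by the media attribute, simply add that as another condition to the XPath query.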
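As a rough sketch of adding such a condition, the snippet below filters on media="screen". The markup and the file names (screen.css, print.css, ie.css) are made up for illustration, and it uses loadHTML on a string instead of loadHTMLFile so that it runs without network access.

```php
<?php
// Hypothetical sample markup: a screen stylesheet, a print stylesheet,
// and one stylesheet wrapped in an IE conditional comment.
$html = <<<'HTML'
<html><head>
<link rel="stylesheet" href="screen.css" media="screen">
<link rel="stylesheet" href="print.css" media="print">
<!--[if IE]><link rel="stylesheet" href="ie.css" media="screen"><![endif]-->
</head><body></body></html>
HTML;

// Collect parser warnings internally instead of printing them
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Two predicates in a row: rel must be "stylesheet" AND media must be "screen"
$query = '/html/head/link[@rel="stylesheet"][@media="screen"]';

$hrefs = [];
foreach ($xpath->query($query) as $link) {
    $hrefs[] = $link->getAttribute('href');
}

print_r($hrefs);
```

With this sample input only screen.css should be collected: print.css fails the media condition, and ie.css sits inside a comment node, so the query never sees it as an element.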

In case saveHTML doesn't accept a node in your version of PHP (the node argument was added in PHP 5.3.6), see
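One possible fallback, which is my assumption and not necessarily what the reference above describes, is DOMDocument::saveXML(), which also accepts a node. Note that it serializes as XML, so an empty element such as <link> comes out self-closed. The markup and the file name a.css are hypothetical.

```php
<?php
// Hypothetical sample markup with a single stylesheet
$html = '<html><head><link rel="stylesheet" href="a.css"></head><body></body></html>';

// Collect parser warnings internally instead of printing them
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$output = '';
foreach ($xpath->query('/html/head/link[@rel="stylesheet"]') as $stylesheet) {
    // saveXML() with a node argument serializes just that node as XML
    $output .= $dom->saveXML($stylesheet) . "\n";
}

echo $output;
```

If you only need the URLs and not the markup, reading $stylesheet->getAttribute('href') avoids the serialization question entirely.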

Community
Gordon