
I am trying to capture the URLs of all CSS files in the HTML generated by a WordPress site.

So far I have the following:

preg_match_all('/"([^"]+?\.css)"/', $op, $css);

This gets me all *.css URLs that have no query string, but as I am sure you are aware, WP has a funny way of appending a query string to them, so the remaining 20+ CSS files are not captured.

How can I modify this to grab them all?

DOM parsing is not an option, as WP very rarely produces valid HTML.

Kevin
  • I'm not aware of the context, e.g. what's in $op and $css. But my first try would be `preg_match_all('/"([^"]+?\.css.+(?<!"))"/', $op, $css);` – Alessandro Fazzi Dec 02 '15 at 18:04
  • I guess you could do a `"([^"]+?\.css[^."]*)"`, which might match both with and without the query string. –  Dec 02 '15 at 18:28
  • Neither of these does it. Context? WordPress-generated HTML, as stated in the question... – Kevin Dec 02 '15 at 19:18
  • @Kevin - Really? My regex didn't match? Show a sample string you need to match, then. As far as I recall, `[^"]*"` will match anything that follows `.css` except a double quote, up to the closing `"`. –  Dec 02 '15 at 19:24
  • I'm not going to post the full HTML that $op holds. Any WordPress site would probably do the trick, LOL. I wonder if it's that last `"` doing it. – Kevin Dec 02 '15 at 19:34
  • Here's an example of what yours is doing: http://kevinpirnie.com/default.php and all I am doing to get the content is a simple cURL request that returns the full HTML as a string (yes, verified) – Kevin Dec 02 '15 at 19:35
  • read this! http://stackoverflow.com/questions/18748052/getting-a-all-css-files-of-an-html-web-page – nguaman Dec 02 '15 at 19:38
  • DOM parsing is **not** an option due to the p.poor way WordPress presents its HTML on about 80% of the WordPress sites out there. – Kevin Dec 02 '15 at 19:39
  • Believe you me @NelsonGuamanLeiva, I wish I could do it like that... unfortunately, as I'm sure you're aware, WP's not the friendliest of beasts to developers :D lol – Kevin Dec 02 '15 at 20:16

1 Answer


If DOM parsing is not an option, consider the following code. You were close:

// just a random css link
$str = "href='/wp-content/themes/optimizePressTheme/lib/js/fancybox/jquery.fancybox.min.css?ver=2.3.4.3'";

// match href= and the opening quote literally, then capture the URL
// (a path ending in .css plus any trailing query string) in a named group called css
$regex = "/href=['\"](?P<css>([^'\"]+?\.css)[^'\"]*)/";
preg_match_all($regex, $str, $matches);
print_r($matches["css"]);
// prints an array whose only element is:
// /wp-content/themes/optimizePressTheme/lib/js/fancybox/jquery.fancybox.min.css?ver=2.3.4.3
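
To see how the pattern behaves on more than one tag, here is a minimal sketch that wraps it in a function and runs it over a few lines of made-up markup. The function name `extractCssUrls` and the sample HTML are assumptions for illustration only, and the redundant inner group is dropped, so treat it as a sketch rather than the exact code above.

```php
<?php
// Hypothetical helper (not from the original answer): returns every href
// value that contains ".css", with or without a trailing query string.
function extractCssUrls(string $html): array
{
    // href=, an opening ' or ", a lazy run up to ".css", then zero or more
    // non-quote characters (the optional ?ver=... part).
    $regex = '/href=[\'"](?P<css>[^\'"]+?\.css[^\'"]*)/i';
    preg_match_all($regex, $html, $matches);
    return $matches['css'];
}

// Made-up sample markup: mixed quote styles, one URL with a query string,
// one without, and a script tag that must not match.
$html = <<<HTML
<link rel='stylesheet' href='/wp-content/themes/demo/style.css?ver=4.3.1' type='text/css' />
<link rel="stylesheet" href="/wp-includes/css/dashicons.min.css" />
<script src="/wp-includes/js/jquery/jquery.js?ver=1.11.3"></script>
HTML;

print_r(extractCssUrls($html));
```

Because the trailing part is `[^'"]*`, a value such as `foo.cssx` would also be captured. If that matters for your input, the tail could be narrowed to an optional query string, e.g. `(?:\?[^'"]*)?`.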

Nevertheless, please consider using a DOM parser; it will mostly work with badly formatted HTML as well.
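
For comparison, a rough sketch of the DOM route using PHP's bundled `DOMDocument`. It assumes the DOM/libxml extension is installed; the function name `extractCssUrlsWithDom` and the deliberately sloppy sample markup are made up for illustration. `libxml_use_internal_errors(true)` keeps the parser from emitting a warning for every malformed tag.

```php
<?php
// Hypothetical helper: collects href values of <link rel="stylesheet"> tags.
function extractCssUrlsWithDom(string $html): array
{
    // Collect parse warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('link') as $link) {
        // Keep only <link> elements whose rel attribute mentions "stylesheet"
        if (stripos($link->getAttribute('rel'), 'stylesheet') !== false) {
            $urls[] = $link->getAttribute('href');
        }
    }
    return $urls;
}

// Made-up, intentionally invalid markup: unclosed <p> and a stray </div>
$html = "<html><head><link rel='stylesheet' href='/a/style.css?ver=1.0'>"
      . "<link rel=\"stylesheet\" href=\"/b/plain.css\"></head>"
      . "<body><p>unclosed</div></body></html>";

print_r(extractCssUrlsWithDom($html));
```

This filters on the `rel` attribute rather than on the file extension, so it would also pick up stylesheets served from URLs that do not end in `.css`.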

Jan
  • Dang close ;) Thanks. Check out items 6 & 7 tho: http://kevinpirnie.com/default.php?_=6 – Kevin Dec 02 '15 at 20:11
  • Will look into it more closely tomorrow. Is your link still valid then? – Jan Dec 02 '15 at 20:39
  • Take your time mate, yeah it should be – Kevin Dec 02 '15 at 20:43
  • @Kevin: I forgot a single quote in the brackets, see my updated answer. – Jan Dec 03 '15 at 07:27
  • Hey @Jan, still no go: http://kevinpirnie.com/default.php?_=6 is missing the ones in the bottom box – Kevin Dec 03 '15 at 15:25
  • @Kevin Changed the last `+` to a star (`*`), see my updated answer. Assuming this all comes from your website, I now match 30 CSS files, including the ones with js_composer. This also matches CSS files without a query string appended (`*` means zero or more times). – Jan Dec 03 '15 at 15:40
  • Bingo! Thanks Jan, now on to bigger, better (hopefully) things ;) – Kevin Dec 03 '15 at 17:15