
I need to get a list of all image files referenced in my HTML, CSS and JavaScript files.

Here are some examples of what I will find inside my files:

CSS:
ul li {
    list-style-image: url('data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');
}

#insert { background-image: url('../img/insert.jpg'); }
#delete { background-image: url('../img/delete.png'); }

HTML:
<link rel="icon" sizes="192x192" href="touch-icon-192x192.png">
<img id="home" src="img/home.png" class="img-home">

JavaScript:
"BackgroundImageUrl": "textures/glass.jpg"

Using https://regex101.com/, I came up with the following expression:

/[\"'](.*(png|jpg|gif))[\"']?/ig

but it also matches the base64-encoded data URIs, which I don't need, and my HTML matches include unwanted surrounding text, for example:

"icon" sizes="192x192" href="touch-icon-192x192.png"

whereas I need only touch-icon-192x192.png.

How can I parse my files with PHP and get a clean list of the png, gif and jpeg images they reference? Is a regex suitable for this, or is there a better way to accomplish such a task in PHP?

EDIT:

The accepted answer to "How do you parse and process HTML/XML in PHP?" is a collection of software libraries and other off-site resources, whereas what I am asking here is a programming question about regex.

deblocker

1 Answer


Here is a way to do the job:

$input = <<<EOD
CSS:
ul li {
    list-style-image: url('data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');
}

#insert { background-image: url('../img/insert.jpg'); }
#delete { background-image: url('../img/delete.png'); }

HTML:
<link rel="icon" sizes="192x192" href="touch-icon-192x192.png">
<img id="home" src="img/home.png" class="img-home">

JavaScript:
"BackgroundImageUrl": "textures/glass.jpg"
EOD;

preg_match_all('/(?<=["\'])[^"\']+?\.(?:jpe?g|png|gif)(?=["\'])/', $input, $m);
print_r($m);

Output:

Array
(
    [0] => Array
        (
            [0] => ../img/insert.jpg
            [1] => ../img/delete.png
            [2] => touch-icon-192x192.png
            [3] => img/home.png
            [4] => textures/glass.jpg
        )

)
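
As a possible extension, the pattern above could be wrapped in a small helper that also restores the case-insensitive flag from the question and removes duplicates. This is only a sketch: the function name extract_image_refs() and the sample input are hypothetical, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper, not part of the original answer.
function extract_image_refs(string $text): array
{
    // Same pattern as above, plus the "i" flag so that .PNG or .JPG also match.
    $pattern = '/(?<=["\'])[^"\']+?\.(?:jpe?g|png|gif)(?=["\'])/i';
    preg_match_all($pattern, $text, $m);

    // $m[0] holds the full matches; drop duplicates and reindex from 0.
    return array_values(array_unique($m[0]));
}

// Made-up sample with one repeated reference and one upper-case extension.
$sample = <<<'EOD'
#insert { background-image: url('../img/insert.jpg'); }
#again  { background-image: url('../img/insert.jpg'); }
<img id="home" src="img/HOME.PNG" class="img-home">
EOD;

print_r(extract_image_refs($sample));
// Array
// (
//     [0] => ../img/insert.jpg
//     [1] => img/HOME.PNG
// )
```

To scan real files, the contents could be read with file_get_contents() and passed to the same helper, merging the per-file results afterwards.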

Explanation:

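(?<=["\'])          : lookbehind, make sure there is a quote immediately before
[^"\']+?            : 1 or more characters that are not quotes (non-greedy)
\.                  : a literal dot
(?:jpe?g|png|gif)   : non-capturing group, list of image extensions
(?=["\'])           : lookahead, make sure there is a quote immediately after

The base64 data URI is skipped because its quoted string does not end in a dot followed by one of these extensions.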
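
To see what the two lookarounds contribute, it may help to compare them with a variant that consumes the quotes and uses a capture group instead. The sketch below reuses the link tag from the question; the variable names are only illustrative. In both variants it is the [^"\'] character class that keeps the match inside a single pair of quotes, whereas the .* in the question's expression can run across several attributes.

```php
<?php
$html = '<link rel="icon" sizes="192x192" href="touch-icon-192x192.png">';

// Lookarounds are zero-width: the quotes are checked but not consumed,
// so the whole match ($m[0]) is already the bare file name.
preg_match('/(?<=["\'])[^"\']+?\.(?:jpe?g|png|gif)(?=["\'])/', $html, $m);
echo $m[0], PHP_EOL;   // touch-icon-192x192.png

// Alternative without lookarounds: the quotes are consumed, so the whole
// match includes them and the file name comes from capture group 1.
preg_match('/["\']([^"\']+?\.(?:jpe?g|png|gif))["\']/', $html, $n);
echo $n[0], PHP_EOL;   // "touch-icon-192x192.png"
echo $n[1], PHP_EOL;   // touch-icon-192x192.png
```

Either form gives the same file name here; the lookaround version simply saves the step of picking out the capture group.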
Toto
  • This is PERFECT!!! As I am trying to understand it, could you please explain exactly how you match the first single or double quote before the image extension? I had trouble with that... – deblocker Jul 05 '18 at 14:21
  • @deblocker: You'll find useful information here: https://www.regular-expressions.info/lookaround.html – Toto Jul 05 '18 at 14:47