preg_match_all for links to JPG/PNG

Question

I am trying to find links to JPEG/PNG images. A search for links without a file-extension check works OK:

preg_match_all('/<a.+href=[\'"]([^\'"]+).[\'"].*><img/i', $text, $matches);

But now I am trying to add a filter for PNG/JPG extensions and cannot get it to work. Can you help me?

Using regex to parse HTML is a [bad idea](https://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not). You can get [strange results](https://stackoverflow.com/a/1732454/2370483) — Machavity, Nov 27 '17 at 18:41
`"I am trying to add filter"` - What specifically *have* you *tried? Was adding `(png|jpg)` somewhere in the regex part of your attempts? (Please note that not knowing the syntax and not researching it are two different things.) — mario, Nov 27 '17 at 18:44
mario, yes, i tried to add it. But I got URLs without extensions in $matches. — Rudomilov, Nov 27 '17 at 18:53

score -1 · Answer 1 · answered Nov 28 '17 at 00:21

Solved with DOMDocument:

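$dom = new DOMDocument();
// Set before loading: changing it afterwards does not alter an already parsed document.
$dom->preserveWhiteSpace = false;
// @ suppresses the warnings libxml emits for malformed HTML.
@$dom->loadHTML($post->post_content);

$images = $dom->getElementsByTagName('img');

foreach ($images as $image) {
    // Print the href only when the image is wrapped directly in a link.
    if ($image->parentNode->nodeName == 'a') {
        print $image->parentNode->getAttribute('href');
    }
}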
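
The snippet above prints the link of every image wrapped in an anchor, but it does not yet apply the JPG/PNG filter the question asks for. A minimal sketch of one way to add it follows; the function name `imageLinkHrefs` and the sample markup are made up for illustration, and the extension check assumes only `.jpg`, `.jpeg` and `.png` are wanted:

```php
<?php
// Hypothetical helper: collect the hrefs of <a> elements that directly wrap
// an <img> and whose URL path ends in .jpg, .jpeg or .png.
function imageLinkHrefs(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($dom->getElementsByTagName('img') as $image) {
        $parent = $image->parentNode;
        if (!($parent instanceof DOMElement) || strtolower($parent->nodeName) !== 'a') {
            continue;
        }
        $href = $parent->getAttribute('href');
        // Test only the path so that a query string such as ?v=2 does not hide the extension.
        $path = (string) parse_url($href, PHP_URL_PATH);
        if (preg_match('/\.(?:jpe?g|png)$/i', $path)) {
            $hrefs[] = $href;
        }
    }
    return $hrefs;
}

// Sample input (an assumption, not taken from the question).
$html = '<p><a href="/img/a.JPG"><img src="/t/a.jpg"></a>'
      . '<a href="/page.html"><img src="/t/b.png"></a>'
      . '<a href="http://example.com/c.png?v=2"><img src="/t/c.png"></a></p>';
print_r(imageLinkHrefs($html));
```

With this sample input the link to `/page.html` is skipped, while the other two hrefs are returned unchanged.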

Tom Friedrichs · Answer 2 · 2023-06-20T20:25:16.070

Here is another contribution that answered the question: How to get url path from images,font,etc in css files?

EDIT: Yes, regex can cause problems, for example if you try to replace nested HTML. However, it depends on how robust the code needs to be. If you just need a quick way to pull something out of some markup and it works, why not? If you are building an application where you cannot control the input, it is probably better NOT to use regex for this.

Regex is a quick way and it is fine for SOME use cases, but not for all. Everyone can choose for themselves, but a general rejection of regex for this seems a little too much.

In MY case I was able to quickly extract the links I needed with this pattern, although it seems not everyone likes this solution. Once I have parsed everything out of my own website, I will discard the script. For this purpose it works well for extracting both relative and absolute links:

function getPicPath($sSource){
    // returns array with absolute and relative links to pictures.
    preg_match_all('/([-a-z0-9_\/:.]+\.(jpg|jpeg|png))/i', $sSource, $matches);
    return $matches;
}

Here is how to process them:

    $aPics = getPicPath($urlcontent);
    $num_pics = count($aPics[0]);
    // $aPics[0] holds the full matches; no reference (&) is needed for read-only iteration.
    foreach ($aPics[0] as $sTemp2) {
        echo '<br>Count '.$num_pics.' <a href="'.$sTemp2.'" target="_blank">'.$sTemp2.'</a><br>';
    }

It is not for every purpose, but I use it to store the paths in the database for later conversion to WebP. Future pictures will be converted to WebP on upload, but to convert the existing pictures I first need to identify them. For that, this works perfectly.
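
If the paths are going into a database, the same file will often be matched several times. A rough sketch of a variant that returns each path once is shown here; `getUniquePicPaths` is a hypothetical name, and the trailing lookahead is an added assumption (not part of the pattern above) so that a longer extension such as `.pngx` is not cut down to `.png`:

```php
<?php
// Hypothetical variant of getPicPath() that returns each matched path once.
function getUniquePicPaths(string $source): array
{
    // Same character class as getPicPath(); the lookahead (?![a-z0-9]) is an
    // assumption added here to reject longer extensions such as ".pngx".
    preg_match_all('/[-a-z0-9_\/:.]+\.(?:jpe?g|png)(?![a-z0-9])/i', $source, $matches);
    // array_unique keeps the first occurrence; array_values reindexes from 0.
    return array_values(array_unique($matches[0]));
}

// Sample input (made up for illustration).
$css = 'a{background:url(/img/bg.png)} b{background:url("/img/bg.png")} '
     . 'c{background:url(http://example.com/pic.JPEG)} d{src:url(font.pngx)}';
print_r(getUniquePicPaths($css));
```

For this sample the duplicate `/img/bg.png` is returned once and `font.pngx` is ignored. Like the original pattern, this still matches image-like strings anywhere in the text, not only inside `href` or `url()`.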
