Here is another contribution that answered the question:
How to get url path from images,font,etc in css files?
EDIT: Yes, regex can cause problems, for example if you try to replace nested HTML.
However, it depends on the stability requirements of the code. If you just need a quick solution to pull something out of some code AND it works, then why not? If you are writing an application where you cannot control the input, it is probably better NOT to use regex for this case.
Regex is a quick way. It is fine for SOME use cases, but not for all. Having said this, everyone can choose for themselves, but a general rejection of regex for this seems a little too much.
For MY case I was able to quickly extract the links I needed with this pattern, though it seems not everyone likes this solution. Once I have parsed everything out of my own website, I will discard the script. But for this purpose it works well for extracting relative and absolute links:
function getPicPath($sSource) {
    // Returns the preg_match_all result: index 0 holds the full matches
    // (absolute and relative links to pictures), index 2 the file extensions.
    preg_match_all('/([-a-z0-9_\/:.]+\.(jpg|jpeg|png))/i', $sSource, $matches);
    return $matches;
}
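As a quick check of what the function returns, here is a minimal sketch run on a made-up CSS string. The sample input and the variable names $sSample and $aResult are assumptions for illustration only. The function is repeated so the snippet runs on its own.

```php
<?php
// Same function as in the answer, repeated so this snippet is self-contained.
function getPicPath($sSource) {
    preg_match_all('/([-a-z0-9_\/:.]+\.(jpg|jpeg|png))/i', $sSource, $matches);
    return $matches;
}

// Hypothetical sample input, not taken from a real site.
$sSample = 'body { background: url("/img/bg.png"); } '
         . '.logo { background: url(https://example.com/pics/logo.JPG); }';

$aResult = getPicPath($sSample);

print_r($aResult[0]); // full matches: one relative and one absolute link
print_r($aResult[2]); // the captured file extensions
```

Index 0 is the one you normally want. Index 1 repeats it, because the outer parentheses wrap the whole pattern.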
Here is how to process them:
$aPics = getPicPath($urlcontent);
$num_pics = count($aPics[0]);
foreach ($aPics[0] as $sTemp2) {
    // Escape the match before echoing it into HTML.
    $sSafe = htmlspecialchars($sTemp2);
    echo '<br>Count '.$num_pics.' <a href="'.$sSafe.'" target="_blank">'.$sSafe.'</a><br>';
}
It's not for every purpose, but I use it to store the paths in the database so the pictures can later be converted to WebP. Future pictures will be converted to WebP on upload, but to convert the existing ones I first need to identify them. And this works perfectly.
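Since the same picture usually appears on several pages, it may help to collapse duplicates before storing anything. The following is only a sketch under that assumption. The helper name uniquePicPaths and the sample HTML are hypothetical, and the database step is left out.

```php
<?php
// Hypothetical helper: returns each picture path once, in first-seen order.
function uniquePicPaths($sSource) {
    preg_match_all('/([-a-z0-9_\/:.]+\.(jpg|jpeg|png))/i', $sSource, $matches);
    // array_unique keeps the first occurrence; array_values reindexes from 0.
    return array_values(array_unique($matches[0]));
}

// Hypothetical sample input with one path repeated.
$sHtml = '<img src="/pics/a.jpg"> <img src="/pics/b.png"> <img src="/pics/a.jpg">';

$aUnique = uniquePicPaths($sHtml);

foreach ($aUnique as $sPath) {
    echo $sPath, "\n";
}
```

The comparison is case-sensitive, so /pics/a.jpg and /pics/A.JPG would count as two different paths.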