I am collecting a list of all URLs from a web page. My problem is that the list also contains image URLs, which I don't want.
This script gives me all the links from the web page:
function getUrl($html)
{
    $regex = '/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';
    preg_match_all($regex, $html, $matches);
    $urls = $matches[0];
    return $urls;
}
Here is the regex I use to match image URLs in the source code:
/\bhttps?:\/\/\S+(?:png|jpg)\b/
How can I exclude images from the list of extracted URLs?
UPDATE
$regex = '/(?!.*(?:\.jpe?g|\.gif|\.png)$)\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';
preg_match_all($regex, $html, $matches);
$urls = $matches[0];
Why does this regex still fail to exclude images?
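A likely reason the lookahead has no effect: without the m modifier, $ only matches at the very end of the subject string, and .* does not cross newlines, so on a full HTML page the assertion almost never fails. For comparison, here is a minimal sketch of a two-step alternative, assuming it is acceptable to extract every URL first and filter afterwards. The helper name getNonImageUrls and the extension list (jpg, jpeg, gif, png) are hypothetical choices for illustration, not part of the original script.

```php
<?php
// Hypothetical helper: extract all URLs, then drop the ones that look like images.
function getNonImageUrls(string $html): array
{
    // Same URL pattern as in the question (capture group made non-capturing).
    $regex = '/\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';
    preg_match_all($regex, $html, $matches);

    // Assumed image extensions; an optional query string or fragment may follow.
    $imagePattern = '/\.(?:jpe?g|gif|png)(?:[?#].*)?$/i';

    // PREG_GREP_INVERT keeps only the entries that do NOT match the pattern.
    // array_values() reindexes the result from 0.
    return array_values(preg_grep($imagePattern, $matches[0], PREG_GREP_INVERT));
}

$html = '<a href="http://example.com/page.html">x</a> <img src="http://example.com/logo.png">';
print_r(getNonImageUrls($html));
```

Here $ works as intended because each extracted URL is tested as its own string, so the end of the subject is the end of the URL.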