I wrote a regex that extracts an image URL from the source code of a page.
<?php
// Returns the first image URL found in $html, or null if there is none.
// url_to_absolute() is a helper defined elsewhere (not shown here).
function get_logo($html, $url)
{
    // First try: absolute http(s) URLs ending in png or jpg.
    if (preg_match_all('/\bhttps?:\/\/\S+(?:png|jpg)\b/', $html, $matches)) {
        echo "First";
        return $matches[0][0];
    }

    // Fallback: any token ending in png or jpg, resolved against the page URL.
    if (preg_match_all('~\b((\w+ps?://)?\S+(png|jpg))\b~im', $html, $matches)) {
        echo "Second";
        return url_to_absolute($url, $matches[0][0]);
    }

    return null;
}
But on a Wikipedia page the image URL looks like this:
http://en.wikipedia.org/wiki/File:Nelson_Mandela-2008_(edit).jpg
and my regex always fails on it.
How can I change the regex so that it handles URLs like this one?
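
To help narrow this down, here is a minimal sketch that runs the two patterns from get_logo() in isolation. The helper first_match() and the two sample strings are my own assumptions for testing; they are not taken from the real Wikipedia page source.

```php
<?php
// Hypothetical helper (not part of get_logo): returns the first full match
// of $pattern in $subject, or null when nothing matches.
function first_match(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

// The two patterns, copied unchanged from get_logo().
$absolute = '/\bhttps?:\/\/\S+(?:png|jpg)\b/';
$relative = '~\b((\w+ps?://)?\S+(png|jpg))\b~im';

// Assumed test inputs: the bare URL, and the same URL inside an attribute.
$samples = [
    'http://en.wikipedia.org/wiki/File:Nelson_Mandela-2008_(edit).jpg',
    '<a href="http://en.wikipedia.org/wiki/File:Nelson_Mandela-2008_(edit).jpg" class="image">',
];

foreach ($samples as $sample) {
    var_dump(first_match($absolute, $sample));
    var_dump(first_match($relative, $sample));
}
```

If both patterns match these samples but not the real page, the difference probably lies in how the link is written in the page markup (for example percent-encoded or protocol-relative) rather than in the parentheses themselves.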