I've written a PHP script that parses an RSS feed and tries to get the Open Graph images from the og:image meta tags of the linked pages.
To get the images, I need to check whether the URLs in the RSS feed are 301 redirects. They often are, so I have to follow each redirect to the final URL, which makes the script run really slowly. Is there a quicker, more efficient way of achieving this?
Here is the function for getting the final URL:
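function curl_get_contents($url) {
    // Browser-like user agent string sent with every request.
    $agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow 301/302 redirects to the final URL
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // note: this disables certificate verification
    curl_setopt($ch, CURLOPT_VERBOSE, true);         // debug output goes to STDERR
    $result = curl_exec($ch);                        // false on failure
    curl_close($ch);                                 // release the handle
    return $result;
}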
And this is the function to retrieve the og images (if they exist):
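function getog($url) {
    // Initialise the keys so the comparisons below never read an undefined index.
    $ogProperty = array('url' => '', 'title' => '', 'image' => '');
    $html = curl_get_contents($url);
    // Bail out early if the request failed or returned an empty body.
    if ($html === false || trim($html) === '') { return $ogProperty; }
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings caused by malformed markup
    $xpath = new DOMXPath($doc);
    $query = '//*/meta[starts-with(@property, \'og:\')]';
    $metas = $xpath->query($query);
    foreach ($metas as $meta) {
        $property = $meta->getAttribute('property');
        $content = $meta->getAttribute('content');
        // Keep only the first value found for each property.
        if ($property == "og:url" && $ogProperty['url'] == "") { $ogProperty['url'] = $content; }
        if ($property == "og:title" && $ogProperty['title'] == "") { $ogProperty['title'] = $content; }
        if ($property == "og:image" && $ogProperty['image'] == "") { $ogProperty['image'] = $content; }
    }
    return $ogProperty;
}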
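To help separate parsing time from download time, the extraction step can be run on its own against a literal HTML string. This is only a minimal sketch: `parse_og_tags` and the sample markup are hypothetical and not part of my script, and it assumes the DOM extension is enabled.

```php
<?php
// Hypothetical helper (not in the original script): extracts the first
// og:url, og:title and og:image values from an HTML string, no network needed.
function parse_og_tags(string $html): array
{
    $og = ['url' => '', 'title' => '', 'image' => ''];
    if (trim($html) === '') {
        return $og; // nothing to parse
    }

    // Collect libxml warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//meta[starts-with(@property, "og:")]') as $meta) {
        // Strip the "og:" prefix, e.g. "og:image" becomes "image".
        $key = substr($meta->getAttribute('property'), 3);
        // Only fill the keys we care about, and only with the first value seen.
        if (isset($og[$key]) && $og[$key] === '') {
            $og[$key] = $meta->getAttribute('content');
        }
    }
    return $og;
}

// Made-up sample markup with two og:image tags.
$sample = '<html><head>'
    . '<meta property="og:title" content="Example title">'
    . '<meta property="og:image" content="https://example.com/a.jpg">'
    . '<meta property="og:image" content="https://example.com/b.jpg">'
    . '</head><body></body></html>';

$tags = parse_og_tags($sample);
echo $tags['image'], PHP_EOL; // first og:image wins
```

Timing this against `curl_get_contents()` should show which of the two steps dominates the run time.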
There is quite a bit more to the script, but these two functions are the bottleneck. I'm also caching the results to a text file, so it's faster after the first run.
How can I speed up the script so that it resolves the final URL and gets the image URLs for the links in the RSS feed more quickly?
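For context, a text-file cache of this kind can be sketched roughly as follows. The helper names, the JSON format and the temporary file are assumptions made for illustration, not the exact code in my script, and the fetch step is replaced by a stub so the example runs without network access.

```php
<?php
// Hypothetical cache helpers; the storage format (JSON keyed by URL) is an assumption.
function cache_load(string $file): array
{
    if (!is_file($file)) {
        return [];
    }
    $data = json_decode((string) file_get_contents($file), true);
    return is_array($data) ? $data : []; // empty or corrupt file means empty cache
}

function cache_save(string $file, array $cache): void
{
    // LOCK_EX avoids interleaved writes if two runs overlap.
    file_put_contents($file, json_encode($cache), LOCK_EX);
}

// Returns the cached entry for $url, calling $fetch only on a cache miss.
function cached_lookup(string $file, string $url, callable $fetch): array
{
    $cache = cache_load($file);
    if (!isset($cache[$url])) {
        $cache[$url] = $fetch($url);
        cache_save($file, $cache);
    }
    return $cache[$url];
}

// Stub standing in for the slow network call; counts how often it runs.
$calls = 0;
$fetch = function (string $url) use (&$calls): array {
    $calls++;
    return ['image' => 'https://example.com/a.jpg'];
};

$file = tempnam(sys_get_temp_dir(), 'ogcache'); // fresh, empty cache file
$first = cached_lookup($file, 'https://example.com/post', $fetch);
$second = cached_lookup($file, 'https://example.com/post', $fetch);
echo $calls, PHP_EOL; // the stub runs once; the second lookup is served from the file
unlink($file);
```

With a cache like this, only URLs that have not been seen before incur the redirect-following and download cost, so the first run stays slow.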