I am writing a web crawler in PHP, and everything was going OK until I tried to get information from secondary pages. Now my code works for a few seconds and then returns an Internal Server Error 500. Can someone tell me why?
$dom = new DOMDocument('1.0');
libxml_use_internal_errors(true);
@$dom->loadHTMLFile($curPage);
$dom_xpath = new DOMXPath($dom);
$aElements = $dom_xpath->query("//a[@class='js-publication-title-link ga-publication-item']");
foreach ($aElements as $element) {
    $href = $element->getAttribute('href');
    if (0 === stripos($href, 'publication/')) {
        $num = $num + 1;
        $publicationNum = $publicationNum + 1;
        $spans = $dom_xpath->query(".//span[@class='publication-title js-publication-title']", $element);
        $publicationName = $spans->item(0)->nodeValue;
        $publicationUrl = "https://www.researchgate.net/" . $href;
        // Here's where things start to go wrong
        getPublicationData($publicationUrl);
    }
}
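Since it only dies after several iterations, one diagnostic I could add (a minimal sketch, not part of my actual crawler; it would go inside the loop, right before the getPublicationData() call) is logging time and memory per iteration, so the last log entry before the 500 shows how far the loop got:

error_log(sprintf(
    "iteration %d: %s | mem %.1f MB | elapsed %.1f s",
    $publicationNum,
    $publicationUrl,
    memory_get_usage(true) / 1048576,                 // bytes to MB
    microtime(true) - $_SERVER['REQUEST_TIME_FLOAT']  // seconds since the request started
));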
That function receives a URL and tries to extract some data from it.
function getPublicationData($url) {
    static $seen = array();
    // Skip URLs that were already processed
    if (isset($seen[$url])) {
        return;
    }
    $seen[$url] = true;
    $dom = new DOMDocument('1.0');
    libxml_use_internal_errors(true);
    $dom->loadHTMLFile($url);
    $dom_xpath = new DOMXPath($dom);
    // method 1
    $strongElements = $dom_xpath->query("//strong[@class='publication-meta-type']");
    foreach ($strongElements as $strongElement) {
        echo $strongElement->nodeValue;
    }
}
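In case the remote fetch itself is the problem: loadHTMLFile() over HTTP has no explicit timeout here, so one slow page could pile up against the script's time limit. A minimal sketch of a more defensive fetch inside getPublicationData() (the 10-second timeout and the user-agent string are placeholder values, not from my real code):

libxml_set_streams_context(stream_context_create(array(
    'http' => array(
        'timeout'    => 10,             // placeholder: give up after 10 s per page
        'user_agent' => 'Mozilla/5.0',  // placeholder: some hosts reject PHP's default agent
    ),
)));

$dom = new DOMDocument('1.0');
libxml_use_internal_errors(true);
if (!$dom->loadHTMLFile($url)) {
    error_log("Failed to load " . $url);  // skip pages that refuse to load
    return;
}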
Then, after a few seconds working fine (I know it's working fine because the code is inside a loop, and it only crashes after a few iterations), it returns an Internal Server Error 500.
Edit
I am already using ini_set('display_errors', 1); and it doesn't show me anything :(
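For completeness, a minimal sketch of the fuller error-reporting setup I could put at the very top of the script (the limit values are guesses, not my actual configuration). A 500 with nothing on screen usually means a fatal error that went to the server's error log instead of the browser, and exceeding max_execution_time or memory_limit after a few loops would produce exactly that:

error_reporting(E_ALL);
ini_set('display_errors', '1');
ini_set('log_errors', '1');       // fatal errors usually land in the server error log
set_time_limit(300);              // guess: default max_execution_time is often 30 s
ini_set('memory_limit', '256M');  // guess: each downloaded page is parsed into a full DOM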