
I am writing a web crawler in PHP, and everything was going fine until I tried to get information from secondary pages. Now my code works for a few seconds and then returns an Internal Server Error 500. Can someone tell me why?

    $dom = new DOMDocument('1.0');
    libxml_use_internal_errors(true);
    @$dom->loadHTMLFile($curPage);
    $dom_xpath = new DOMXPath($dom);
    $aElements = $dom_xpath->query("//a[@class='js-publication-title-link ga-publication-item']");
    foreach ($aElements as $element) {
        $href = $element->getAttribute('href');
        if (0 === stripos($href, 'publication/')) {
            $num = $num + 1;
            $publicationNum = $publicationNum + 1;
            $spans = $dom_xpath->query(".//span[@class='publication-title js-publication-title']", $element);

            $publicationName = $spans->item(0)->nodeValue;
            $publicationUrl = "https://www.researchgate.net/" . $href;

            // Here's where things start to go wrong
            getPublicationData($publicationUrl);
        }
    }

That function receives a url and tries to extract some data from it.

    function getPublicationData($url){
        static $seen = array();
        if (isset($seen[$url])) {
            return;
        }
        $seen[$url] = true;

        $dom = new DOMDocument('1.0');
        libxml_use_internal_errors(true);
        $dom->loadHTMLFile($url);
        $dom_xpath = new DOMXPath($dom);

        // method 1
        $strongElements = $dom_xpath->query("//strong[@class='publication-meta-type']");
        foreach ($strongElements as $strongElement) {
            echo $strongElement->nodeValue;
        }
    }
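One possible culprit: `loadHTMLFile($url)` fetches each secondary page over the network with no timeout, so a single slow response can push the whole script past `max_execution_time` and trigger a 500. A sketch of fetching with cURL under explicit timeouts and then parsing from the string instead (the `fetchHtml` helper name, the timeout values, and the user agent string are my own assumptions, not something from the original code):

```php
<?php
// Hypothetical helper: fetch a page with hard timeouts so one slow
// response can't stall the crawler until the server kills it.
function fetchHtml($url) {
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 5,   // seconds allowed to establish the connection
        CURLOPT_TIMEOUT        => 10,  // hard cap on the whole transfer
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; MyCrawler/1.0)',
    ));
    $html = curl_exec($ch);
    curl_close($ch);
    return $html === false ? null : $html;
}

// Inside getPublicationData(), parse from the fetched string instead
// of calling loadHTMLFile() on the remote URL:
//
//     $html = fetchHtml($url);
//     if ($html === null) { return; }   // skip pages that fail or time out
//     $dom = new DOMDocument('1.0');
//     libxml_use_internal_errors(true);
//     $dom->loadHTML($html);
```

This also lets you detect and skip failed fetches instead of feeding an empty document to DOMXPath.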

Then, after a few seconds of working fine (I know it's working fine because the code is inside a loop, and it only crashes after a few iterations), it returns an Internal Server Error 500.

Edit: I am already using ini_set('display_errors', 1); and it doesn't show me anything :(
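A 500 with nothing on screen usually means a fatal error (often an execution-time or memory limit) that the server swallows before `display_errors` can print it. A sketch that forces PHP to log to a file you control and raises the two limits a long-running crawler most commonly hits (the log path and the limit values below are guesses, tune them for your server):

```php
<?php
// Force errors into a file we can read, since the 500 page shows nothing.
error_reporting(E_ALL);
ini_set('log_errors', '1');
ini_set('error_log', __DIR__ . '/crawler-errors.log');   // assumed path

// Raise the usual culprits for a script that dies after a few loops:
set_time_limit(0);                 // max_execution_time often defaults to 30s
ini_set('memory_limit', '256M');   // each DOMDocument holds a full parsed page
```

After the next crash, the tail of `crawler-errors.log` should name the actual fatal error (e.g. "Maximum execution time exceeded" or "Allowed memory size exhausted"), which pins down the fix.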
