I'm trying to build a simple crawler. The crawler works fine, but I would like to output some messages from inside the recursive function to get an idea of how many pages are left to crawl in the $crawling array and which page is currently being crawled.

The relevant code is below. I have two echo statements inside the function, but nothing is output until the script finishes. Is it possible to output messages along the way from inside a recursive function?

$alreadyCrawled = array();
$crawling = array();

function followLinks($url) {
    global $alreadyCrawled;
    global $crawling;

    echo "Now crawling: $url";

    $parser = new DomDocumentParser($url);
    $linkList = $parser->getLinks();

    // Get the links
    for($i = 0; $linkList->length > $i; $i++) {
        $href = $linkList->item($i)->getAttribute("href");

        // Convert relative links to absolute links
        if(strpos($href, "#") !== false) {
            continue;
        } else if(substr($href, 0, 11) === "javascript:") {
            continue;
        } else if(substr($href, 0, 6) === "mailto") {
            continue;
        }

        $href = createLink($href, $url);

        // Crawl page
        if(!in_array($href, $alreadyCrawled)) {
            $alreadyCrawled[] = $href;
            $crawling[] = $href;

            getDetails($href);
        }

    }

    array_shift($crawling); // Remove page just crawled

    echo "Finished crawling: $url, Pages left to crawl: " . count($crawling);    

    // Crawl until array is empty
    foreach ($crawling as $site) {
        followLinks($site);
    }

}
Rajohan

1 Answer

After looking at nandal's answer and CBroe's link to a possible duplicate, I ended up with the function below. Calling it after each echo does the trick.

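function flush_buffers() {
    // Guard each call: ob_end_flush()/ob_flush() raise a notice when no buffer is active
    if (ob_get_level() > 0) {
        ob_end_flush(); // Send and close the innermost output buffer
    }
    if (ob_get_level() > 0) {
        ob_flush();     // Send the next buffer level, if one is still open
    }
    flush();            // Push PHP's system-level write buffer to the client
    ob_start();         // Open a fresh buffer for the next batch of output
}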
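
A possible variation, as a minimal sketch rather than a definitive fix: wrap the echo and the flush in a single helper, so that every progress message is pushed out as soon as it is written. The names report_progress and countdown are hypothetical and not part of the crawler; countdown only stands in for the recursive followLinks. Keep in mind that the web server or output compression (gzip, for example) may add buffering of its own, which PHP's flush() cannot control.

```php
<?php
// Hypothetical helper: print one progress line and send it right away.
function report_progress(string $message): void {
    echo $message . PHP_EOL;
    // Flush the active (topmost) output buffer without closing it.
    if (ob_get_level() > 0) {
        ob_flush();
    }
    // Ask PHP to push its system-level write buffer to the client.
    flush();
}

// Hypothetical stand-in for the recursive crawl: counts down to zero.
function countdown(int $remaining): void {
    report_progress("Pages left to crawl: $remaining");
    if ($remaining > 0) {
        countdown($remaining - 1);
    }
}

countdown(3);
```

In the crawler, the two echo lines inside followLinks would become calls to report_progress, so nothing else in the recursion needs to change.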
Rajohan