I'm trying to build a simple crawler. The crawler works fine, but I would like to output some messages inside the recursive function so I can see how many pages are left to crawl in the $crawling array and which page is currently being crawled.
Below is the relevant code. I have two echo statements inside the function, but nothing is output until the script finishes. Is it possible to output messages along the way from inside a recursive function?
```php
$alreadyCrawled = array();
$crawling = array();

// DomDocumentParser, createLink() and getDetails() are defined elsewhere in the project.
function followLinks($url) {
    global $alreadyCrawled;
    global $crawling;

    echo "Now crawling: $url";

    $parser = new DomDocumentParser($url);
    $linkList = $parser->getLinks();

    // Go through the links found on the page
    for ($i = 0; $linkList->length > $i; $i++) {
        $href = $linkList->item($i)->getAttribute("href");

        // Skip anchors, javascript: and mailto: links
        if (strpos($href, "#") !== false) {
            continue;
        } else if (substr($href, 0, 11) === "javascript:") {
            continue;
        } else if (substr($href, 0, 6) === "mailto") {
            continue;
        }

        // Convert relative links to absolute links
        $href = createLink($href, $url);

        // Crawl page
        if (!in_array($href, $alreadyCrawled)) {
            $alreadyCrawled[] = $href;
            $crawling[] = $href;
            getDetails($href);
        }
    }

    array_shift($crawling); // Remove page just crawled
    echo "Finished crawling: $url, Pages left to crawl: " . count($crawling);

    // Crawl until array is empty
    foreach ($crawling as $site) {
        followLinks($site);
    }
}
```
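A likely cause, assuming the script runs through a web server and not from the command line: PHP and the server buffer the output and send it only when a buffer fills or the script ends. The CLI SAPI hardcodes `output_buffering=0` and `implicit_flush=1`, so running the crawler as `php crawler.php` (file name hypothetical) should print the messages as they are produced. Under a web server, the following minimal sketch closes PHP's own output buffers and flushes after each message. The helper names `disableOutputBuffering()`, `progressLine()` and `report()` are hypothetical and are not part of the original code.

```php
<?php
// Sketch only: these helpers are hypothetical additions, not part of the original crawler.

/**
 * Close any active PHP output buffers and turn on implicit flushing.
 * Call once near the top of the script, before crawling starts.
 */
function disableOutputBuffering(): void
{
    // ob_end_flush() returns false if a buffer cannot be removed, which ends the loop.
    while (ob_get_level() > 0 && ob_end_flush()) {
        // The condition does the work: each call closes one buffer.
    }
    ob_implicit_flush(); // flush automatically after every output call
}

/**
 * Build one progress line. It is kept separate so it can be tested without printing.
 */
function progressLine(string $url, int $pagesLeft): string
{
    return "Finished crawling: $url, Pages left to crawl: $pagesLeft";
}

/**
 * Echo a message and push it to the client immediately, not at script end.
 */
function report(string $message): void
{
    // One message per line: PHP_EOL on the command line, <br> in a browser.
    echo $message, PHP_SAPI === 'cli' ? PHP_EOL : "<br>\n";
    if (ob_get_level() > 0) {
        ob_flush(); // flush the topmost PHP buffer if one is still active
    }
    flush(); // ask the SAPI / web server to send what it has so far
}
```

Inside `followLinks()`, each `echo` would become a `report(...)` call, for example `report(progressLine($url, count($crawling)));`. You would call `disableOutputBuffering()` once before the first `followLinks()` call. Output can still be held back outside PHP, for example by `zlib.output_compression`, a reverse proxy, or a browser that waits for more data before rendering. Treat this as a starting point rather than a guarantee.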