
I have a website status checker that writes the latest URLs checked to a log file (URL, status, e.g. up or down, and date checked). The trouble I'm now finding is that it also records spider/Googlebot visits, so the latest site checks are being written multiple times per second.

Here is my log writing function:

public function log($url, $status) {
    // Reduce a full URL to its host name; also accept https:// URLs
    if (strpos($url, "/") !== false):
        if (!preg_match('#^https?://#i', $url)):
            $url = "http://" . $url;
        endif;
        $parse = parse_url($url);
        $url = isset($parse['host']) ? $parse['host'] : '';
    endif;
    if (!empty($url)):
        // The newest check always goes first
        $arrayToWrite = array(
            array(
                "url" => $url,
                "status" => $status,
                "date" => date("m/d/Y h:i")
            )
        );
        if (file_exists($this->logfile)):
            $arrayFromFile = unserialize(file_get_contents($this->logfile));
            // Only iterate if the file held a valid serialized array
            if (is_array($arrayFromFile)):
                // Drop the older entry for this host, if any
                foreach ($arrayFromFile as $k => $tmpArray):
                    if ($tmpArray['url'] == $url):
                        unset($arrayFromFile[$k]);
                    endif;
                endforeach;
                // Keep the 9 most recent old entries (10 in total with the new one)
                array_splice($arrayFromFile, 9);
                $arrayToWrite = array_merge($arrayToWrite, $arrayFromFile);
            endif;
        endif;
        // LOCK_EX stops two concurrent writes from interleaving
        file_put_contents($this->logfile, serialize($arrayToWrite), LOCK_EX);
    endif;
}
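The read-modify-write in `log()` puts the newest check first, removes any older entry for the same host and caps the log at 10 entries. As a minimal sketch, that logic can be pulled out into a pure function so it can be tested without touching the log file. `mergeLogEntries` is a hypothetical name, not part of the original class, and the sample hosts and dates are made up.

```php
<?php
// Hypothetical helper: prepend the newest check, drop older entries for the
// same host, and keep at most $max entries in total (10 in the question).
function mergeLogEntries(array $existing, string $url, string $status, string $date, int $max = 10): array
{
    $newest = ['url' => $url, 'status' => $status, 'date' => $date];
    // Remove any previous entry for this host so it appears only once
    $others = array_filter($existing, function (array $entry) use ($url): bool {
        return $entry['url'] !== $url;
    });
    // Reindex, then keep only the $max - 1 most recent old entries
    $others = array_slice(array_values($others), 0, $max - 1);
    return array_merge([$newest], $others);
}

$log = [
    ['url' => 'example.com', 'status' => 'up', 'date' => '03/01/2017 05:00'],
    ['url' => 'example.org', 'status' => 'down', 'date' => '03/01/2017 04:59'],
];
$log = mergeLogEntries($log, 'example.org', 'up', '03/01/2017 05:01');
// $log now starts with example.org (up), followed by example.com
```

Keeping the file I/O in `log()` and the list manipulation in a separate function means the bot check only has to decide whether `log()` is called at all.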

What changes could I make so that it ignores bot/spider visits and only records real visitors?

user2736203
  • If an IP requests `robots.txt`, it's *probably* a spider, and any reputable spider will certainly check `robots.txt`. I'm also fairly sure they send a proper header string. – Wayne Werner Mar 01 '17 at 17:05
  • True, but I want spiders to keep crawling my content; I just don't want my script to write those visits to the URL-checker log file. – user2736203 Mar 01 '17 at 17:11
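The `robots.txt` heuristic from the first comment could look something like this minimal sketch. It assumes the request path is available (e.g. from `$_SERVER['REQUEST_URI']`) and that the set of known spider IPs is persisted somewhere between requests; the function names are hypothetical and the IPs are documentation addresses. Like a user-agent check, it only decides whether to skip logging, so spiders can still crawl the site. It only recognises a spider after that spider has fetched `robots.txt`, so it is a heuristic rather than a reliable test.

```php
<?php
// Hypothetical sketch: remember every IP that has requested /robots.txt and
// treat later requests from those IPs as probable spiders. $knownSpiders would
// need to be stored between requests (file, database, cache); here it is an array.
function rememberRobotsFetch(array $knownSpiders, string $ip, string $requestPath): array
{
    // Compare only the path part, so "/robots.txt?x=1" also counts
    if (parse_url($requestPath, PHP_URL_PATH) === '/robots.txt') {
        $knownSpiders[$ip] = true;
    }
    return $knownSpiders;
}

function isProbableSpider(array $knownSpiders, string $ip): bool
{
    return isset($knownSpiders[$ip]);
}

$spiders = [];
$spiders = rememberRobotsFetch($spiders, '192.0.2.1', '/robots.txt');
$spiders = rememberRobotsFetch($spiders, '203.0.113.5', '/index.php');
// Only 192.0.2.1 is now treated as a probable spider
```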

1 Answer


Referencing this answer: How to detect search engine bots with PHP?

You can use $_SERVER['HTTP_USER_AGENT'] to check whether the visitor identifies itself as a spider.

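$bots = array("googlebot", "bingbot", "msn"); // add other bot signatures here
$ua = strtolower(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');
$isBot = false;
foreach ($bots as $bot) { // substring match: a real UA is e.g. "Mozilla/5.0 (compatible; Googlebot/2.1; ...)"
    if (strpos($ua, $bot) !== false) { $isBot = true; break; }
}
if ($isBot) {
     // Don't save url
}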
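A real user agent embeds the bot name in a longer string (Googlebot's is `Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)`), so comparing the whole string against the list for equality will not match; the check needs a case-insensitive substring search. Below is a minimal sketch of that check as a reusable function. `isBot` and the signature list are illustrative, not a standard API. The user agent is self-reported, so this only filters bots that identify themselves honestly.

```php
<?php
// Hypothetical helper: returns true when the user-agent string contains any of
// the given bot signatures. Matching is case-insensitive and on substrings,
// because real user agents embed the bot name in a longer string.
function isBot(string $userAgent, array $signatures): bool
{
    $userAgent = strtolower($userAgent);
    foreach ($signatures as $signature) {
        if (strpos($userAgent, strtolower($signature)) !== false) {
            return true;
        }
    }
    return false;
}

$signatures = ['googlebot', 'bingbot', 'msnbot'];

// In log(), the check could run before anything is written, e.g.:
//   $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
//   if (isBot($ua, $signatures)) { return; }

var_dump(isBot('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', $signatures)); // bool(true)
var_dump(isBot('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36', $signatures)); // bool(false)
```

Returning early from `log()` keeps spiders free to crawl the site while keeping their visits out of the URL-checker log.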

A List of Spiders

kyeiti