
I think I'm having a problem with bots and crawlers inflating my read count (basically a hit counter on a blog post that goes up +1 on each refresh).

Is there any way I can filter out the bots and crawlers? I'm thinking maybe of using $_SERVER['HTTP_USER_AGENT'] to filter with, but I'm not sure how to go about it, or whether it would even work.

Or even if anyone has any better ideas...
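
For reference, a minimal sketch of the user-agent idea being asked about: skip counting when the user agent is empty or contains a common crawler keyword. The keyword list is illustrative, not exhaustive, and bots can lie about their user agent:

    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    $isBot = ($ua === ''); // most real browsers always send a user agent

    // Most honestly-labelled crawlers match one of these substrings.
    foreach (['bot', 'crawl', 'spider', 'slurp'] as $keyword) {
        if (stripos($ua, $keyword) !== false) {
            $isBot = true;
            break;
        }
    }

    if (!$isBot) {
        // increment the read counter here
    }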

FoxyFish
  • It's not very reliable, but yes, you could use $_SERVER['HTTP_USER_AGENT']. –  Jul 04 '18 at 21:19
  • You should store the IP address in a temporary database; then each IP address affects the counter only once in, say, 24 hours. – HTMHell Jul 04 '18 at 21:20
  • Possible duplicate of [how to detect search engine bots with php?](https://stackoverflow.com/questions/677419/how-to-detect-search-engine-bots-with-php) –  Jul 04 '18 at 21:22
  • It wouldn't work, as I have lots of different posts with a read counter, so each would need its own individual IP storage. – FoxyFish Jul 04 '18 at 21:23
  • Of course it would. A great database for this kind of thing is Redis. You can store a key like `view:{post_id}:{ip_address}`, then increase your counter only if that key doesn't exist yet (see the sketch after these comments). – HTMHell Jul 04 '18 at 21:25
  • A: an IP does not equal a person (or bot): one person can have many IPs, and one IP can serve thousands of people. B: this would still count bot hits. –  Jul 04 '18 at 21:29
  • Yeah, it would limit the bots to 1 hit per day, but I want to stop them hitting at all if possible, and I don't want to limit a regular reader. – FoxyFish Jul 04 '18 at 21:30
  • If you decide to use the user agent, know that some bots lie, and new bots show up all the time, so keeping track of user agents won't be trivial. I don't think many people would ever bother to do this. –  Jul 04 '18 at 21:32
  • I was thinking of doing the opposite: instead of a huge ban list of bots, having a whitelist of allowed user agents and only allowing those. But again, I wasn't sure if that would work, or what all the genuine user agents are. I can't think of any alternatives to the user agent for dealing with this, though? – FoxyFish Jul 04 '18 at 21:35
  • New legit user agents appear daily as well. –  Jul 04 '18 at 21:44
  • https://github.com/matomo-org/device-detector is a fantastic device-detection library which includes a `$dd->isBot()` function, if the overhead of adding it to your project is tolerable. The package is part of Matomo/Piwik Analytics, but functions perfectly on its own (a sketch of it follows these comments). – Scuzzy Jul 04 '18 at 22:06
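
For reference, a minimal sketch of HTMHell's Redis suggestion, assuming the phpredis extension is available; `$postId` and `incrementReadCount()` are hypothetical stand-ins for your own post ID and counter code:

    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);

    // view:{post_id}:{ip_address}, as suggested above.
    $key = "view:{$postId}:{$_SERVER['REMOTE_ADDR']}";

    // NX: only set if the key doesn't exist yet; EX: expire after 24 hours.
    // set() returns true only when the key was actually created, so the
    // counter goes up at most once per IP per post per day.
    if ($redis->set($key, 1, ['nx', 'ex' => 86400])) {
        incrementReadCount($postId);
    }

And a sketch of the device-detector approach from the last comment, using the `DeviceDetector` class and `isBot()` from the package's README (installed via Composer):

    require 'vendor/autoload.php';

    use DeviceDetector\DeviceDetector;

    $dd = new DeviceDetector(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');
    $dd->parse();

    if (!$dd->isBot()) {
        incrementReadCount($postId); // hypothetical counter helper
    }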

1 Answer


You could use this trick: check whether the browser claims to support cookies and JavaScript. Most bots don't, even though most bots do fake a valid user agent. Note that `get_browser()` only looks up the claimed user agent in the browscap database, so it reports advertised capabilities rather than verifying them at runtime.

    $browser = get_browser(null, true);

    if (empty($browser['javascript']) || empty($browser['cookies'])) {
        // probably a bot
    }
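
If browscap isn't configured in php.ini, `get_browser()` just returns `false`, so it's worth guarding for that (a real concern on shared hosting). A sketch of gating the counter on it, again with hypothetical `$postId`/`incrementReadCount()`; browscap also has a `crawler` flag for known bots:

    $browser = @get_browser(null, true);

    $looksLikeBot = $browser === false          // browscap not configured
        || !empty($browser['crawler'])          // browscap flags known crawlers
        || empty($browser['javascript'])
        || empty($browser['cookies']);

    if (!$looksLikeBot) {
        incrementReadCount($postId);
    }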

Another way to do it, which might also fail, is to check whether a session has actually been established. Many bots don't accept cookies, so the session cookie never comes back in the request header and the session stays empty on their next request.

    if (empty($_SESSION)) {
        // bot probable
    }

Or you can check for a session variable that you set at the beginning of every session; a fuller sketch follows the snippet below.

    if (!isset($_SESSION['your_var'])) {
        // bot probable
    }
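
A minimal sketch of that session-variable idea: mark the first request, and only count a read once the session cookie has made a round trip. `$postId` and `incrementReadCount()` are hypothetical:

    session_start();

    if (!isset($_SESSION['seen'])) {
        // First request of this session. A real browser will send the
        // session cookie back on its next request; most bots never do.
        $_SESSION['seen'] = true;
    } elseif (empty($_SESSION['counted'][$postId])) {
        // The cookie made a round trip: count each post once per session.
        $_SESSION['counted'][$postId] = true;
        incrementReadCount($postId);
    }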
Eric
  • Thanks. That's a pretty good idea, so long as bots don't start using cookies and JS. – FoxyFish Jul 04 '18 at 21:50
  • Won't work for all bots, but I think the OP knows nothing is going to work 100%. –  Jul 04 '18 at 21:51
  • Yeah, I realise there is zero chance of a definitive bot preventer, but if it stops a good chunk of them that's good enough. It's only a counter at the end of the day, but a slightly truer representation is better than one that's way overinflated. – FoxyFish Jul 04 '18 at 21:53
  • Can't add it, shared hosting. – FoxyFish Jul 04 '18 at 23:14