Detect if visitor is a browser and not a crawler

Question

How can I detect, server-side, whether the visitor to a web page is a browser?

The reason for detecting this is that I'm storing statistics about visited pages, and I don't want to store any statistics when the visitor is a crawler. I'm not trying to detect the browser vendor; I only want a boolean answer: is the visitor a browser?

I assume the answer lies in the User-Agent header, but the number of possible values is too overwhelming for me to work out. How can I detect it? A 90% solution that only detects the five most popular browsers would be good enough.

I'm using C# with ASP.NET, but I guess most solutions in other languages and frameworks could be translated.

As far as I understand, a user is a human, which is using a browser. So a user is never a browser. Do you mean how you can detect which browser users are using? — simon, Oct 08 '09 at 12:50
"I am not a browser, I'm a free man!" user->get (coat); user->wear (coat); user->exit (room); — Skizz, Oct 08 '09 at 12:56
See "Detecting honest web crawlers" at http://stackoverflow.com/questions/544450/detecting-honest-web-crawlers — Arjan, Oct 08 '09 at 13:08

Rowan · Accepted Answer · 2012-02-26T12:29:02.967

6

This code is by no means exhaustive, but it gives you a basic starting point in PHP. I can't guarantee that a few bots won't slip through the net, but it should catch most browsers and ignore most bots.

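<?php
// Regular expression to match common browser tokens (case-insensitive)
$browserlist = '/(opera|aol|msie|firefox|chrome|konqueror|safari|netscape|navigator|mosaic|lynx|amaya|omniweb|avant|camino|flock|seamonkey|mozilla|gecko)/i';

// The User-Agent header is optional, so fall back to an empty string when it is missing
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// Test for browsers
if (preg_match($browserlist, $userAgent)) {
    // ...is a browser
} else {
    // ...is not a browser
}
?>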

answered Oct 08 '09 at 12:58

should there not be an 'i' at the end of the preg match to make it case insensitive? – zaf Apr 21 '11 at 08:54
Very good point! I missed that as I normally `strtolower` the user-agent first.. just fixing – Rowan Apr 25 '11 at 10:59
I don't think this will work: almost all user agents start with Mozilla (long and sad story), including many crawlers. – Bill Yang Feb 28 '12 at 17:09
You'd be correct if we were trying to single out individual browsers, but we're filtering bots. The small amount of research I did for this answer showed that few, if any, bots have "mozilla" in their user-agent string. If you have a list of examples, I'll be more than happy to change the answer. – Rowan Mar 06 '12 at 08:16
If you head to http://www.user-agents.org/ there are a number of examples of bots having Mozilla (here's one... http://www.user-agents.org/) – BlueFish Feb 13 '13 at 04:17
I am really late to the party, but I ran into this: is there a way nowadays, four years later, to do this without sniffing the user agent? – Henk Jansen Nov 15 '13 at 12:24

score 1 · Answer 2 · edited May 23 '17 at 10:26

This will give you 99% accuracy.

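// Request.UserAgent can be null when the header is absent, so fall back to an empty string
bool isCrawler = Regex.IsMatch(Request.UserAgent ?? string.Empty, @"bot|crawler|baiduspider|80legs|ia_archiver|voyager|curl|wget|yahoo! slurp|mediapartners-google", RegexOptions.IgnoreCase);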

I've listed more solutions in my answer to the question "Detecting honest web crawlers".
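
For readers who want to gate the statistics code on this check, here is a minimal Python sketch of the same deny-list idea. The token list is copied from the pattern above; the function name `should_record_visit` and the choice to treat a missing User-Agent as a non-browser are assumptions made for illustration, not part of the original answer.

```python
import re

# Same tokens as the C# pattern above, compiled once and matched case-insensitively.
CRAWLER_PATTERN = re.compile(
    r"bot|crawler|baiduspider|80legs|ia_archiver|voyager|curl|wget"
    r"|yahoo! slurp|mediapartners-google",
    re.IGNORECASE,
)


def should_record_visit(user_agent):
    """Return True when the request should be counted in the page statistics."""
    # Assumption: a missing or empty User-Agent is treated as a non-browser.
    if not user_agent:
        return False
    # Count the visit only when no crawler token appears anywhere in the string.
    return CRAWLER_PATTERN.search(user_agent) is None
```

Because it only looks for crawler tokens, a user agent that starts with "Mozilla" but also contains "bot" (as Googlebot's does) is still rejected. Crawlers that send a plain browser string will pass, so this only filters bots that identify themselves.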

edited May 23 '17 at 10:26 by Community

answered Jan 26 '13 at 10:30

Dave Sumter

Donut · Answer 3 · 2009-10-08T12:56:42.010

1

If you're using PHP, try $_SERVER['HTTP_USER_AGENT'] or the get_browser() function.

Here's a class that's available for download that makes this process incredibly easy. Example usage:

$browser = new Browser();
if( $browser->getBrowser() == Browser::BROWSER_FIREFOX && $browser->getVersion() >= 2 ) {
 echo 'You have FireFox version 2 or greater';
}

answered Oct 08 '09 at 12:50

That's a really good class. It's easy to integrate and use, and it has good features. I'm using it to detect incoming crawlers and robots, and it gets the job done pretty easily. – Fábio Antunes Mar 31 '10 at 12:05

score 1 · Answer 4 · answered Oct 08 '09 at 13:02

I am not sure why you are doing this.

However, you should not even pretend that you can tell a human from a robot based on the User-Agent header. You might have some success analyzing the pattern and timing of requests.
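
As a rough illustration of what analyzing request timing could look like, here is a small Python sketch that counts requests per client in a sliding time window. The class name, the window size, and the threshold are all hypothetical values chosen for the example; real traffic would need tuning, and several people can share one IP address.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 10.0  # hypothetical window size
MAX_REQUESTS = 20      # hypothetical limit on requests per window


class RequestRateTracker:
    """Tracks recent request timestamps per client (for example, per IP address)."""

    def __init__(self, window_seconds=WINDOW_SECONDS, max_requests=MAX_REQUESTS):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self._hits = defaultdict(deque)

    def looks_automated(self, client_id, timestamp):
        """Record one request and report whether the client exceeded the limit."""
        hits = self._hits[client_id]
        hits.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests
```

In a request handler you would call `tracker.looks_automated(ip, time.time())` and skip the statistics update when it returns True. Timestamps are passed in rather than read from the clock so the logic is easy to test.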

Sinan Ünür

score 0 · Answer 5 · answered Oct 08 '09 at 12:53

If you need to decide whether to return (say) raw text for a script versus HTML for a browser, perhaps you should inspect the Accept request header?
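
A rough Python sketch of that kind of header check follows. It looks at the Accept header, since that is the header that lists media types (Accept-Encoding only lists compression schemes). The function name `accepts_html` is made up for this example, and the parsing is deliberately simplistic: it ignores quality values such as `q=0`.

```python
def accepts_html(accept_header):
    """Return True when the Accept header explicitly lists an HTML media type."""
    if not accept_header:
        return False
    for item in accept_header.split(","):
        # Strip parameters such as ";q=0.9" and normalise case before comparing.
        media_type = item.split(";", 1)[0].strip().lower()
        if media_type in ("text/html", "application/xhtml+xml"):
            return True
    return False
```

Browsers normally list `text/html` explicitly, while many command-line tools send only `*/*`. A crawler can send the same header as a browser, so treat this as a weak signal rather than a reliable test.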

Brian Agnew

score 0 · Answer 6 · answered Oct 08 '09 at 12:54

Perl has a module HTTP::BrowserDetect. According to the docs, this has a method which checks whether a user agent is a robot. Anything which isn't a robot is probably a browser, I guess.

Matthew Wilson

I think that they're both browsers, both the browsers such as Firefox and IE and the robots, because they both browse the page. Just my opinion. – Fábio Antunes Mar 31 '10 at 12:08
