2

How can I detect on the server side whether the client requesting a webpage is a browser?

I'm storing statistics about visited pages, and I don't want to record anything when the visitor is a crawler. I'm not trying to detect the browser vendor; I only want a boolean answer: is the visitor a browser?

I assume the answer involves the User-Agent header, but the number of possible values is too overwhelming for me to work out. How can I detect it? A 90% solution that only recognizes the five most popular browsers would be good enough.

I'm using C# and ASP.NET, but I guess most solutions in other languages and frameworks could be translated.

Dave Sumter
Karsten
  • 4
    As far as I understand, a user is a human who is using a browser. So a user is never a browser. Do you mean how you can detect which browser users are using? – simon Oct 08 '09 at 12:50
  • 3
    "I am not a browser, I'm a free man!" user->get (coat); user->wear (coat); user->exit (room); – Skizz Oct 08 '09 at 12:56
  • 2
    See "Detecting honest web crawlers" at http://stackoverflow.com/questions/544450/detecting-honest-web-crawlers – Arjan Oct 08 '09 at 13:08

6 Answers

6

This code is by no means exhaustive, but it gives you a basic grounding in PHP. I can't guarantee that a few bots won't slip through the net, but this should match most browsers and ignore most bots.

<?php
// Regular expression to match tokens found in common browser user agents (case-insensitive)
$browserlist = '/(opera|aol|msie|firefox|chrome|konqueror|safari|netscape|navigator|mosaic|lynx|amaya|omniweb|avant|camino|flock|seamonkey|mozilla|gecko)/i';

// The User-Agent header may be missing, so fall back to an empty string
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';

// Test for browsers
if (preg_match($browserlist, $userAgent)) {
    // ...is a browser
} else {
    // ...is not a browser
}
?>
Rowan
  • Should there not be an 'i' at the end of the preg_match pattern to make it case-insensitive? – zaf Apr 21 '11 at 08:54
  • Very good point! I missed that because I normally `strtolower` the user-agent first. Just fixing it now. – Rowan Apr 25 '11 at 10:59
  • 3
    I don't think this will work; all user agents start with Mozilla (long and sad story), including those of many crawlers. – Bill Yang Feb 28 '12 at 17:09
  • You'd be correct if we were trying to single out individual browsers, but we're filtering bots. The small amount of research I did for this answer showed that few, if any, bots have "mozilla" in their user-agent string. If you have a list of examples, I'll be more than happy to change the answer. – Rowan Mar 06 '12 at 08:16
  • If you head to http://www.user-agents.org/ there are a number of examples of bots having Mozilla (here's one... http://www.user-agents.org/) – BlueFish Feb 13 '13 at 04:17
  • I am really late to the party but ran into this. Is there a way nowadays, four years later, to do this without sniffing the user agent? – Henk Jansen Nov 15 '13 at 12:24
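The concern raised in the comments above (crawlers that also send "Mozilla") can be sketched by checking for crawler tokens first and only then for browser tokens. This is a minimal Python sketch, not part of the original answer; the token lists and the name `looks_like_browser` are illustrative assumptions and are far from exhaustive.

```python
import re

# Illustrative token lists (assumptions); extend them from your own logs.
BOT_TOKENS = re.compile(r"bot|crawler|spider|slurp|curl|wget", re.IGNORECASE)
# "mozilla" is deliberately left out because many crawlers send it too.
BROWSER_TOKENS = re.compile(
    r"opera|msie|firefox|chrome|safari|konqueror|gecko", re.IGNORECASE
)


def looks_like_browser(user_agent):
    """Return True only if the UA has a browser token and no crawler token."""
    if not user_agent:
        # Missing or empty header: treated as "not a browser" (an assumption).
        return False
    if BOT_TOKENS.search(user_agent):
        return False
    return BROWSER_TOKENS.search(user_agent) is not None
```

Checking the crawler tokens first means a user agent such as Googlebot's, which contains "Mozilla", is still rejected. A crawler that sends a plain browser user agent will pass, so this only filters honest bots.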
1

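This will give you about 99% accuracy. The expression is true when the user agent looks like a crawler, so negate it to test for a browser:

// Request.UserAgent can be null when the header is missing, so fall back to an empty string
bool isCrawler = Regex.IsMatch(Request.UserAgent ?? "", @"bot|crawler|baiduspider|80legs|ia_archiver|voyager|curl|wget|yahoo! slurp|mediapartners-google", RegexOptions.IgnoreCase);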

I've listed more solutions in my answer here:

Detecting honest web crawlers
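To connect this to the statistics use case in the question, here is a minimal Python sketch that skips recording a page view when the user agent matches the same pattern as the C# expression above. The names `is_crawler` and `record_page_view`, the in-memory `Counter`, and the choice to treat a missing header as a crawler are all assumptions made for illustration.

```python
import re
from collections import Counter

# Same alternatives as the C# expression above.
CRAWLER_PATTERN = re.compile(
    r"bot|crawler|baiduspider|80legs|ia_archiver|voyager|curl|wget"
    r"|yahoo! slurp|mediapartners-google",
    re.IGNORECASE,
)


def is_crawler(user_agent):
    """Return True for a missing header or a UA containing a crawler token."""
    # Treating a missing header as a crawler is an assumption of this sketch.
    return not user_agent or CRAWLER_PATTERN.search(user_agent) is not None


def record_page_view(stats, path, user_agent):
    """Increment the counter for `path` unless the client looks like a crawler."""
    if is_crawler(user_agent):
        return False
    stats[path] += 1
    return True
```

In a real application `stats` would be whatever store the site already uses; the `Counter` only stands in for it here.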

Community
Dave Sumter
1

If you're using PHP, try $_SERVER['HTTP_USER_AGENT'] or the get_browser() function.

Here's a class that's available for download that makes this process incredibly easy. Example usage:

$browser = new Browser();
if ($browser->getBrowser() == Browser::BROWSER_FIREFOX && $browser->getVersion() >= 2) {
    echo 'You have Firefox version 2 or greater';
}
Donut
  • That's a really good class: easy to integrate and use, with good features. I'm using it to detect incoming crawlers and robots, and it gets the job done pretty easily. – Fábio Antunes Mar 31 '10 at 12:05
1

I am not sure why you are doing this.

However, you should not even pretend that you can tell a human from a robot based on the User-Agent header. You might have some success analyzing the pattern and timing of requests.
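As one possible reading of "pattern and timing", here is a minimal Python sketch of a sliding-window rate check per client. The class name `RateHeuristic`, the thresholds, and the idea of keying on a client identifier such as an IP address are assumptions for illustration; real traffic would need tuning, and a slow crawler would not be caught.

```python
from collections import defaultdict, deque


class RateHeuristic:
    """Flag a client that makes more than max_requests within window_seconds."""

    def __init__(self, max_requests=20, window_seconds=10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client id -> timestamps of its recent requests, oldest first
        self._hits = defaultdict(deque)

    def is_suspicious(self, client_id, timestamp):
        """Record one request and report whether the client exceeds the limit."""
        hits = self._hits[client_id]
        hits.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests
```

Timestamps are passed in explicitly (for example from `time.time()`), which keeps the check easy to test.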

Sinan Ünür
0

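If you need to decide whether to return (say) raw text for a script versus HTML for a browser, perhaps you should inspect the Accept request header, which lists the media types the client is willing to receive? (Accept-Encoding only covers compression such as gzip.)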
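A minimal Python sketch of this kind of header check, using the `Accept` header (the one that carries media types): it reports whether the client explicitly lists an HTML media type. The function name `prefers_html` is hypothetical, and the parsing is a simplification that ignores `q` weights and wildcards.

```python
def prefers_html(accept_header):
    """Return True if the Accept header explicitly lists an HTML media type."""
    if not accept_header:
        return False
    for item in accept_header.split(","):
        # Strip parameters such as ";q=0.9" and normalize case.
        media_type = item.split(";", 1)[0].strip().lower()
        if media_type in ("text/html", "application/xhtml+xml"):
            return True
    return False
```

Browsers normally list `text/html` explicitly, while tools such as curl typically send only `*/*`, so this can separate the two cases. A crawler is free to send a browser-like `Accept` header, so it is a hint rather than proof.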

Brian Agnew
0

Perl has a module HTTP::BrowserDetect. According to the docs, this has a method which checks whether a user agent is a robot. Anything which isn't a robot is probably a browser, I guess.

Matthew Wilson
  • I think they're both browsers, whether browsers such as Firefox and IE or robots, because they both browse the page. Just my opinion. – Fábio Antunes Mar 31 '10 at 12:08