
Is there a way to reliably identify a Facebook bot by IP?

My site is getting hammered by bots claiming to be Facebook, but how do I know for sure? I'm looking for some kind of official procedure to validate a Facebook bot, similar to the verification procedure Google recommends for its own bots.

Can I parse the OrgName field from whois and trust it, or can that be faked?

krukid
    You could check whether the IP addresses are in the officially published ranges: http://stackoverflow.com/questions/8859013/whats-the-ip-address-range-of-facebooks-open-graph-crawler – CBroe Aug 09 '12 at 11:11

2 Answers

3

I'm answering this for the sake of keyword indexing in the internets.

Indeed, it looks like the best way to identify a Facebook bot (the Facebook scraper) is to match its IP against the officially declared Facebook IP ranges, which can be obtained by running

```shell
whois -h whois.radb.net '!gAS32934'
```
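A minimal Python sketch of that check, assuming the system `whois` client accepts `-h` exactly as in the command above. The helper names (`fetch_radb_prefixes`, `parse_radb_prefixes`, `is_facebook_ip`) are hypothetical. The `!g` query returns IPv4 prefixes only, wrapped in framing lines such as `A1063` and `C`, so the parser simply keeps every whitespace-separated token that parses as a network and skips the rest.

```python
import ipaddress
import subprocess


def fetch_radb_prefixes(asn: str = "AS32934") -> str:
    """Run the RADb '!g' query (IPv4 prefixes originated by asn); return raw text."""
    result = subprocess.run(
        ["whois", "-h", "whois.radb.net", f"!g{asn}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_radb_prefixes(raw: str) -> list:
    """Keep every whitespace-separated token that is a valid IP network.

    Framing tokens such as 'A1063' (length header) and 'C' (end marker)
    raise ValueError in ip_network() and are skipped.
    """
    networks = []
    for token in raw.split():
        try:
            networks.append(ipaddress.ip_network(token, strict=False))
        except ValueError:
            continue
    return networks


def is_facebook_ip(ip: str, networks: list) -> bool:
    """True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

Usage would be `nets = parse_radb_prefixes(fetch_radb_prefixes())` once at startup (or on a timer), then `is_facebook_ip(request_ip, nets)` per request. Querying whois on every request would be slow, so caching the list is probably the sensible design choice.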
krukid
    Updated query from [Facebook's site](https://developers.facebook.com/docs/sharing/best-practices#crawl): `whois -h whois.radb.net -- '-i origin AS32934' | grep ^route`. It returns similar info, with two differences: 1) it includes IPv6 addresses as well; 2) the output format is different: there is no messy "A1063" and "C" before and after the list, and each address is on its own row, prefixed with its type (route or route6). – oferei Nov 27 '14 at 10:31
  • `whois -h whois.radb.net -- '-i origin AS32934' | grep ^route` – oferei Nov 27 '14 at 10:32
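The same idea adapted to the `-i origin` query from the comment above, again as a hypothetical Python sketch (`fetch_route_objects`, `parse_route_objects` and `ip_in_networks` are illustrative names, not part of any tool). It keeps only `route:` and `route6:` lines, mirroring `grep ^route`, so IPv4 and IPv6 ranges end up in one list; `ipaddress` simply reports no match when an address and a network are of different IP versions.

```python
import ipaddress
import subprocess


def fetch_route_objects(asn: str = "AS32934") -> str:
    """Run the '-i origin' inverse query and return the raw whois output."""
    result = subprocess.run(
        # "--" stops whois option parsing so "-i origin ..." goes to the server
        ["whois", "-h", "whois.radb.net", "--", f"-i origin {asn}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_route_objects(raw: str) -> list:
    """Extract networks from 'route:' (IPv4) and 'route6:' (IPv6) lines."""
    networks = []
    for line in raw.splitlines():
        parts = line.split(None, 1)  # e.g. ["route6:", "2001:db8::/32"]
        if len(parts) == 2 and parts[0] in ("route:", "route6:"):
            networks.append(ipaddress.ip_network(parts[1].strip(), strict=False))
    # The same prefix may appear in several route objects; drop repeats, keep order.
    return list(dict.fromkeys(networks))


def ip_in_networks(ip: str, networks: list) -> bool:
    """True if the address is inside any network (version mismatch -> no match)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```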
-2

Why don't you just check the user agent instead of the IP? For example:

```
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
```

Zoltan Toth
Chris Lim
    Because _every_ bot can send whatever it likes for a user agent – and the topic of this question was to discern possible fake bots _pretending_ to be Facebook’s scraper from “the real thing” … and that’s exactly the reason why Facebook provides access to the list of the IPs they’re using … – CBroe Aug 15 '12 at 13:46
  • The `user-agent` header can be faked to be whatever you want – Tien Hoang Oct 22 '16 at 04:37
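As the comments point out, the user agent is only a claim. A minimal Python sketch of combining both checks, assuming the published ranges have already been loaded into a list of `ipaddress` networks (for instance with one of the whois queries above). `classify_request` and the placeholder prefixes are hypothetical; the prefixes are documentation ranges, not Facebook's.

```python
import ipaddress

# Hypothetical placeholder ranges (RFC 5737 / RFC 3849 documentation prefixes).
# In practice, replace these with the list returned by the whois query.
FACEBOOK_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def classify_request(user_agent: str, ip: str, networks=FACEBOOK_NETWORKS) -> str:
    """Return 'facebook', 'spoofed' or 'other'.

    A request counts as genuine only when it claims to be
    facebookexternalhit AND its IP lies inside the published ranges;
    a matching user agent from any other IP is treated as spoofed.
    """
    claims_facebook = "facebookexternalhit" in user_agent.lower()
    addr = ipaddress.ip_address(ip)
    in_ranges = any(addr in net for net in networks)
    if claims_facebook:
        return "facebook" if in_ranges else "spoofed"
    return "other"
```

The IP check is what carries the weight here; the user-agent test only decides which requests need verifying at all.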