12

In order to test the Open Graph API on our preview environment, we need to poke a hole in our firewall to allow Facebook to scrape our object pages. What IP ranges should we allow?

Michael Petrotta
the1plummie

6 Answers

26

EDIT

Facebook has been showing some love and now publishes its scraper IP block for anyone to use:

http://developers.facebook.com/docs/ApplicationSecurity/#facebook_scraper

https://developers.facebook.com/docs/sharing/best-practices#crawl

Facebook Scraper

A number of Platform services such as Social Plugins and the Open Graph require our systems to be able to reach your Web Pages. We recognize that there are situations where you might not want these pages on the public Internet, during testing or for other security reasons.

To facilitate this, you should make exceptions in your security systems to allow Facebook to scrape these pages by adding the following IP ranges, accurate as of April 2012.

31.13.24.0/21
31.13.64.0/18
66.220.144.0/20
69.63.176.0/20
69.171.224.0/19
74.119.76.0/22
103.4.96.0/22
173.252.64.0/18
204.15.20.0/22
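
If you want to apply the same allowlist in application code rather than in the firewall itself, the check can be sketched roughly as follows. This is only an illustration: the function name `is_facebook_ip` is made up here, and the ranges are simply the April 2012 list quoted above, which may well be out of date.

```python
import ipaddress

# Ranges copied from the list quoted above (stated as accurate in April 2012).
# Assumption: you refresh this list yourself; it is not fetched automatically.
FACEBOOK_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "31.13.24.0/21",
        "31.13.64.0/18",
        "66.220.144.0/20",
        "69.63.176.0/20",
        "69.171.224.0/19",
        "74.119.76.0/22",
        "103.4.96.0/22",
        "173.252.64.0/18",
        "204.15.20.0/22",
    )
]


def is_facebook_ip(addr: str) -> bool:
    """Return True if addr falls inside any of the listed ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in FACEBOOK_RANGES)
```

For example, `is_facebook_ip("31.13.24.5")` is true because the address lies in 31.13.24.0/21, while an address outside every listed block is rejected.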

Instead of filtering by IP address, you can also have your firewall match on the scraper's user agent.

http://developers.facebook.com/docs/reference/plugins/like/

When does Facebook scrape my page?

Facebook needs to scrape your page to know how to display it around the site.

Facebook scrapes your page every 24 hours to ensure the properties are up to date. The page is also scraped when an admin for the Open Graph page clicks the Like button and when the URL is entered into the Facebook URL Linter. Facebook observes cache headers on your URLs - it will look at "Expires" and "Cache-Control" in order of preference. However, even if you specify a longer time, Facebook will scrape your page every 24 hours.

The user agent of the scraper is: "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
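
A user-agent check along these lines might look like the sketch below. The helper name `looks_like_facebook_scraper` is hypothetical, and matching on the `facebookexternalhit` token alone is an assumption made so that a version bump would not break the check. Keep in mind that any client can send this header, so it is a weaker test than an IP allowlist.

```python
# Token taken from the user agent string quoted above.
FACEBOOK_UA_TOKEN = "facebookexternalhit"


def looks_like_facebook_scraper(user_agent: str) -> bool:
    """Return True if the User-Agent header contains the scraper token.

    The comparison is case-insensitive; an empty or missing header is
    treated as "not the scraper".
    """
    return FACEBOOK_UA_TOKEN in (user_agent or "").lower()
```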

Stalinko
DMCS
  • Facebook has just released some information concerning this. I've added it to my response. – DMCS May 07 '12 at 18:21
  • +1! FYI, the "IP range" approach is superior; the "user agent" approach won't catch all Facebook scrapers. It may technically catch all "Open Graph Crawlers", but Facebook has more scrapers that use indistinguishable user agents. (Source: Our logs) – rinogo Oct 26 '18 at 20:59
7

To see all ranges, run:

whois -h whois.radb.net -- '-i origin AS32934' | grep ^route

Stillmatic1985
  • 1
    Here's a little more info on [how this magic works](https://ma.ttias.be/whois-at-the-cli-get-all-ip-ranges-from-an-as-number/). Also, note that this approach is actually [recommended by Facebook](https://developers.facebook.com/docs/sharing/webmasters/crawler). – rinogo Oct 26 '18 at 21:08
2
  • 66.220.144.0/20
  • 66.220.144.0/21
  • 66.220.152.0/21
  • 66.220.159.0/24
  • 69.63.176.0/20
  • 69.63.176.0/21
  • 69.63.176.0/24
  • 69.63.184.0/21
  • 69.171.224.0/19
  • 69.171.224.0/20
  • 69.171.239.0/24
  • 69.171.240.0/20
  • 69.171.255.0/24
  • 74.119.76.0/22
  • 103.4.96.0/22
  • 173.252.64.0/18
  • 173.252.64.0/19
  • 173.252.70.0/24
  • 173.252.96.0/19
  • 204.15.20.0/22
  • 31.13.24.0/21
  • 31.13.64.0/18
  • 31.13.64.0/19
  • 31.13.64.0/24
  • 31.13.65.0/24
  • 31.13.66.0/24
  • 31.13.67.0/24
  • 31.13.68.0/24
  • 31.13.69.0/24
  • 31.13.70.0/24
  • 31.13.71.0/24
  • 31.13.72.0/24
  • 31.13.73.0/24
  • 31.13.74.0/24
  • 31.13.75.0/24
  • 31.13.76.0/24
  • 31.13.77.0/24
  • 31.13.96.0/19
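
The list above mixes aggregates with more-specific prefixes they already cover (for example, 66.220.144.0/21 sits inside 66.220.144.0/20). If your firewall prefers a minimal rule set, one possible way to collapse the overlaps is Python's standard `ipaddress` module. The function name `minimal_ranges` is just an illustration, and the input below is a small subset of the list, not the whole thing.

```python
import ipaddress


def minimal_ranges(cidrs):
    """Collapse overlapping or adjacent IPv4 prefixes into the fewest networks."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    # collapse_addresses drops prefixes covered by a larger one and merges
    # sibling blocks into their common supernet.
    collapsed = sorted(ipaddress.collapse_addresses(networks))
    return [str(net) for net in collapsed]


# A subset of the prefixes listed above, for illustration only.
sample = [
    "66.220.144.0/20",
    "66.220.144.0/21",
    "66.220.152.0/21",
    "66.220.159.0/24",
    "173.252.64.0/19",
    "173.252.96.0/19",
]
print(minimal_ranges(sample))
```

With this subset, the four 66.220.x.x entries reduce to 66.220.144.0/20, and the two 173.252.x.x /19 blocks merge into 173.252.64.0/18.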
Tadeck
  • 3
    Could you specify where you got this data? – AndrewF Apr 27 '12 at 14:16
  • These are the routes that Facebook advertises to the internet, and the list was accurate on the date of the post. FB isn't keen on assigning permanently static addresses to external VIPs like this one, so customers generally have to allow all of Facebook's address space to ensure that, if and when a new VIP is added from the pool of IPv4 addresses, they can connect to it. This of course means that if Facebook starts advertising new IPv4 space, this list will be out of date. – Kyle O'Malley May 06 '12 at 00:36
1

Facebook now publishes their IP range.

As of April 2012, it is:

31.13.24.0/21
31.13.64.0/18
66.220.144.0/20
69.63.176.0/20
69.171.224.0/19
74.119.76.0/22
103.4.96.0/22
173.252.64.0/18
204.15.20.0/22
bkaid
1

New information is listed on the following URL & yes, they do have this info public.

Run this command to get a current list of IP addresses the crawler uses.

whois -h whois.radb.net -- '-i origin AS32934' | grep ^route

Such as

# For example only - over 100 in total
31.13.24.0/21 
66.220.144.0/20    
2401:db00::/32  
2620:0:1c00::/40  
2a03:2880::/32 

So yes, the ranges mentioned by DMCS are still correct. I just wanted to verify them and found this info.

Thanks

Community
tushonline
-1

Facebook does not publish their crawler source address range officially, but you can look at the list of all their IP ranges in the publicly available BGP routing table:

We're currently using this list:

  • 69.171.224.0/19
  • 74.119.76.0/22
  • 204.15.20.0/22
  • 66.220.144.0/20
  • 69.63.176.0/20
  • 173.252.64.0/18