
Over the last several weeks we have seen a HUGE increase in bandwidth usage but not in page views. I finally dug into the access_logs and found a TON of lines like these:

    173.252.103.0 - - [09/Mar/2013:17:48:19 -0500] "GET /files/imagecache/article_max_width/news/2013-03-04/H_h90_110.jpg HTTP/1.1" 403 344 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
    173.252.103.2 - - [09/Mar/2013:17:48:19 -0500] "GET /files/imagecache/article_max_width/news/2013-03-04/NT_audyssey_wireless_mediaa.jpg HTTP/1.1" 403 362 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
    173.252.103.3 - - [09/Mar/2013:17:48:19 -0500] "GET /files/imagecache/article_max_width/news/2013-03-05/TR_board.jpg HTTP/1.1" 403 343 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
    173.252.103.6 - - [09/Mar/2013:17:48:19 -0500] "GET /files/imagecache/article_max_width/news/2013-03-03/LG%20Optimus%20G.jpg HTTP/1.1" 403 347 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
    173.252.103.4 - - [09/Mar/2013:17:48:19 -0500] "GET /files/imagecache/article_max_width/review/2013-03-04/5-lit.jpg HTTP/1.1" 403 342 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
    173.252.103.2 - - [09/Mar/2013:17:48:19 -0500] "GET /files/imagecache/article_max_width/review/2013-03-02/IMG_9546.JPG HTTP/1.1" 403 345 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
    173.252.103.1 - - [09/Mar/2013:17:48:19 -0500] "GET /files/imagecache/article_max_width/news/2013-03-05/deal0305.png HTTP/1.1" 403 343 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
    173.252.103.4 - - [09/Mar/2013:17:48:19 -0500] "GET /files/imagecache/article_max_width/news/2013-03-05/H_Geforce.jpg HTTP/1.1" 403 344 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"

There are literally PAGES of them, spanning days and days. I can't figure out what is going on here, and I really don't want to use Apache to block the "facebookexternalhit" robot.
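To put a number on how much of the transfer volume comes from this crawler, a minimal Python sketch could total requests and bytes per user agent. It assumes the logs use Apache's standard "combined" format (which the lines above appear to follow); `bytes_by_agent` and `crawler_share` are hypothetical helper names, and the byte field is the response-size column as Apache logged it.

```python
import re
from collections import defaultdict

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def bytes_by_agent(lines):
    """Return {user_agent: [request_count, total_bytes]} for every parseable line."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip malformed lines or lines in another log format
        size = 0 if m["bytes"] == "-" else int(m["bytes"])  # "-" means no body was sent
        entry = totals[m["agent"]]
        entry[0] += 1
        entry[1] += size
    return dict(totals)


def crawler_share(totals, needle="facebookexternalhit"):
    """Fraction of all logged bytes sent to user agents containing `needle`."""
    total = sum(b for _, b in totals.values())
    matched = sum(b for agent, (_, b) in totals.items() if needle in agent)
    return matched / total if total else 0.0
```

Usage might look like `totals = bytes_by_agent(open("access_log"))` (the path is a placeholder), followed by `crawler_share(totals)`; sorting `totals.items()` by the second list element shows which agents dominate.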

Any ideas?

Ryan Shrout
  • Have you taken a look at the URL http://www.facebook.com/externalhit_uatext.php? That at least explains what the bot is doing. Based on the log entries you posted, all the requests resulted in a 403 (Forbidden). Do your users have to be logged in to see those images? Perhaps if the images were publicly available, the bot wouldn't hammer so hard to get at the shared link's images. – Jonathan MacDonald Mar 09 '13 at 23:09
  • The 403 errors are there because I started blocking the "facebookexternalhit/1.1" user agent in Apache to confirm that it was the problem. Normally the server returns a 206 and sends the file, hence our bandwidth problem. – Ryan Shrout Mar 10 '13 at 13:50
  • Hey Ryan, OK, in that case the next thing I'd try is Apache's mod_expires (http://httpd.apache.org/docs/2.2/mod/mod_expires.html), which lets you set the Expires and Cache-Control HTTP headers. For example, if an image is delivered with an expiry one week from now, the client should respect that and not ask for it again until then. I've seen huge benefits from mod_expires, but that was primarily with Internet Explorer requesting the same images again and again and again. Once the headers were set, IE stopped asking. Hopefully the Facebook bot will do the same. – Jonathan MacDonald Mar 11 '13 at 06:39
  • Just wanted to follow up here and point out that the bug is being tracked "several times" in the Facebook developer bug system: http://developers.facebook.com/bugs/385275384858817 – Ryan Shrout Mar 12 '13 at 06:16
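The mod_expires suggestion above works by attaching two response headers. A rule such as `ExpiresByType image/jpeg "access plus 1 week"` makes Apache send an `Expires` date one week after the request and a matching `Cache-Control: max-age`. The minimal Python sketch below (hypothetical helper names, not part of Apache) shows what those values look like and how a well-behaved client would decide whether it still needs to re-fetch. Whether facebookexternalhit actually honors them is not established in this thread.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

ONE_WEEK = timedelta(weeks=1)


def expiry_headers(now, lifetime=ONE_WEEK):
    """Expires / Cache-Control values for a response sent at `now` (an aware datetime)."""
    expires = now + lifetime
    return {
        "Cache-Control": f"max-age={int(lifetime.total_seconds())}",
        # HTTP dates are always written in GMT
        "Expires": format_datetime(expires.astimezone(timezone.utc), usegmt=True),
    }


def is_fresh(fetched_at, now, lifetime=ONE_WEEK):
    """True while a caching client may reuse its copy without a new request."""
    return now - fetched_at < lifetime
```

For a request logged at `09/Mar/2013:17:48:19 -0500`, `expiry_headers` yields `max-age=604800` and an `Expires` of `Sat, 16 Mar 2013 22:48:19 GMT`.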

1 Answer


I had the same problem on my server. I solved it by removing the og:url meta tag and changing the og:image meta tag, as I described here: https://stackoverflow.com/a/24107181/3248313
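To check which `og:url` and `og:image` values a page actually exposes to the crawler before and after such a change, a minimal Python sketch using only the standard library's `html.parser` might look like this. `OpenGraphTags` and `og_tags` are hypothetical helper names, and fetching the page HTML is left to the reader.

```python
from html.parser import HTMLParser


class OpenGraphTags(HTMLParser):
    """Collect (property, content) pairs from <meta property="og:..."> tags."""

    def __init__(self):
        super().__init__()
        self.tags = []  # in document order

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are routed here too by the default handler.
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if prop.startswith("og:"):
            self.tags.append((prop, attrs.get("content")))


def og_tags(html):
    """Return the Open Graph meta tags found in an HTML string."""
    parser = OpenGraphTags()
    parser.feed(html)
    parser.close()
    return parser.tags
```

Running `og_tags` on the article HTML makes it easy to confirm that an `og:url` tag is gone and that `og:image` points where you expect.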

Nico