
My server is overloaded. When I checked, the cause was Googlebot constantly sending GET requests for a few completely malformed AMP URLs, e.g.

GET /what-is-love/amp/wifflegif.com/pandawhale.com/s4.photobucket.com/thecomfortador.tumblr.com/www.tumblr.com/www.gifwave.com/thecomfortador.tumblr.com/teenskepchick.org/www.yourtango.com/reactiongif.org/thecomfortador.tumblr.com/bollypop.in/www.yourtango.com

I think my WordPress AMP plugin may have generated these malformed URLs at some point. The actual URL is supposed to be /what-is-love/amp/, but additional domain names are appended after it. These are just domain names found in the article content.

Now these GET requests are hammering my MySQL server and it's overwhelmed, because it takes the server a long time to respond that the page doesn't exist.

I added this line to my robots.txt and restarted the server:

Disallow: /what-is-love/amp/*

The updated robots.txt is being served, but Googlebot won't stop sending GET requests.

I have been banging my head against this for hours now. What am I doing wrong? What should I do? Please help.

LittleLebowski
  • I would recommend blocking these URLs at the web server as a stopgap measure. That could be configured in .htaccess for Apache. [I would start here](https://stackoverflow.com/a/38068519/5194374) – 9072997 Jul 19 '20 at 18:23
  • I'm using nginx. How do I write a rule that blocks the URL pattern `/*/amp/*` but allows `/*/amp` or `/*/amp/`? That is, if there's anything after `amp/`, block that `GET` request. Can you guide me, please? – LittleLebowski Jul 19 '20 at 18:29
  • Try: `location ~ ./amp/. { return 404; }` - place it above other `location` blocks. – Richard Smith Jul 20 '20 at 06:11
  • Have you been successful in stopping the googlebot hammering? If so, what did you do? – Wilson Hauck Aug 09 '20 at 10:45
  • LittleLebowski was successful. See Stack Overflow question 62984495 for a slick way to handle this in the nginx configuration. – Wilson Hauck Aug 09 '20 at 10:54
  • @WilsonHauck Yes, that's correct. Here's the solution: https://stackoverflow.com/questions/62984495/regex-to-block-url-in-nginx/62984840?noredirect=1#comment111976371_62984840 – LittleLebowski Aug 10 '20 at 12:33
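
For anyone who wants to sanity-check the pattern from Richard Smith's comment before reloading nginx, here is a minimal sketch in Python. It only tests the regular expression `./amp/.` against a few paths from the question. The names `BLOCK_PATTERN` and `is_blocked` are made up for illustration, and it assumes that Python's `re` module and nginx's PCRE engine treat a pattern this simple the same way; it does not reproduce nginx's URI normalization or `location` ordering.

```python
import re

# Same pattern as in the suggested rule: any character, "/amp/", any character.
# Assumption: nginx (PCRE) and Python's re agree on a pattern this simple.
BLOCK_PATTERN = re.compile(r"./amp/.")


def is_blocked(path: str) -> bool:
    """Return True if the path has something after "amp/" and would get the 404."""
    return BLOCK_PATTERN.search(path) is not None


if __name__ == "__main__":
    samples = [
        "/what-is-love/amp",
        "/what-is-love/amp/",
        "/what-is-love/amp/wifflegif.com/pandawhale.com",
    ]
    for path in samples:
        print(path, "->", "blocked" if is_blocked(path) else "allowed")
```

The two legitimate forms (`/what-is-love/amp` and `/what-is-love/amp/`) come out as allowed, and only paths with extra segments after `amp/` are blocked, which is the behaviour asked for in the comments.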

0 Answers