I've created a Google spreadsheet that periodically retrieves data from a certain webpage, and it worked perfectly for about a month. However, since the day before yesterday (19/08) it has suddenly been giving the "Could not fetch URL" error for both importxml() and importhtml(), even though the website itself still loads without issues in a browser. In the meantime, nothing has been changed in the spreadsheet, apart from it having been distributed to other people.

The spreadsheet (naturally, you're free to make a copy of it; the cells concerned are H1 and A2)

Solutions I've tried:
- Google Apps Script's UrlFetchApp; it seems to be able to fetch the webpage without issues (but without the built-in formatting convenience of importhtml)
- Wrapped the URL in trim() inside importhtml to remove any stray spaces
- Tried several other pages of sfstat.info (such as sfstat.info/na/pantheons/); all pages of sfstat.info seem to give the same error.
- Tried fetching other URLs, such as Google. These are fetched without issues.
- Excel's equivalent of importhtml. This also seems to work without problems.
- While the spreadsheet technically appends &minute(now()) to the URL, removing this does not resolve the "Could not fetch URL" issue either.
- Downloaded the webpage, hosted it on Google Drive and fetched its data using importhtml and importxml; this also worked without issues. It might thus be that the fetch is seen as a DoS attempt due to the multitude of requests.
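
For reference, the UrlFetchApp attempt from the list above can be sketched roughly as follows. This is a minimal illustration, not the script from the spreadsheet: the function names `extractTableRows` and `fetchTable` are made up for this example, and the regex-based table parsing is a deliberately simple stand-in for what importhtml does internally. One advantage over importhtml is that the HTTP status code becomes visible instead of the generic "Could not fetch URL" message.

```javascript
// Hypothetical helper: pull the cell text out of every <tr> in an HTML string.
// Regex-based and deliberately simple; nested tables are not handled.
function extractTableRows(html) {
  const rows = [];
  const rowPattern = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
  let rowMatch;
  while ((rowMatch = rowPattern.exec(html)) !== null) {
    const cells = [];
    const cellPattern = /<t[dh]\b[^>]*>([\s\S]*?)<\/t[dh]>/gi;
    let cellMatch;
    while ((cellMatch = cellPattern.exec(rowMatch[1])) !== null) {
      // Strip any inner tags and surrounding whitespace.
      cells.push(cellMatch[1].replace(/<[^>]+>/g, '').trim());
    }
    if (cells.length > 0) rows.push(cells);
  }
  return rows;
}

// Hypothetical wrapper; runs only inside Google Apps Script,
// because UrlFetchApp does not exist outside that runtime.
function fetchTable(url) {
  // muteHttpExceptions lets us inspect non-200 responses instead of throwing.
  const response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
  const status = response.getResponseCode();
  if (status !== 200) {
    throw new Error('Fetch failed with HTTP ' + status);
  }
  return extractTableRows(response.getContentText());
}
```

Assuming the script is bound to the spreadsheet, `fetchTable` could be called from a cell as a custom function, and the returned array of rows would fill the neighbouring cells.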

Thank you in advance.

arphelior

2 Answers

It was indeed not the spreadsheet that caused this issue; apparently Google sent an insane number of requests to the domain sfstat.info (over 10k in 6 hours), which resulted in the IP being blocked.
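
To put that figure in perspective, here is a rough back-of-the-envelope calculation. The 10k requests and the 6 hours are the numbers quoted above; the assumption that each copy of the spreadsheet re-fetches both import cells (H1 and A2) once per minute is a guess based on the minute(now()) suffix, not something the website developer confirmed.

```javascript
// Figures quoted in the answer above ("over 10k", so treat as a lower bound).
const totalRequests = 10000;
const windowHours = 6;

const perSecond = totalRequests / (windowHours * 3600);
const perMinute = totalRequests / (windowHours * 60);

// ASSUMPTION (not from the thread): each copy re-fetches both import
// cells (H1 and A2) once per minute.
const fetchesPerCopyPerMinute = 2;
const impliedCopies = perMinute / fetchesPerCopyPerMinute;

console.log(perSecond.toFixed(2));     // 0.46 requests per second
console.log(perMinute.toFixed(1));     // 27.8 requests per minute
console.log(impliedCopies.toFixed(1)); // 13.9 copies under the assumption above
```

Under that assumption, roughly a dozen or so actively refreshing copies would already be enough to reach the reported volume.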

arphelior

"requests to sfstat.info (over 10k in 6 hours), hence resulting in Cloudflare blocking the IP."

What error message was it getting? We don't block Google's IPs by default (they are in our macro list).

damoncloudflare
  • According to the website developer, all Google IPs received a 403 error in response. Furthermore, when I was the sole person using the spreadsheet, there were no issues and all data was retrieved as it should have been. The 403 error appeared once a multitude of copies of the spreadsheet were also fetching data. It might be that I misinterpreted his response, though. He only mentioned "Cloudflare responded to all Google IPs with a 403 error, as the website got 10k requests in 6 hours". – arphelior Aug 25 '15 at 06:46
  • You might want to open a support ticket with a little detail. The only thing that comes to mind is that you might be blocking a specific country (or countries) in the Firewall settings, which would produce a 403/challenge page. We don't do anything that blocks Google and other search engines by default. – damoncloudflare Aug 26 '15 at 18:32
  • Unfortunately, it's not my own website; I'm trying to retrieve data from someone else's. Regarding the 403 error, it could be that the website owner has manually blocked the IP range (I've updated the answer accordingly). However, the matter was luckily already solved with the website developer and the original intention I had for the spreadsheet linked above may be included inside the website itself (making the spreadsheet unnecessary). Thank you for the information though Damon, I highly appreciate it :) – arphelior Aug 26 '15 at 22:05