
I want to get some listing data from YellowPages.com using Google Apps Script. When I call UrlFetchApp.fetch(url) in GAS (as shown below), the server returns a 500 error. However, IMPORTXML() inside a Google Sheet works fine on the same URL.

What explains this difference in behavior? And what can I do differently in Google Apps Script to get the same result I'm getting from IMPORTXML()?

Google Sheets

=IMPORTXML("https://www.yellowpages.com/al/accounting-services", "//div[@class='v-card']//a/@href")

The behavior is as expected. The result is an array of links.
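If UrlFetchApp does manage to return the page, the same links could be pulled out of the HTML text in Apps Script. Below is a rough sketch of an approximation of the XPath //div[@class='v-card']//a/@href. The helper name extractVCardLinks is hypothetical, and it uses regular expressions instead of a real HTML parser. It assumes double-quoted attributes and no self-closing div tags, so treat it as a starting point rather than a faithful XPath engine.

```javascript
/**
 * Hypothetical helper: approximates //div[@class='v-card']//a/@href
 * on an HTML string by tracking div nesting depth with a regex scan.
 * Assumes double-quoted attributes and no self-closing <div/> tags.
 */
function extractVCardLinks(html) {
  const links = [];
  // Tokens of interest: <div ...> opens, </div> closes, <a ...> opens.
  const token = /<div\b([^>]*)>|<\/div\s*>|<a\b([^>]*)>/gi;
  let depth = 0; // > 0 while inside a v-card div (counts nested divs too)
  let m;
  while ((m = token.exec(html)) !== null) {
    const [tag, divAttrs, anchorAttrs] = m;
    if (tag.startsWith('</')) {
      // Closing div: leave one nesting level if we are inside a card.
      if (depth > 0) depth--;
    } else if (divAttrs !== undefined) {
      // Opening div: go one level deeper inside a card, or start a new card.
      if (depth > 0) depth++;
      else if (/\bclass="v-card"/.test(divAttrs)) depth = 1;
    } else if (depth > 0) {
      // Anchor inside a card: collect its href value, if any.
      const href = /\bhref="([^"]*)"/.exec(anchorAttrs);
      if (href) links.push(href[1]);
    }
  }
  return links;
}
```

For example, extractVCardLinks(response.getContentText()) would return an array of href strings, similar to what IMPORTXML() spills into the sheet.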

Google Apps Script

Code.gs
const ENDPOINT = 'https://www.yellowpages.com/al/accounting-services';
const main = () => {
  const response = UrlFetchApp.fetch(ENDPOINT);
  const responseContentText = response.getContentText();
  Logger.log('(line 5) responseContentText\n%s', responseContentText);
  return responseContentText;
};

The behavior is not as expected. The result is a 500 error.

Error message:
Exception: Request failed for https://www.yellowpages.com returned code 500. Truncated server response: <html>
 <head><title>500 Internal Server Error</title></head>
 <body>
 <center><h1>500 Internal Server Error</h1></center>
 <hr><center>openresty</... (use muteHttpExceptions option to examine full response) (line 16, file "Code")
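As the exception text itself suggests, passing muteHttpExceptions: true makes UrlFetchApp.fetch() return the response instead of throwing on a 5xx status, so the whole error page and its headers can be examined. The following is a minimal diagnostic sketch. The function name inspectResponse is hypothetical, and it takes the URL as a parameter so it does not redeclare ENDPOINT in the same Apps Script project.

```javascript
/**
 * Hypothetical diagnostic helper: fetches `url` without letting a
 * non-2xx status throw, then logs the status code, the response
 * headers, and the full (untruncated) body. Returns the status code.
 */
function inspectResponse(url) {
  const response = UrlFetchApp.fetch(url, {
    muteHttpExceptions: true, // return the HTTPResponse even for 4xx/5xx
  });
  const status = response.getResponseCode();
  Logger.log('HTTP status: %s', status);
  Logger.log('Response headers: %s', JSON.stringify(response.getAllHeaders()));
  Logger.log('Full body:\n%s', response.getContentText());
  return status;
}
```

Calling inspectResponse(ENDPOINT) from the existing Code.gs would log the complete openresty error page rather than the truncated snippet above. This only helps diagnose the failure; it does not by itself make the server accept the request.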

Why is the UrlFetchApp.fetch(url) method throwing an error and what can be done to make it behave like IMPORTXML()?

Let Me Tink About It
  • I'd think this website could be blocking requests made from `UrlFetchApp` somehow (I tried making the request using several other tools and it was working correctly everywhere else). Depending on your exact situation, you might want to publish your script as a web app and use [fetch](https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch). – Iamblichus Jun 04 '20 at 12:02
  • @Iamblichus: But what could the server possibly be seeing that lets it identify `UrlFetchApp` as the source of the request? They are all unauthenticated `GET` requests, right? The IP address of a Google server is the only thing I can think of. But then, assuming Google cycles the use of its servers, only a subset of all the requests should get blocked. Correct? Thoughts? – Let Me Tink About It Jun 04 '20 at 12:56
  • If you are still looking for a solution to this question, would the first script in https://stackoverflow.com/a/63024816/7108653 be useful? Also, regarding the cause of your issue, how about the experiment in its "Added:" section? – Tanaike Sep 12 '20 at 02:28
