1

Trying to scrape some classes from a website, to compile a spreadsheet report. I've only used Importxml briefly for common sites, which i have made work. However, how can we address cases where we get the error - "Could not fetch url"?

I do know that the site uses Javascript, which i've heard can have problems loading into Sheets. I have also seen that there might be some extra script additions that can be made to get JS sites to load though, but i don't know enough about JS to properly understand it...

I have tried experimenting with Apps Scripts to try and get my head around this (not very familiar with JS), based off this other article found on here: Google Sheet use Importxml error could not fetch url

But i could not replicate the solution that was mentioned. Is it possible, since this article was written, the website has undergone extra security measures so it cannot be scraped? Are there work-arounds for this within Google Sheets these days?

Short story, is if i used this custom script:

function sampleFormula() {
  const url = "https://tarkov-market.com/item/Pack_of_sugar";
  const html = UrlFetchApp.fetch(url).getContentText();
  return html.match(/price:(.+?)<\/title>/)[1].trim();
}

then simply place the sampleFormula() in a cell, it should display "55,500₽" (or something), but i just get an error instead.

Error Exception: Request failed for https://tarkov-market.com returned code 403. Truncated server response: 403 Forbidden

403 Forbidden


nginx (use muteHttpExceptions option to examine full response) (line 3).

Have i missed something that makes my version of the same problem break?

Rubén
  • 34,714
  • 9
  • 70
  • 166
Benny_C
  • 11
  • 2

1 Answers1

1

403 Forbidden means that the server understood your request, but won't allow it. It sounds like https://tarkov-market.com may be protected against web scrapping. I tried to import one of the lists from the site with the =IMPORTHTML function and got this error message:

Error
Could not fetch URL: https://tarkov-market.com/
Rubén
  • 34,714
  • 9
  • 70
  • 166
  • 1
    They might be protectect against Google Apps Script i.e. using the Google known IP addresses or the user agent. Related https://stackoverflow.com/q/73883849/1595451 – Rubén Oct 31 '22 at 00:29