
[Screenshot: the target data on the website, outlined in a red box]

I want to scrape the data shown in the screenshot above (the data in the red box) from the website into Google Sheets. I tried IMPORTHTML and IMPORTXML, but neither works (the output is empty).

This is my Google Sheet:

https://docs.google.com/spreadsheets/d/1ELo3iA4RmhUuFq7YEfsCVt2iuURFxc1Crdng7rLovTo/edit#gid=0

I'm not sure whether it is possible to scrape the data from this website (https://stockrow.com/AAPL) with IMPORTHTML or IMPORTXML. If not, is it possible to do it with Google Apps Script?

Heretic Monkey
weizer
  • Something to consider: Perhaps the site doesn't want you to scrape their data and is actively preventing you from doing so. Many sites that provide financial data have terms of service that specifically forbid this type of scraping, since it costs them money to get the data... – Heretic Monkey Aug 20 '21 at 17:21
  • If the content is added dynamically (by using JavaScript), it can't be imported using Google Sheets' built-in functions. See here: https://webapps.stackexchange.com/questions/115664/how-to-know-if-google-sheets-importdata-importfeed-importhtml-or-importxml-fun , and I didn't find any other way for this site. – Mike Steelson Aug 20 '21 at 17:45

1 Answer


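With this kind of site, Sheets and Apps Script cannot scrape the data because the content is generated dynamically with JavaScript, as the comments already mentioned. Neither of them runs the page's scripts, so they only see the initial HTML, which does not contain the data.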

People who scrape this kind of site mostly use Selenium in Python. Selenium automates a real browser, so the page's JavaScript runs before the content is read.

I know this might not be useful to you, since Google App Engine isn't one of the question's tags, but it might help anyone else who runs into this issue and is already familiar with Selenium in Python.

Running Selenium in Google App Engine can be a solution, but if you don't want to invest time in learning Python together with Google App Engine, I recommend you steer clear of it. References that shed light on the issue are listed at the bottom.
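For readers who do want to try the browser-automation route, here is a minimal sketch of what it could look like with Selenium 4 and headless Chrome. The function name `fetch_rendered_html`, the `wait_css` parameter, and the selector in the usage comment are hypothetical and not taken from the site; a local Chrome installation is assumed, and nothing here is specific to App Engine. Check the site's terms of service before scraping, as the first comment points out.

```python
def fetch_rendered_html(url, wait_css, timeout=15):
    """Load a page in headless Chrome and return its HTML after JavaScript has run.

    wait_css is a CSS selector for an element that only exists once the
    dynamic content has loaded (hypothetical; inspect the page to choose one).
    """
    # Imported here so the module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # assumes a recent Chrome build
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the dynamically generated element is present in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out


# Example call (the selector "table" is an assumption, not the site's real markup):
# html = fetch_rendered_html("https://stockrow.com/AAPL", "table")
```

The returned HTML can then be parsed with any HTML parser. The explicit wait matters: reading `page_source` immediately after `get` may return the page before the data has been filled in.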

Alternative:

  • The best way to overcome the issue without investing too much time is to find an alternative site that provides the same data and whose content isn't generated by JavaScript.
  • One way to check whether a site's content is JavaScript-generated is to view the page source. If the text you want to scrape appears in the source code, then that text isn't JavaScript-generated.
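The page-source check described above can also be done programmatically. This is a rough sketch using only the Python standard library; the helper names `fetch_static_html` and `is_in_static_html` and the search text "Revenue" are made up for illustration. It downloads the raw HTML without running any JavaScript, which is roughly what the Sheets import functions see.

```python
from urllib.request import Request, urlopen


def fetch_static_html(url, timeout=10):
    """Download the raw HTML the server sends, without running any JavaScript."""
    # Some sites reject requests with no User-Agent, so send a generic one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def is_in_static_html(html, needle):
    """Return True if the text appears in the raw HTML (case-insensitive)."""
    return needle.casefold() in html.casefold()


# Example (the search text "Revenue" is a hypothetical value you expect to see):
# html = fetch_static_html("https://stockrow.com/AAPL")
# print(is_in_static_html(html, "Revenue"))
```

If the function returns False for text you can see in the browser, the content is most likely injected by JavaScript, and IMPORTHTML or IMPORTXML will not find it.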

Reference:

Heretic Monkey
NightEye