Struggling to understand Google Sheets IMPORTXML and Xpath

Question

I'm really looking for an education here on how to understand this. I've posted before and someone helped my specific scenario, but I'm hoping someone can help me with the "why" along with an answer.

I'm trying to import stock information off of www.barchart.com. This time, I'm trying to pull analyst ratings from this page: https://www.barchart.com/stocks/quotes/W/analyst-ratings

Inserting screenshot of what I'm trying to gather...www.barchart.com analyst ratings Within this screenshot, I'd like to build two queries: (1) that grabs the rating (hold, buy, etc.) and (2) that grabs the number of analysts that contribute to that average rating.

I've tried this:

=IMPORTXML("https://www.barchart.com/stocks/quotes/W/analyst-ratings","//div[@class='block__colored-header rating']")

That gives me a "#N/A Imported Content is Empty" error in Google Sheets.

I've tried this:

=IMPORTXML("https://www.barchart.com/stocks/quotes/W/analyst-ratings","/html/body/div[2]/div/div[2]/div[2]/div/div[2]/div/div/div/div[3]/div[2]/div[1]/div[1]/div/div/div[4]/div[1]/div[2]")

That also gives me a "#N/A Imported Content is Empty" error in Google Sheets.

Other wonderful users here on stackoverflow swoop in and magically give me the answer (other examples were other websites), but then I can't seem to reverse engineer the knowledge to apply it to other scenarios that I want to accomplish.

So, I'd love to get help with this problem. But I would really love it if someone could help me UNDERSTAND why it is the answer and maybe point me to some resources I can use to educate myself and become self-sufficient.

Thanks in advance!!

You are probably being blocked by the site as a bot. To handle it, you'll need something more robust like the requests library in Python or maybe even Selenium. — Jack Fleeting, Jul 05 '20 at 20:42

score 1 · Answer 1 · edited Mar 10 '21 at 18:17

The page that you are trying to scrape has blocked that. By using //*/text() it returns

--+---------------------------------------------------------------------------+
  |                                   A                                       |
--+---------------------------------------------------------------------------+
1 | https://www.barchart.com/ondemand                                         |
--+---------------------------------------------------------------------------+
2 | Interested in API access? Barchart offers data through                    | 
--+---------------------------------------------------------------------------+
3 | Barchart OnDemand                                                         |
--+---------------------------------------------------------------------------+
4 | ". Barchart OnDemand provides both premium market data delivery through a |
  | collection of web services APIs (JSON, XML and CSV formats) and a limited |
  | free service."                                                            |
--+---------------------------------------------------------------------------+

IMPORTXML

can only get data from the source code
can only grab nodes that are from HTML files that are well formed according to the XML rules or at least the whole path is well formed according those rules.
It uses Google servers and an user-agent that could be easily identified by web pages to block it.

Tip

Don't use Chrome developers tools to grab the xPath from the Elements tab because it returns the xPath of the DOM after the web page was parsed (the original DOM might was modified dynamically by JavaScript, by the Chrome web page parsing engine and/or installed and enabled Chrome extensions.

score 0 · Answer 2 · answered Jul 07 '20 at 01:24

0

Once you're done understanding what's going wrong, you can fetch the data with IMPORTFROMWEB addon with JS activated.

answered Jul 07 '20 at 01:24

E.Wiest

5,425
2
7
12

It's bad enough that Google knows as much about me. I'm not going to hand over more data to a relatively unknown company. From https://nodatanobusiness.com/importfromweb/: "We use proxies and emulate browsers to maximize compatibility with complex websites." – sbhouse Jul 11 '20 at 17:31
Regarding this aspect of personnal privacy, then it's probably better to use Python, R, Ruby,.. to accomplish this job. If you really want to keep a Google solution, do it through a dummy account. – E.Wiest Jul 12 '20 at 14:41

Struggling to understand Google Sheets IMPORTXML and Xpath

2 Answers2