1

I am very new to web-scraping and was introduced to it just today after trying to figure out a formula on a spreadsheet.

I would like to retrieve the Sector information onto Yahoo Finance into Google Sheets. I would also like to the data to update when there is a change to cell B7. Link: https://finance.yahoo.com/quote/MIDD/profile?p=MIDD

I came up with the following, but get a #N/A error: =importxml("https://finance.yahoo.com/quote/",B7,"/profile?p=",B7, "//*[@class='Fw(600) [@data-reactid='21']")

Please let me know what I might be doing wrong. Thank you in advance.

Monica Wall
  • 11
  • 1
  • 2
  • It says: Error Wrong number of arguments to IMPORTXML. Expected 2 arguments, but got 5 arguments. – Monica Wall Sep 22 '20 at 18:03
  • I saw another thread about using div class or '@id' instead of '@class', however I'm not familiar with how to apply it to this objective. – Monica Wall Sep 22 '20 at 18:05
  • Thanks. The function info in Google Sheet used a ",". I updated to & and received the following: Error Wrong number of arguments to IMPORTXML. Expected 2 arguments, but got 1 arguments. – Monica Wall Sep 22 '20 at 18:08

1 Answers1

1

Solution

This is the right syntax to use IMPORTXML formula:

=IMPORTXML("URL", "XPATH_QUERY")

In your case this will translate to:

=importxml("https://finance.yahoo.com/quote/"&B7&"/profile?p="&B7,"//*[@class='Fw(600)'] [@data-reactid='21']")

Which will return an empty result.

Considerations

Keep in mind that many sites go to great lengths to actively prevent scraping. Allowing you to scrape their data entirely, undermines their business model. Since they might make profit from adds for example.

Check in the page you want to scrape if the tags you are watching for correspond to the data you wanted to get in the first place. I believe in this case it's just a matter of changing the tags values to the proper ones.

Alessandro
  • 2,848
  • 1
  • 8
  • 16