
I'm having trouble pulling just the price from these sites into a Google Sheet. Instead, I'm pulling multiple rows/currencies, etc., and I don't know how to fix it.

1----> https://www.discountfilters.com/refrigerator-water-filters/models/ukf8001/ //main/div/div/div/div/div/div/div/div/div[1]/span/span/span

2----> https://www.discountfilters.com/refrigerator-water-filters/models/ukf8001/ //div[1]/form/div/div/div[1]/div/div/div[2]/div[1]

3----> https://filterbuy.com/air-filters/8x16x1/ //div[2]/div[1]/div[3]/span

I tried the XPaths above, but they give me all the data instead of just the discounted price (row 1) that I'm looking for.
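The calls are plain IMPORTXML formulas of this form (pair 1 shown as an example; a sketch of my setup, not the exact cell contents):

=IMPORTXML("https://www.discountfilters.com/refrigerator-water-filters/models/ukf8001/", "//main/div/div/div/div/div/div/div/div/div[1]/span/span/span")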


2 Answers


try:

=INDEX(IMPORTXML(A1, "//div[@class='price mt-2 mt-md-0 mb-0 mb-md-3']"),,2)


player0
  • Thank you so much for jumping in player! Amazing!! Looks good except for 1 small issue....how do I only get the 1st price (1 row) instead of 2? – darwin liu Nov 16 '22 at 21:59
  • @darwinliu use: `=INDEX(IMPORTXML(A1, "//div[@class='price mt-2 mt-md-0 mb-0 mb-md-3']"),1,2)` – player0 Nov 16 '22 at 22:15
  • Thank you so much, sir!! Last two questions for you... the other websites https://filterbuy.com/brand/gibson-air-filters/ and https://www.discountfilterstore.com/edr4rxd1-ukf8001-maytag-refrigerator-water-filter-replacement-tier1.html?pack=6 don't seem to be working and show up as N/A. Would you be able to see if it's because they blocked all types of scraping? Thank you so much in advance!! – darwin liu Nov 17 '22 at 21:00
  • @darwinliu 2nd url from your question uses JavaScript and google sheets does not support JS imports. see: https://stackoverflow.com/a/74015233/5632629 – player0 Nov 17 '22 at 21:11

Regarding the issues with the multiple websites you are trying to scrape: ImportXML is fine for basic tasks, but it won't get you far if you are serious about scraping:

  • If the target website's data needs post-processing cleanup, things get complicated fast, because you are now "programming with Excel formulas", a rather painful process compared to writing regular code in a conventional programming language

  • There is no proper control over when the function runs or how results are cached, so it can re-trigger unexpectedly, and if the HTTP request fails the cells are populated with ERR! values (a workaround is sketched after this list)

  • The approach only works with the most basic websites: SPAs rendered in the browser cannot be scraped this way, any basic anti-scraping protection or connectivity issue breaks the process, and there is no control over the HTTP request's geolocation or the number of retries
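For the ERR! point above, a common workaround is to wrap the import in IFERROR so a failed request degrades to a placeholder instead of an error cell. A minimal sketch, reusing the XPath from the other answer (the fallback text is arbitrary):

=IFERROR(INDEX(IMPORTXML(A1, "//div[@class='price mt-2 mt-md-0 mb-0 mb-md-3']"),1,2), "request failed")

This only masks the failure; it does not fix the underlying request, so the other limitations still apply.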

When ImportXML() fails, the second approach to web scraping in Google Sheets is usually to write some custom Google Apps Script. This approach is much more flexible (you just write JavaScript and can deploy it as a Google Sheets add-on), but it takes a lot of time and is not easy to debug and iterate on; it is definitely not low-code.
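For a rough sense of what that looks like, a custom function in Apps Script can be as small as the sketch below. IMPORTPRICE is a hypothetical name and the price regex is a naive assumption about the markup; it will not work on JS-rendered pages, since UrlFetchApp only sees the raw HTML:

/**
 * Returns the first price-looking value found in a page's raw HTML.
 * IMPORTPRICE is a hypothetical example, not an existing Sheets function,
 * and the regex below is a naive assumption about how prices appear.
 *
 * @param {string} url The page to scrape.
 * @return {string} The first matched price, or a fallback message.
 * @customfunction
 */
function IMPORTPRICE(url) {
  // Fetch the raw HTML; muteHttpExceptions avoids throwing on 4xx/5xx responses.
  var html = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
  // Naive pattern: a dollar sign followed by digits, optional thousands separators and cents.
  var match = html.match(/\$\s?\d[\d,]*(?:\.\d{2})?/);
  return match ? match[0] : 'no price found';
}

Used from a cell as =IMPORTPRICE(A1), it behaves like a slower but more controllable IMPORTXML, and you get full JavaScript for retries, caching, and cleanup.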

And the third approach is to use proper tools (an automation framework plus a scraping engine) and use Google Sheets only for storage: https://youtu.be/uBC752CWTew

Anthony S