
I'm trying to scrape Amazon search results to get products and their prices, and I'm using the rvest library in R to do that.

For example, for this search:

Amazon Search

I want to extract all product names and their prices. I tried the following:

library(rvest)
link='https://www.amazon.com.mx/s?k=gtx+1650+super&__mk_es_MX=%C3%85M%C3%85%C5%BD%C3%95%C3%91&ref=nb_sb_noss_2'
simple=read_html(link)
simple %>% html_nodes("[class='a-size-base-plus a-color-base a-text-normal']") %>% html_text()

Using Chrome, I found that class 'a-size-base-plus a-color-base a-text-normal' is where the product name is stored.

That code works fine and I get all the product names. So I tried to get their prices with this:

simple %>% html_nodes("[class='a-offscreen']") %>% html_text()

Using Chrome, I found that class 'a-offscreen' is where the price is stored.

That code returns every price in the search, but if you look at the search, not all products have a price. So it returns prices only for the products that have one, and I can't match products with their prices.

Is there a way to make this possible? Maybe it's possible to filter only those products that have class 'a-offscreen' and get their prices?

Thanks.

nyedidikeke
  • I don't know how helpful this will be, but I faced a similar issue when developing a script to scrape lyrics. At [line 104](https://github.com/thedivtagguy/songscraper/blob/master/R/songscrape.R#L104) I have a separate variable to store each part of the information and then combine them using `cbind` after the whole scrape, to maintain order. To deal with missing prices, use `tryCatch`. This will try to look for the specific value, but if it is not found, fill that field with something like `NA`. [More on tryCatch](https://stackoverflow.com/questions/12193779/how-to-write-trycatch-in-r) – Aman Dec 31 '20 at 06:28
  • @Aman I tried what you said and was able to write decent code using XPath. I didn't know how XPath works, but I did some research, and with your tip and an XPath utility I could fix my problem. Thanks a lot man :D – Jesús Valencia Dec 31 '20 at 19:05

1 Answer


You need to scrape the item nodes first and then, within each node, scrape the product name and the price. This is similar to this question: RVEST package seems to collect data in random order
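A minimal sketch of that idea, not tested against the live Amazon page. To keep it runnable offline it parses a small stand-in HTML string instead of `read_html(link)`; the product names and prices in it are made up, and the `s-result-item` container class is an assumption, so check the real page in Chrome for the element that wraps one product. The two inner class names are the ones from the question.

```r
library(rvest)

# Stand-in HTML (made-up data) so the sketch runs without a network call.
# On the real page this would be: page <- read_html(link)
page <- read_html('<html><body>
<div class="s-result-item">
  <span class="a-size-base-plus a-color-base a-text-normal">Card A</span>
  <span class="a-offscreen">$3,999.00</span>
</div>
<div class="s-result-item">
  <span class="a-size-base-plus a-color-base a-text-normal">Card B</span>
</div>
<div class="s-result-item">
  <span class="a-size-base-plus a-color-base a-text-normal">Card C</span>
  <span class="a-offscreen">$4,500.00</span>
</div>
</body></html>')

# Step 1: select one node per product.
# "div.s-result-item" is an assumed selector for the product container.
items <- html_nodes(page, "div.s-result-item")

# Step 2: query inside each item. html_node() (singular) returns exactly
# one result per item, and html_text() yields NA where nothing matched,
# so names and prices stay aligned row by row.
products <- data.frame(
  name = items %>%
    html_node(".a-size-base-plus.a-color-base.a-text-normal") %>%
    html_text(trim = TRUE),
  price = items %>%
    html_node(".a-offscreen") %>%
    html_text(trim = TRUE),
  stringsAsFactors = FALSE
)

print(products)
```

The key difference from the code in the question is `html_node()` applied to the item nodes instead of `html_nodes()` applied to the whole page: products without a price get `NA` instead of being skipped, so you can filter afterwards with something like `products[!is.na(products$price), ]`.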

xwhitelight