I'm doing some web scraping.
I need to get each product's current price and put its old price in another column.
The problem is that not all products have an old price element, because some of them are new.
Because the two resulting vectors don't have the same length, I cannot combine them into a data.frame.
When a product has no old price, I would like to have NA in that cell.
Is there a way to do this with rvest?
Expected result:

| Product | PriceNew | PriceOld |
|---------|---------:|---------:|
| A       |  2300.00 |       NA |
| B       |     9.90 |    49.00 |
| C       |  1299.00 |  2499.00 |
| D       |   829.00 |  1499.00 |
On the page, some products show both a current and an old price, while others show only the current price.
I've been doing this:
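
    library(rvest)

    # Current ("new") prices: every product has one
    # (read_html() replaces the deprecated html())
    Celulares_Telefonia_Precio_actual <- read_html(page_source[[1]]) %>%
      html_nodes(".product-itm-price-new") %>%
      html_text()

    # Old prices: only products with a previous price have this element
    Celulares_Telefonia_Precio_antiguo <- read_html(page_source[[1]]) %>%
      html_nodes(".product-itm-price-old") %>%
      html_text()
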
All products have a price, but not all have an old price. So for products with only a new price, I would like to have NA in the PriceOld column.

    length(Celulares_Telefonia_Precio_actual)   # 120
    length(Celulares_Telefonia_Precio_antiguo)  # 114

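Because the two selectors are applied to the whole page independently, the 114 old prices cannot be matched back to the 120 products by position. One way around this, sketched here under assumptions, is to select each product first and look up its old price inside that product, returning NA when nothing matches. The HTML is a hypothetical stand-in for the real page: the per-product wrapper class `.product-itm` and the helper `text_or_na()` are made up for illustration; only the two price classes come from the question.

```r
library(rvest)

# Hypothetical markup: ".product-itm" as the per-product wrapper is an assumption;
# only the two price classes come from the question.
page <- read_html('
<div id="catalog-items">
  <div class="product-itm"><span class="product-itm-price-new">2300.00</span></div>
  <div class="product-itm"><span class="product-itm-price-new">9.90</span>
    <span class="product-itm-price-old">49.00</span></div>
  <div class="product-itm"><span class="product-itm-price-new">1299.00</span>
    <span class="product-itm-price-old">2499.00</span></div>
</div>')

# Hypothetical helper: text of the first match inside ONE product, or NA if none
text_or_na <- function(product, css) {
  found <- html_nodes(product, css)
  if (length(found) == 0) NA_character_ else html_text(found[[1]])
}

# One node per product, so both price vectors below have the same length
products <- html_nodes(page, "#catalog-items .product-itm")

price_new <- vapply(products, text_or_na, character(1), css = ".product-itm-price-new")
price_old <- vapply(products, text_or_na, character(1), css = ".product-itm-price-old")

precios <- data.frame(
  PriceNew = as.numeric(price_new),
  PriceOld = as.numeric(price_old)  # NA where the product had no old price
)
```

Since every lookup happens inside a single product, a missing old price becomes an NA in that product's row instead of shifting the later prices up.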
EDIT 1:
Code to reproduce the situation (for the Celulares section). Please run this Gist to get my data:

    library(devtools)
    source_gist("https://gist.github.com/OmarGonD/b70b712327d7e479f2c7")

EDIT 2:
I've tried looking at the overall container that holds the product brand, product name, new price and old price. With SelectorGadget, the overall container appears to be `#catalog-items` (correct me if I'm wrong).
So I use:

    Celulares_Telefonia_Catalogo <- read_html(page_source[[1]]) %>%
      html_nodes("#catalog-items")

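A possible first step, assuming `#catalog-items` is indeed the container, is to list its direct children and their `class` attributes, which shows which selector identifies a single product. The HTML below is a hypothetical stand-in, and the `.product-itm` class is an assumption that this check is meant to confirm or correct on the real page.

```r
library(rvest)

# Hypothetical markup standing in for the real page; the "product-itm" class is
# an assumption that this inspection step would confirm or correct.
page <- read_html('
<div id="catalog-items">
  <div class="product-itm"><span class="product-itm-price-new">2300.00</span></div>
  <div class="product-itm"><span class="product-itm-price-new">9.90</span>
    <span class="product-itm-price-old">49.00</span></div>
</div>')

catalogo <- html_nodes(page, "#catalog-items")

# Direct element children of the container: one per product if the layout is flat
hijos <- html_children(catalogo)

# Count the class names to spot the repeated per-product wrapper
table(html_attr(hijos, "class"))
```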
But I have no idea how to extract the new and old prices from it, with NA where a product has no old price.
Any hint is welcome.
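A more compact variant, assuming the same kind of per-product wrapper (the `.product-itm` class below is hypothetical), relies on `html_node()` (singular): applied to a set of product nodes, it returns exactly one result per product, and where nothing matches it yields a missing node that `html_text()` converts to `NA`. The four products reuse the prices from the expected result above; the product names are placeholders.

```r
library(rvest)

# Hypothetical markup: the ".product-itm" wrapper class is an assumption;
# the price classes and values come from the question.
page <- read_html('
<div id="catalog-items">
  <div class="product-itm"><span class="product-itm-price-new">2300.00</span></div>
  <div class="product-itm"><span class="product-itm-price-new">9.90</span>
    <span class="product-itm-price-old">49.00</span></div>
  <div class="product-itm"><span class="product-itm-price-new">1299.00</span>
    <span class="product-itm-price-old">2499.00</span></div>
  <div class="product-itm"><span class="product-itm-price-new">829.00</span>
    <span class="product-itm-price-old">1499.00</span></div>
</div>')

# html_nodes() (plural): one node per product
products <- html_nodes(page, "#catalog-items .product-itm")

# html_node() (singular) gives one result per product; a product without the
# old-price element gets a missing node, which html_text() turns into NA.
precios <- data.frame(
  Product  = LETTERS[seq_along(products)],  # placeholder product names
  PriceNew = as.numeric(html_text(html_node(products, ".product-itm-price-new"))),
  PriceOld = as.numeric(html_text(html_node(products, ".product-itm-price-old"))),
  stringsAsFactors = FALSE
)

print(precios)
```

In rvest 1.0 and later, `html_node()` and `html_nodes()` are superseded by `html_element()` and `html_elements()`, which behave the same way here.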