
I'm trying to scrape reviews from this webpage: https://www.leroymerlin.es/fp/82142706/armario-serie-one-blanco-abatible-2-puertas-200x100x50cm. I'm having trouble with the XPath: whenever I run the code, the output is always NULL. Code:

library(XML)
url <- "https://www.leroymerlin.es/fp/82142706/armario-serie-one-blanco-abatible-2-puertas-200x100x50cm"
source <- readLines(url, encoding = "UTF-8")
parsed_doc <- htmlParse(source, encoding = "UTF-8")
xpathSApply(parsed_doc, path = '//*[@id="reviewsContent"]/div[1]/div[2]/div[3]/h3', xmlValue)

I must be doing something wrong! I've tried everything. Many thanks for your help.

Steve2021

1 Answer


This webpage is built dynamically on load, and the review data is stored in a separate file, so typical scraping and XPath methods will not work on the page's HTML.
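One rough way to check this from R is to search the raw HTML that gets downloaded for a piece of text you can see in the browser, such as a review title or the element id from the XPath. If it is missing, no XPath expression will find it. This is only a sketch: `html_contains` is a hypothetical helper name, and the commented usage reuses the `url` variable from the question.

```r
# Hypothetical helper: TRUE if any line of the raw HTML contains `marker`.
# fixed = TRUE treats the marker as a literal string, not a regex.
html_contains <- function(html_lines, marker) {
  any(grepl(marker, html_lines, fixed = TRUE))
}

# Usage (requires network access), reusing `url` from the question:
# raw_html <- readLines(url, encoding = "UTF-8", warn = FALSE)
# html_contains(raw_html, "reviewsContent")
```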

Open your browser's developer tools and go to the Network tab. Reload the webpage and filter for XHR requests. Review each request; you should see one named "reviews", which is where the reviews are stored in JSON format. Right-click it and copy the link address. You can then access this file directly:

library(jsonlite)
fromJSON("https://www.leroymerlin.es/bin/leroymerlin/reviews?product=82142706&page=1&sort=best&reviewsPerPage=5") 
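If you need more than the first page, the query string in that link can be parameterised. The following is a minimal sketch, assuming the `page`, `sort` and `reviewsPerPage` parameters behave as their names suggest (I have not verified this against the site); `reviews_url` and `fetch_reviews` are hypothetical helper names.

```r
# Hypothetical helper: rebuilds the reviews URL from the query parameters
# visible in the link copied from the Network tab.
reviews_url <- function(product, page = 1, sort = "best", per_page = 5) {
  sprintf(
    "https://www.leroymerlin.es/bin/leroymerlin/reviews?product=%s&page=%d&sort=%s&reviewsPerPage=%d",
    product, as.integer(page), sort, as.integer(per_page)
  )
}

# Hypothetical helper: downloads one page of reviews and returns NULL
# (with a warning) if the request or the JSON parsing fails.
fetch_reviews <- function(product, page = 1) {
  tryCatch(
    jsonlite::fromJSON(reviews_url(product, page = page)),
    error = function(e) {
      warning("Could not fetch reviews: ", conditionMessage(e))
      NULL
    }
  )
}

# Usage (requires network access):
# reviews <- fetch_reviews("82142706")
# str(reviews, max.level = 2)  # inspect the structure before extracting fields
```

The structure of the returned object is not documented here, so inspect it with `str()` before pulling out individual fields.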

Here is a good reference: How to Find The Link for JSON Data of a Certain Website

Dave2e