Using rvest html_nodes() to store li elements for each item scraped

Question

I am trying to scrape some data. For example, I can run the following:

"https://www.fotocasa.es/es/comprar/viviendas/barcelona-capital/sagrada-familia/l/19/" %>% 
  read_html() %>% 
  html_nodes(".re-CardFeatures-wrapper")

This returns a nodeset with the following structure:

List of 2
 $ :List of 2
  ..$ node:<externalptr> 
  ..$ doc :<externalptr> 
  ..- attr(*, "class")= chr "xml_node"
 $ :List of 2
  ..$ node:<externalptr> 
  ..$ doc :<externalptr> 
  ..- attr(*, "class")= chr "xml_node"
 - attr(*, "class")= chr "xml_nodeset"

This corresponds to two properties from the website.

I am interested in extracting the "li" items from each of these lists:

"https://www.fotocasa.es/es/comprar/viviendas/barcelona-capital/sagrada-familia/l/19/" %>% 
  read_html() %>% 
  html_nodes(".re-CardFeatures-wrapper") %>% 
  html_nodes("li")

Which gives:

{xml_nodeset (10)}
 [1] <li class="re-CardFeatures-feature">2 habs.</li>\n
 [2] <li class="re-CardFeatures-feature">1 baño</li>\n
 [3] <li class="re-CardFeatures-feature">60 m²</li>\n
 [4] <li class="re-CardFeatures-feature">3ª Planta</li>\n
 [5] <li class="re-CardFeatures-feature">Balcón</li>
 [6] <li class="re-CardFeatures-feature">3 habs.</li>\n
 [7] <li class="re-CardFeatures-feature">1 baño</li>\n
 [8] <li class="re-CardFeatures-feature">75 m²</li>\n
 [9] <li class="re-CardFeatures-feature">5ª Planta</li>\n
[10] <li class="re-CardFeatures-feature">Ascensor</li>

However, this breaks the two-element list structure that I originally had (one element for each property).

My question is: how can I extract the "li" nodes with html_nodes() for the two properties, but keep them grouped by the property they belong to?

That is, the list should "break" before "3 habs.", since that is the first item of the second property.

See this question/answer: https://stackoverflow.com/questions/56673908/how-do-you-scrape-items-together-so-you-dont-lose-the-index/56675147#56675147 — Dave2e, Apr 11 '22 at 22:24

score 1 · Answer 1 · answered Apr 12 '22 at 11:04

To keep the two-element list structure, we can use lapply as follows:

library(dplyr)
library(rvest)

# One node per property card
house <- "https://www.fotocasa.es/es/comprar/viviendas/barcelona-capital/sagrada-familia/l/19/" %>% 
  read_html() %>% 
  html_nodes(".re-CardFeatures-wrapper")

# Query the li nodes inside each card separately, so the grouping is preserved
lis <- lapply(house, function(x) x %>% html_nodes("li"))

Now lis is a list in which each element holds the li nodes of one property.
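
If the feature text is wanted rather than the nodes themselves, the same per-card lapply can call html_text() on each group. The following is only a minimal sketch: it parses a small inline HTML string (a hypothetical stand-in for the live page, which may have changed since the question was asked), and the names page, cards and features are illustrative.

```r
library(rvest)

# Hypothetical inline HTML standing in for the live page;
# the class names are the ones shown in the question.
page <- read_html('
<div class="re-CardFeatures-wrapper"><ul>
  <li class="re-CardFeatures-feature">2 habs.</li>
</ul></div>
<div class="re-CardFeatures-wrapper"><ul>
  <li class="re-CardFeatures-feature">3 habs.</li>
  <li class="re-CardFeatures-feature">Ascensor</li>
</ul></div>')

# One node per property card
cards <- html_nodes(page, ".re-CardFeatures-wrapper")

# For each card, collect the text of its own li nodes only
features <- lapply(cards, function(card) {
  html_text(html_nodes(card, "li"), trim = TRUE)
})

# Number of features found for each property
lengths(features)
```

Each element of features is a character vector for one property, so properties with different numbers of features stay separate.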
