I am trying to scrape some data from a web page. For example, I can use the following:

  "https://www.fotocasa.es/es/comprar/viviendas/barcelona-capital/sagrada-familia/l/19/" %>% 
  read_html() %>% 
  html_nodes(".re-CardFeatures-wrapper")

The result has the following structure:

List of 2
 $ :List of 2
  ..$ node:<externalptr> 
  ..$ doc :<externalptr> 
  ..- attr(*, "class")= chr "xml_node"
 $ :List of 2
  ..$ node:<externalptr> 
  ..$ doc :<externalptr> 
  ..- attr(*, "class")= chr "xml_node"
 - attr(*, "class")= chr "xml_nodeset"

This corresponds to two properties from the website.

I am interested in extracting the "li" items from each of these lists:

"https://www.fotocasa.es/es/comprar/viviendas/barcelona-capital/sagrada-familia/l/19/" %>% 
  read_html() %>% 
  html_nodes(".re-CardFeatures-wrapper") %>% 
  html_nodes("li")

Which gives:

{xml_nodeset (10)}
 [1] <li class="re-CardFeatures-feature">2 habs.</li>\n
 [2] <li class="re-CardFeatures-feature">1 baño</li>\n
 [3] <li class="re-CardFeatures-feature">60 m²</li>\n
 [4] <li class="re-CardFeatures-feature">3ª Planta</li>\n
 [5] <li class="re-CardFeatures-feature">Balcón</li>
 [6] <li class="re-CardFeatures-feature">3 habs.</li>\n
 [7] <li class="re-CardFeatures-feature">1 baño</li>\n
 [8] <li class="re-CardFeatures-feature">75 m²</li>\n
 [9] <li class="re-CardFeatures-feature">5ª Planta</li>\n
[10] <li class="re-CardFeatures-feature">Ascensor</li>

However, this flattens the result and breaks the two-element list structure that I originally had (one element for each property).

My question is: how can I extract the "li" nodes for the two properties while keeping them grouped by the property they belong to?

That is, the list should "break" before "3 habs.", since that is the first item of the second property.

user113156
  • See this question/answer: https://stackoverflow.com/questions/56673908/how-do-you-scrape-items-together-so-you-dont-lose-the-index/56675147#56675147 – Dave2e Apr 11 '22 at 22:24

1 Answer

To keep the two-element list structure, we can use lapply() to query each property node separately:

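library(dplyr)  # provides the %>% pipe
library(rvest)

# One node per property card
house <- "https://www.fotocasa.es/es/comprar/viviendas/barcelona-capital/sagrada-familia/l/19/" %>% 
  read_html() %>% 
  html_nodes(".re-CardFeatures-wrapper") 

# Query each card on its own so the li nodes stay grouped by property
lis <- lapply(house, function(x) x %>% html_nodes("li"))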

Now lis is a list with one element per property, each holding the "li" nodes of that property.
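
If the goal is the feature text rather than the nodes, the same per-property grouping can be carried one step further with html_text(). The following is only a minimal sketch: the inline HTML is a made-up stand-in for the real page (two cards with two features each, chosen for illustration), and the names page, cards and features are hypothetical.

```r
library(rvest)

# Hypothetical stand-in for the listing page: two property cards.
# The class names match the question; the feature values are illustrative.
page <- read_html('
  <html><body>
    <ul class="re-CardFeatures-wrapper">
      <li class="re-CardFeatures-feature">2 habs.</li>
      <li class="re-CardFeatures-feature">Garaje</li>
    </ul>
    <ul class="re-CardFeatures-wrapper">
      <li class="re-CardFeatures-feature">3 habs.</li>
      <li class="re-CardFeatures-feature">Ascensor</li>
    </ul>
  </body></html>
')

# One node per property card
cards <- html_nodes(page, ".re-CardFeatures-wrapper")

# One character vector of feature labels per property
features <- lapply(cards, function(card) {
  html_text(html_nodes(card, "li"), trim = TRUE)
})

str(features)
```

Because each card is queried on its own, features[[2]] starts at "3 habs." regardless of how many items the first property has.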

Nad Pat