I am trying to load some data from this web page. The part of the page that I want to extract is this specific section:

(screenshot of the target section of the page)

I inspected the page and found this class and id:

(screenshot of the inspected HTML element)

So I tried like this:

library(rvest)
library(stringr)

url <- "http://www.aemet.es/es/eltiempo/prediccion/avisos?w=mna"
aa2 <- html_nodes(read_html(url),
                  'div#listado-avisos.contenedor-tabla')

aa3 <- data.frame(texto = str_replace_all(html_text(aa2), "[\r\n\t]", ""),
                  stringsAsFactors = FALSE)

And I get a data frame with a single row and no info in it... What am I doing wrong?

Thanks in advance.

Updated: possible answer thanks to QHarr:

library(httr)
library(rvest)
library(jsonlite)

url <- "https://www.aemet.es/es/eltiempo/prediccion/avisos?w=mna"
download.file(url, destfile = "scrapedpage.html", quiet = TRUE)
date_value <- read_html("scrapedpage.html") %>%
  html_node('#fecha-seleccionada-origen') %>%
  html_attr('value')

url2 <- paste0('https://www.aemet.es/es/api-eltiempo/resumen-avisos-geojson/PB/', date_value, '/D+1')
download.file(url2, destfile = "scrapedpage2.html", quiet = TRUE)

avisos <- jsonlite::parse_json(read_html("scrapedpage2.html") %>%
  html_node('p') %>%
  html_text())
GonzaloReig

1 Answer


The page is dynamically populated. If you don't mind some very minor differences, you can issue two requests: one to the initial URL to pick up a timestamp value, then an API request (as the page itself does) with that timestamp added in, so as to get predictions for the right period. Then parse the response to get at the JSON holding the avisos:

library(httr)
library(rvest)
library(jsonlite)

headers <- c('Referer' = 'https://www.aemet.es/es/eltiempo/prediccion/avisos?w=mna')

date_value <- read_html('https://www.aemet.es/es/eltiempo/prediccion/avisos?w=mna') %>%
  html_node('#fecha-seleccionada-origen') %>%
  html_attr('value')

data <- httr::GET(url = paste0('https://www.aemet.es/es/api-eltiempo/resumen-avisos-geojson/PB/', date_value, '/D+1'),
                  httr::add_headers(.headers = headers))

avisos <- jsonlite::parse_json(read_html(data$content) %>%
  html_node('p') %>%
  html_text())$objects$Avisos$geometries
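
Since the question ultimately asks for a data frame, the parsed list can be flattened along these lines. This is only a sketch: the exact fields inside each geometry depend on the GeoJSON the API returns, so inspect str(avisos[[1]]) before relying on any particular column.

# Sketch only: bind each aviso's non-list fields into one row apiece.
# Assumes every element of `avisos` exposes the same scalar fields;
# check str(avisos[[1]]) to confirm the actual structure.
avisos_df <- do.call(rbind, lapply(avisos, function(g) {
  as.data.frame(g[!sapply(g, is.list)], stringsAsFactors = FALSE)
}))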
QHarr
  • Hi QHarr. I think your approach works pretty well, but I now get an error that I didn't have before: "Error in open.connection(x, "rb") : Timeout was reached: [www.aemet.es] Connection timed out after 10001 milliseconds". I tried to avoid it using [this](https://stackoverflow.com/questions/36043172/package-rvest-for-web-scraping-https-site-with-proxy/38463559#38463559), but I get an error in the data part (I have updated the post with the code) – GonzaloReig Apr 13 '20 at 09:37
  • Hi, I will have a look later today – QHarr Apr 13 '20 at 10:13