I am developing a function to scrape temperature data from a website. It works, but only for the first day of the month.
The code below gets the data for month 8 of year 2015. However, it only scrapes the first table.
How can I use rvest to collect all the tables in that month?
https://www.timeanddate.com/weather/spain/madrid/historic?month=8&year=2015
library(rvest)
library(dplyr)
library(purrr)
Temps <- function(month, year) {
  url <- paste0("https://www.timeanddate.com/weather/spain/madrid/historic?month=", month, "&year=", year)
  temps_obtained <- url %>%
    read_html() %>%
    html_table(fill = TRUE) %>%
    .[[2]] %>%                               # second table on the page
    setNames(.[1, ]) %>%                     # first row holds the column names
    as_tibble(.name_repair = "universal") %>%
    dplyr::slice(-1) %>%                     # drop the header row
    dplyr::slice(-n())                       # drop the last row
  return(temps_obtained)
}
map2(.x = 8, .y = 2015, ~Temps(.x, .y))
Edit: I just found this solution (for Python):
Scraping table from website [timeanddate.com]
EDIT: This is what I am currently working with which returns no data:
year <- 2019
month <- 11
day <- 3
# Zero-pad month and day so the date reads YYYYMMDD; pad is given as a character string
month <- stringr::str_pad(month, width = 2, pad = "0")
day <- stringr::str_pad(day, width = 2, pad = "0")
url <- paste0("https://www.timeanddate.com/weather/spain/madrid/historic?hd=", year, month, day)
temps_obtained <- url %>%
  html_session() %>%
  read_html() %>%
  html_table(fill = TRUE)
EDIT:
I think this solves the problem...
year <- 2019
month <- 11
day <- 3
# Zero-pad month and day so the date reads YYYYMMDD; pad is given as a character string
month <- stringr::str_pad(month, width = 2, pad = "0")
day <- stringr::str_pad(day, width = 2, pad = "0")
url <- paste0("https://www.timeanddate.com/weather/spain/madrid/historic?hd=", year, month, day)
temps_obtained <- url %>%
  html_session() %>%
  read_html() %>%
  html_table(fill = TRUE) %>%
  .[[2]] %>%                               # second table on the page
  setNames(.[1, ]) %>%                     # first row holds the column names
  as_tibble(.name_repair = "universal") %>%
  dplyr::slice(-1) %>%                     # drop the header row
  dplyr::slice(-n())                       # drop the last row
Which returns:
# A tibble: 27 x 9
   Time              ...2  Temp  Weather                 Wind   ...6  Humidity Barometer    Visibility
   <chr>             <chr> <chr> <chr>                   <chr>  <chr> <chr>    <chr>        <chr>
 1 7:00 amSun, Nov 3 ""    55 °F Passing clouds.         16 mph ↑     88%      "29.62 \"Hg" N/A
 2 7:30 am           ""    55 °F Passing clouds.         21 mph ↑     88%      "29.62 \"Hg" N/A
 3 8:00 am           ""    55 °F Broken clouds.          21 mph ↑     88%      "29.62 \"Hg" N/A
 4 8:30 am           ""    55 °F Broken clouds.          18 mph ↑     88%      "29.65 \"Hg" N/A
 5 9:00 am           ""    55 °F Drizzle. Broken clouds. 16 mph ↑     94%      "29.68 \"Hg" N/A
 6 9:30 am           ""    57 °F Broken clouds.          21 mph ↑     82%      "29.71 \"Hg" N/A
 7 10:00 am          ""    57 °F Broken clouds.          26 mph ↑     63%      "29.71 \"Hg" N/A
 8 10:30 am          ""    57 °F Scattered clouds.       29 mph ↑     55%      "29.74 \"Hg" N/A
 9 11:00 am          ""    57 °F Scattered clouds.       17 mph ↑     55%      "29.77 \"Hg" N/A
10 11:30 am          ""    59 °F Scattered clouds.       20 mph ↑     51%      "29.77 \"Hg" N/A
Changing the day to 4 gives me different results.
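Since each day has its own URL, one way to cover a whole month might be to build one URL per day and run the same pipeline over each of them. Below is a minimal sketch of the URL-building step only. It assumes the hd= parameter takes a YYYYMMDD date for every day, which I have only checked for a couple of dates; daily_urls is a hypothetical helper name, not something from rvest.

```r
# Hypothetical helper (an assumption, not part of the code above):
# builds one "hd=" URL per day of the given month.
daily_urls <- function(year, month) {
  first_day <- as.Date(sprintf("%04d-%02d-01", as.integer(year), as.integer(month)))
  # The second element of a monthly sequence is the first day of the next month;
  # subtracting 1 gives the last day of the requested month.
  last_day <- seq(first_day, by = "month", length.out = 2)[2] - 1
  days <- seq(first_day, last_day, by = "day")
  # Same URL pattern as above, with the date formatted as YYYYMMDD.
  paste0("https://www.timeanddate.com/weather/spain/madrid/historic?hd=",
         format(days, "%Y%m%d"))
}

urls <- daily_urls(2019, 11)
head(urls, 2)
```

Each element of urls could then be passed through the same pipeline with purrr::map(), although I have not verified that every day returns the same table layout.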
EDIT: Not working
The function works, but only for dates from 2017 onward. If I apply it to an earlier date, as in the following, it does not work:
url <- "https://www.timeanddate.com/weather/spain/madrid/historic?hd=20100109"
temps_obtained <- url %>%
  html_session() %>%
  read_html() %>%
  html_node("table") %>%
  html_table(fill = TRUE)
Which gives me:
1 High
2 Low
3 Average
4 * Reported Oct 27 6:00 pm — Nov 11 6:30 pm, Madrid. Weather by CustomWeather, © 2019
Temperature
1 72 °F (Oct 31, 3:30 pm)
2 39 °F (Nov 8, 8:00 am)
3 56 °F
4 * Reported Oct 27 6:00 pm — Nov 11 6:30 pm, Madrid. Weather by CustomWeather, © 2019
Humidity
1 100% (Oct 29, 7:30 am)
2 36% (Nov 8, 3:00 pm)
3 69%
4 * Reported Oct 27 6:00 pm — Nov 11 6:30 pm, Madrid. Weather by CustomWeather, © 2019
Pressure
1 30.27 "Hg (Oct 29, 7:30 am)
2 29.62 "Hg (Nov 3, 7:00 am)
3 30.00 "Hg
4 * Reported Oct 27 6:00 pm — Nov 11 6:30 pm, Madrid. Weather by CustomWeather, © 2019
Which is not the data I need.