I am new to R, but here is one approach: define a function that retrieves the row info from a given url as a dataframe, then loop over however many pages you want, calling the function and binding the returned dataframes into one big dataframe. As the nodeLists are not always the same length, e.g. not every listing has a telephone number, you need to test whether an element is present while looping over the rows. For that I use the method from the answer by alistaire (+1 to him).
I am using CSS selectors rather than XPath. You can read about them here.
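For a quick illustration of the difference, here is the row container grabbed both ways. The XPath is a hand-rolled, roughly equivalent class match (strictly, it also matches any class attribute containing that substring), and it assumes the url variable defined in the main code below:

library(rvest)

page <- read_html(paste0(url, 1))                               # url as defined below

html_nodes(page, css = '.views-row')                            # CSS selector
html_nodes(page, xpath = "//*[contains(@class, 'views-row')]")  # ~equivalent XPath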
Given the number of possible pages, I would look into using an HTTP session, as you get the efficiency of re-using a connection. I use sessions in other languages, and from a quick google it seems rvest provides this with html_session.
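Here is a minimal sketch of that idea, assuming the url stem and pages_to_loop from the main code below and extracting only the title for brevity. Note that rvest 1.0.0 renamed html_session()/jump_to() to session()/session_jump_to():

library(rvest)
library(purrr)

# one session shared across pages; jump_to() navigates within it,
# re-using the underlying connection rather than opening a new one
s <- html_session(paste0(url, 1))
dfs <- vector("list", pages_to_loop)

for(i in seq_len(pages_to_loop)){
  s <- jump_to(s, paste0(url, i))
  dfs[[i]] <- s %>%
    html_nodes('.views-row') %>%
    map_df(~list(title = html_node(.x, '.service-card__title a') %>% html_text()))
}

df <- do.call(rbind, dfs)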
I would welcome suggestions for improvement and any edits to correct the indentation. I'm learning as I go.
library(rvest)
library(magrittr)
library(purrr)

url <- "https://channel9.msdn.com/Events/useR-international-R-User-conferences/useR-International-R-User-2017-Conference?sort=status&direction=desc&page="

# retrieve one page of results as a dataframe, one row per listing
get_listings <- function(url){
  df <- read_html(url) %>%
    html_nodes('.views-row') %>%
    map_df(~list(
      title = html_node(.x, '.service-card__title a') %>% html_text(),
      # collapse newlines and trim whitespace in the address block
      location = trimws(gsub('\n', ' ', html_text(html_node(.x, '.service-card__address')))) %>%
        {if(length(.) == 0) NA else .},
      # not every listing has a phone number, so substitute NA when missing
      telephone = html_node(.x, '.service-card__phone') %>% html_text() %>%
        {if(length(.) == 0) NA else .}
    ))
  return(df)
}
pages_to_loop <- 2

for(i in seq_len(pages_to_loop)){
  new_url <- paste0(url, i)  # paste0 already joins with no separator; no sep argument needed
  if(i == 1){
    df <- get_listings(new_url)
  } else {
    new_df <- get_listings(new_url)
    df <- rbind(df, new_df)
  }
}
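As an aside, since get_listings() already returns a dataframe per page, the grow-by-rbind loop could be collapsed with purrr's map_df, which row-binds as it maps; this sketch should be equivalent:

df <- map_df(seq_len(pages_to_loop), ~get_listings(paste0(url, .x)))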