I'm trying to scrape data from TripAdvisor search results that span several pages using rvest.
Here's my code:
library(rvest)
starturl <- 'https://www.tripadvisor.co.uk/Search?q=swim+with&uiOrigin=trip_search_Attractions&searchSessionId=CA54193AF19658CB1D983934FB5C86F41511875967385ssid#&ssrc=A&o=0'
swimwith <- read_html(starturl)
swdf <- swimwith %>%
  html_nodes('.title span') %>%
  html_text()
It works fine for the first page of results, but I can't figure out how to get results from the subsequent pages. I noticed that the 'o' value at the end of the URL gives the start position of the results, so I changed it from '0' to '30' as follows:
url <- sub('A&o=0', paste0('A&o=', '30'), starturl)
webpage <- html_session(url)
swimwith <- read_html(webpage)
swdf2 <- swimwith %>%
  html_nodes('.title span') %>%
  html_text()
However, the results in swdf2 are the same as those in swdf, even though the URL loads the second page of results in a web browser.
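To make the pattern explicit, here is a rough sketch of how I would build the URLs for several pages. It assumes the offset keeps increasing in steps of 30, which I have only checked for 0 and 30, and the names offsets and page_urls are just placeholders. The last line strips everything from the '#' onwards: as far as I understand, that part (the URL fragment) is normally not sent to the server, so this may be related to why both requests return the same page.

```r
# Sketch only: build one URL per results page by swapping the trailing offset.
# Assumption: the offset grows in steps of 30 (only 0 and 30 were checked).
starturl <- 'https://www.tripadvisor.co.uk/Search?q=swim+with&uiOrigin=trip_search_Attractions&searchSessionId=CA54193AF19658CB1D983934FB5C86F41511875967385ssid#&ssrc=A&o=0'

offsets <- seq(0, 60, by = 30)  # hypothetical: first three pages

# Replace the literal 'A&o=0' at the end with each offset in turn
page_urls <- vapply(
  offsets,
  function(o) sub('A&o=0', paste0('A&o=', o), starturl, fixed = TRUE),
  character(1)
)

# Drop the fragment (everything from '#' onwards) to see what is left
# of each URL once the offset part is removed
urls_without_fragment <- sub('#.*$', '', page_urls)
print(unique(urls_without_fragment))
```

With the fragment removed, the three URLs are identical, since the offset only ever appears after the '#'.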
Any idea how I can get the results from these subsequent pages?