I am having trouble extracting the price element from the website "https://www.eventbrite.com/" using rvest.

I located the selector with SelectorGadget; the minimal selector it gives for the price is ".eds-l-mar-top-1". I tried converting the parsed XML document to a data frame, but I get the following error message:

Error in as.data.frame.default(page_html) : cannot coerce class ‘c("xml_document", "xml_node")’ to a data.frame

I have tried to filter the price with:

price <- page_html %>% html_nodes('js-display-price') %>% html_text()

but price is empty.

library(httr)
library(rvest)

getYear = "2019"
getWeek = "31"

base_url = "https://www.eventbrite.com/"
query_params = list(yr=getYear, wk=getWeek)

# fetch the page and parse the response body as HTML
resp <- GET(url=base_url, query=query_params)

page_html <- read_html(resp)

# price included in the details of the following tag
page_html %>% 
  html_nodes(".eds-l-mar-top-1") %>%
  html_text(trim = TRUE)

I would like to extract the following data for each event: name, date, and price.

EJG_27

1 Answer

The content you want is loaded dynamically, but the same data is also present in a JavaScript object (window.__SERVER_DATA__) elsewhere in the response. You can extract that object with a regex and pass it to a JSON parser.

library(httr)
library(rvest)
library(stringr)

getYear = "2019"
getWeek = "31"

base_url = "https://www.eventbrite.com/"
query_params = list(yr=getYear, wk=getWeek)

resp <- GET(url=base_url, query=query_params)

# collapse the text of <body>, including inline script contents, into one string
r <- read_html(resp) %>% 
  html_nodes('body') %>% 
  html_text() %>% 
  toString()

# capture the object assigned to window.__SERVER_DATA__
x <- str_match_all(r,'window\\.__SERVER_DATA__ = (.*);')  
# x[[1]] is a matrix: column 1 is the full match, column 2 is the capture group
json <- jsonlite::fromJSON(x[[1]][,2])
print(json$suggestions$events$ticket_availability)
print(json$suggestions$events)
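
To see what the regex and the [[1]][,2] indexing do without hitting the network, here is a minimal sketch on a made-up string. The contents of body_text and the name and is_free fields are assumptions for illustration, not the real Eventbrite payload. It uses a non-greedy (.*?) so the capture stops at the first ';', which only works if the object itself contains no semicolons.

```r
library(stringr)
library(jsonlite)

# Hypothetical stand-in for the page text; the real payload is much larger
body_text <- 'var a = 1; window.__SERVER_DATA__ = {"suggestions":{"events":[{"name":"Demo","is_free":true}]}}; var b = 2;'

# str_match_all returns a list with one matrix per input string
x <- str_match_all(body_text, 'window\\.__SERVER_DATA__ = (.*?);')

# Column 1 holds the full match, column 2 the first capture group (the object)
json_text <- x[[1]][, 2]

# Parse the captured text; arrays of records are simplified to data frames
json <- fromJSON(json_text)
print(json$suggestions$events)
```

The column index therefore depends on the regex, not on fromJSON: with a single capture group, the group always lands in column 2.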
QHarr
  • Thank you for that. Two questions: 1) What exactly is the regular expression (.*?);') looking for in the SERVER_DATA? 2) Do you always index columns 1 and 2 when using fromJSON with a JavaScript object? – EJG_27 Aug 01 '19 at 10:25
  • It's looking for the JavaScript object in which the server data is stored. The data gets pulled from there when JavaScript runs in the browser and updates the DOM. The indexing used on the regex return value depends on which group you are interested in. – QHarr Aug 01 '19 at 13:11
  • My next question is how to flatten the JSON object into a data frame. I have tried various methods, ranging from my_df <- fromJSON((tmp[[1]][,2]), flatten=TRUE) to more complex ones such as tmp %>% map(~ fromJSON(.x)) %>% bind_rows() and https://stackoverflow.com/questions/11553592/r-generic-flattening-of-json-to-data-frame. None seem to work for me. Any suggestions? – EJG_27 Aug 01 '19 at 16:03
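
On the flattening question, one possible approach is to pick out the events data frame first and flatten only that, since the top-level parse result is a nested list rather than a data frame. The sketch below runs on made-up JSON: only ticket_availability comes from the code above, while name, start_date, is_free and currency are hypothetical field names, so the real structure may need more handling (for example list columns).

```r
library(jsonlite)

# Hypothetical payload with one nested object per event
txt <- '{"suggestions":{"events":[
  {"name":"A","start_date":"2019-08-01","ticket_availability":{"is_free":true,"currency":"USD"}},
  {"name":"B","start_date":"2019-08-02","ticket_availability":{"is_free":false,"currency":"USD"}}
]}}'

parsed <- fromJSON(txt)

# events is a data frame whose ticket_availability column is itself a data frame
events <- parsed$suggestions$events

# jsonlite::flatten expands nested data frame columns into prefixed columns
flat <- jsonlite::flatten(events)
print(names(flat))
```

Calling jsonlite::flatten with the namespace prefix avoids a clash with purrr::flatten if purrr is also loaded.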