
I am hoping someone can help me figure out how to scrape a .csv file that has no direct download link.

Clicking the Download button in R

I would like R to download the .csv file that is generated when clicking the 'Download dataset' button next to the first table on this website: https://www.opentable.com/state-of-industry. The closest post I found to my problem is this one, but I cannot find the API link that is used in its solution.

Potential Second Question: Saving the downloaded file to another location

Ideally, I would like the file to be loaded directly into R (similar to what the solution in the link above does). If the only way is to download it to my device and then read it into R, I would like the .csv file to be saved in a specific folder (e.g. C:\Documents\OpenTable), overwriting any existing file with the same name.

Thanks!

Ham
  • Hi @Ham, welcome to Stack Overflow. When you say you can't find the API link, do you mean `jsonlite`, or are you asking about something else? – Russ Thomas Sep 15 '20 at 21:53
  • Hi @RussThomas, yes, I was asking about `jsonlite`. I am very new to scraping, so all of this is foreign to me. Thank you for clarifying! – Ham Sep 16 '20 at 14:27
  • Hi @Ham. Have you tried to install `jsonlite`, and are you having issues with the install, i.e. `install.packages('jsonlite')`? – Russ Thomas Sep 16 '20 at 14:51

1 Answer


That's because this page doesn't call any API: all the data in the CSV file is embedded in the JavaScript code on the page. You will find it in the `<script>` tag that contains `covidDataCenter`. To evaluate that JavaScript and bring the resulting objects into R, use the V8 package. Then reshape the data:

```r
library(rvest)
library(V8)
library(dplyr)
library(tidyr)

pg <- read_html("https://www.opentable.com/state-of-industry")
# grab the inline <script> whose text contains the embedded data
js <- pg %>%
  html_node(xpath = "//script[contains(., 'covidDataCenter')]") %>%
  html_text()

ct <- V8::v8()
ct$eval("var window = {}") # the JS code assigns to a `window` object, which V8 does not provide, so create it first
ct$eval(js)
data <- ct$get("window")$`__INITIAL_STATE__`$covidDataCenter$fullbook # this is where the data sets get their values

dates     <- data$headers
countries <- data$countries
states    <- data$states
cities    <- data$cities

# It's not straightforward, but you can build the table you want like this:
countries_df <- countries %>%
  unnest(yoy) %>%                 # one row per (entity, value)
  group_by(name, id, size) %>%
  mutate(date = dates) %>%        # assumes each entity has one value per date in `dates`
  ungroup() %>%
  pivot_wider(names_from = date, values_from = yoy) %>%
  select(name, id, size, all_of(dates)) # arrange the columns
# do the same for `states` and `cities`
```
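Since the same reshape has to be repeated for `states` and `cities`, it can help to wrap it in a function. Below is a minimal base-R sketch of that idea (swapped in for the dplyr/tidyr pipeline above so it runs without extra packages). It assumes, as the pipeline does, that each row carries a `yoy` list-column with exactly one value per entry in `dates`. The helper name `reshape_yoy` and the toy rows and dates are hypothetical, made up only to show the shape of the output.

```r
# Hypothetical helper: turn a data frame whose `yoy` column is a list of
# numeric vectors (one value per date) into a wide table, one column per date.
# Assumption: every row's `yoy` vector has exactly length(dates) elements.
reshape_yoy <- function(df, dates, id_cols = c("name", "id", "size")) {
  vals <- df$yoy
  stopifnot(all(lengths(vals) == length(dates)))
  wide <- do.call(rbind, lapply(vals, as.numeric)) # one matrix row per entity
  wide <- as.data.frame(wide)
  names(wide) <- dates                             # date strings become column names
  cbind(df[id_cols], wide)                         # keeps the id columns first
}

# Toy input with made-up values, only to illustrate the structure:
toy <- data.frame(
  name = c("Country A", "Country B"),
  id = c(1, 2),
  size = c("big", "small"),
  stringsAsFactors = FALSE
)
toy$yoy <- list(c(-100, -95, -80), c(-90, -60, -40)) # list-column
toy_dates <- c("2020-09-12", "2020-09-13", "2020-09-14")

out <- reshape_yoy(toy, toy_dates)
print(out)
```

With the real data you would call something like `reshape_yoy(states, dates)` and `reshape_yoy(cities, dates)`, provided those tables have the same `name`/`id`/`size`/`yoy` layout, which is an assumption worth checking first.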

Finally, export the data frame to a CSV file with `write.csv()`.
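For the second part of the question (saving to a specific folder and overwriting the previous file), a minimal sketch could look like the following. `write.csv()` replaces an existing file at the same path, so re-running it overwrites the old copy; `dir.create()` makes the folder if it is missing. The helper name `save_csv_to` and the file name `opentable_countries.csv` are hypothetical. Note that in R strings a Windows path like C:\Documents\OpenTable should be written with forward slashes (or doubled backslashes).

```r
# Hypothetical helper: write a data frame to <folder>/<filename> as CSV,
# creating the folder if needed. An existing file with the same name is
# overwritten, because write.csv() replaces the target file.
save_csv_to <- function(df, folder, filename) {
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(folder, filename)
  write.csv(df, path, row.names = FALSE)
  invisible(path)
}

# With the real data this might be, e.g.:
# save_csv_to(countries_df, "C:/Documents/OpenTable", "opentable_countries.csv")

# Self-contained demo in a temporary folder:
demo_dir <- file.path(tempdir(), "OpenTable")
p <- save_csv_to(data.frame(x = 1:3), demo_dir, "demo.csv")
p <- save_csv_to(data.frame(x = 4:5), demo_dir, "demo.csv") # overwrites the first file
back <- read.csv(p)
print(back)
```

That said, the approach in this answer already builds the data frame in memory, so saving to disk is only needed if you want a copy outside R.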

xwhitelight