
This seems like a simple problem, but I've been struggling with it for a few days. This is a minimal working example rather than the actual problem.

This question seemed similar, but I was unable to use the answer to solve my problem.

In a browser, I go to this URL, click [Search] (no need to make any choices from the lists), and then click [Download Results] (choosing, for example, the Xlsx option). The file then downloads.

To automate this in R I have tried:

library(rvest)

url1 <- "https://secure.gamblingcommission.gov.uk/PublicRegister/Search"
sesh1 <- html_session(url1)
form1 <- html_form(sesh1)[[1]]
subform <- submit_form(sesh1, form1)

Using Chrome Developer Tools, I found the URL used to initiate the download, so I tried:

library(httr)  # provides GET()
url2 <- "https://secure.gamblingcommission.gov.uk/PublicRegister/Search/Download"
res <- GET(url = url2, query = list(format = "xlsx"))

However, this does not download the file:

> res$content
  raw(0) 

I also tried:

download.file(url = paste0(url2, "?format=xlsx") , destfile = "down.xlsx", mode = "wb")

But this downloads nothing:

> Content type '' length 0 bytes
> downloaded 0 bytes

Note that, in the browser, pasting url2 with the format query appended does initiate the download (after running the search from url1).

I thought that I should somehow be using the session info from the initial code block to do the download, but so far I can't see how.

Thanks in advance for any help!

Joe King

1 Answer


You are almost there, and your intuition about using the session info is correct.

You just need to use rvest::jump_to to navigate to the second url and then write it to disk:

library(rvest)

url1 <- "https://secure.gamblingcommission.gov.uk/PublicRegister/Search"
sesh1 <- html_session(url1)
form1 <- html_form(sesh1)[[1]]
subform <- submit_form(sesh1, form1)

url2 <- "https://secure.gamblingcommission.gov.uk/PublicRegister/Search/Download"

#### The above is your original code - below is the additional code you need:

download <- jump_to(subform, paste0(url2, "?format=xlsx"))
writeBin(download$response$content, "down.xlsx")
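
In rvest 1.0.0 and later, html_session(), submit_form() and jump_to() were renamed to session(), session_submit() and session_jump_to(). The following is a minimal sketch of the same steps with the newer names, wrapped in a function that also checks for an empty response before writing. The helper name download_register() and its arguments are hypothetical, and whether the site still responds as described in the question is an assumption.

```r
# Sketch only: download_register() is a hypothetical helper, not part of rvest.
# It assumes rvest >= 1.0.0 and httr are installed.
download_register <- function(search_url, download_url, destfile, format = "xlsx") {
  # Start a session so that cookies set by the search page are retained
  sesh <- rvest::session(search_url)

  # Submit the first form on the page with its default values
  form <- rvest::html_form(sesh)[[1]]
  results <- rvest::session_submit(sesh, form)

  # Request the download URL within the same session
  download <- rvest::session_jump_to(
    results,
    paste0(download_url, "?format=", format)
  )

  # Stop early on an HTTP error or an empty body instead of writing a 0-byte file
  httr::stop_for_status(download$response)
  body <- download$response$content
  if (length(body) == 0) stop("Download returned an empty body")

  # Write the raw bytes to disk and return the path invisibly
  writeBin(body, destfile)
  invisible(destfile)
}

# Example call using the URLs from the question (requires network access):
# download_register(
#   "https://secure.gamblingcommission.gov.uk/PublicRegister/Search",
#   "https://secure.gamblingcommission.gov.uk/PublicRegister/Search/Download",
#   "down.xlsx"
# )
```

The empty-body check is there because, as the question shows, a request made outside the search session can succeed with zero bytes of content.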
Robert Long