I am trying to automate the download of a dataset from a website, but I cannot get the file I want. RCurl fails with a `tlsv1 alert protocol version` error. The download runs with httr, but what I receive is an HTML page rather than the CSV file I expect. I have tried a handful of other approaches, but none of them work. Please advise.

Code for downloading with httr:

library(httr)

# Lung cancer screening locator tool URL
url1 <- "https://report.acr.org/#/site/PUBLIC/views/NRDRLCSLocator/ADownload.csv"

# Save the response to a temporary file; the saved file is an HTML page, not CSV
GET(url1, write_disk(tf <- tempfile(fileext = ".csv")))

lcsr <- read.csv(tf)

The original website for this request is https://www.acr.org/Clinical-Resources/Lung-Cancer-Screening-Resources/LCS-Locator-Tool and the Tableau behind it is located at https://report.acr.org/t/PUBLIC/views/NRDRLCSLocator/LCSLocator?:embed=y&:showVizHome=no&:host_url=https%3A%2F%2Freport.acr.org%2F&:embed_code_version=3&:tabs=no&:toolbar=no&:showAppBanner=no&:display_spinner=no&:loadOrderID=0

Todd Burus
  • Did you try `download.file` (https://stat.ethz.ch/R-manual/R-devel/library/utils/html/download.file.html)? EDIT: In case you weren't aware, `read.csv`/`read_csv` can read the URL directly. – Quixotic22 Sep 17 '21 at 08:44
  • I think the URL redirects; it should work with https://report.acr.org/t/PUBLIC/views/NRDRLCSLocator/ADownload.csv instead. As @Quixotic22 said, you can use `read.csv(url)` if you want to read it in on the fly. – Eyayaw Sep 17 '21 at 08:59
  • The web server does not provide the CSV as plain text at this URL. JavaScript is involved and has to be executed first. Have a look at RSelenium to automate anything that can be done in a browser. There is also an SSL certificate issue, so you need to run `wget --no-check-certificate https://report.acr...` in bash. – danlooo Sep 17 '21 at 09:56
  • @danlooo Okay, I'll check that. – Todd Burus Sep 17 '21 at 11:19
  • @quixotic22 yes, I am well aware of the options in Base R, but neither can access the file due to it being HTTPS. – Todd Burus Sep 17 '21 at 11:20
  • How does one generate the csv link? It seems from the tool/workbook link one must enter a zip code to get any data. I see no csv export unless that appears after entering a zip code? – QHarr Sep 17 '21 at 23:48
  • @QHarr If you select "State" in the dropdown, it appears. – Todd Burus Sep 18 '21 at 05:06

1 Answer

An RSelenium solution:

Set the download directory through Chrome's preferences, then navigate to the file URL so the browser downloads it:

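library(RSelenium)
library(readr)

# Directory where Chrome should save the download
download_dir <- "D:\\mywork"

# Set the download directory through Chrome preferences
eCaps <- list(
  chromeOptions = 
    list(prefs = list('download.default_directory' = download_dir))
)
driver <- rsDriver(browser = "chrome", extraCapabilities = eCaps)
remDr <- driver[["client"]]
remDr$navigate("https://report.acr.org/#/site/PUBLIC/views/NRDRLCSLocator/ADownload.csv")

# Read the file from the download directory (not the working directory)
# once the download has finished
df <- read_csv(file.path(download_dir, "ADownload.csv"))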
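Note that the browser download is asynchronous, so the file may not be on disk yet when `read_csv` runs. A minimal sketch of one way to wait for it, using base R only. `wait_for_download` is a hypothetical helper, not part of RSelenium, and the check assumes Chrome's usual behaviour of keeping a temporary `.crdownload` file in the download directory while a download is in progress:

```r
# Hypothetical helper (not part of RSelenium): poll until `path` exists and
# no Chrome ".crdownload" temp files remain in its directory, or give up
# after `timeout` seconds. Returns TRUE on success, FALSE on timeout.
wait_for_download <- function(path, timeout = 60, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    partial <- list.files(dirname(path), pattern = "\\.crdownload$")
    if (file.exists(path) && length(partial) == 0) {
      return(TRUE)
    }
    if (Sys.time() >= deadline) {
      return(FALSE)
    }
    Sys.sleep(interval)
  }
}

# Usage (directory and file name taken from the code above; the file name
# the site actually produces is an assumption):
# csv_path <- file.path("D:/mywork", "ADownload.csv")
# if (wait_for_download(csv_path)) df <- readr::read_csv(csv_path)
```

If the helper returns `FALSE`, the download either did not start or was saved under a different name, so it is worth checking the download directory by hand.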
Nad Pat