I have an R package that requires large (> 100 MB) files. (These are weight files for a large neural network.) My plan is to have a function download these files from Google Drive (although I'm open to other options if they exist). The download fails no matter what I try. I don't want to use the googledrive package because I don't want users to need a Google account. Note: This file has permissions set so that anyone with the link can access it. You can check here
I have tried following the directions here. This works for smaller files, but not for large ones. I think this is because Google can't scan files this big for viruses. Here is the code from that example. It downloads something that is 3250 bytes (not my file).

url <- "https://drive.google.com/uc?id=1_YtgWP2MAF7c4dW8naugP1RL3I-xB7G2"
temp <- tempfile(fileext = ".zip")
download.file(url, temp)
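
A quick way to confirm that the small file is a web page and not the archive is to look at its size and first bytes. The following is a minimal sketch, assuming the expected file is a non-empty zip archive (these start with the signature bytes `PK\003\004`); the helper names `is_zip_file` and `describe_download` are hypothetical and not part of any package.

```r
# Hypothetical helper: TRUE if the file starts with the zip local-file
# signature 0x50 0x4B 0x03 0x04 ("PK\003\004").
is_zip_file <- function(path) {
  magic <- readBin(path, what = "raw", n = 4L)
  identical(magic, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Hypothetical helper: summarise what download.file() actually wrote.
describe_download <- function(path) {
  list(bytes = file.size(path), is_zip = is_zip_file(path))
}

# Usage (after the download above):
# describe_download(temp)
```

If `is_zip` comes back `FALSE` and the size is only a few kilobytes, the server most likely returned an HTML page in place of the file.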

I have also tried using the curl and wget options, based on what I have found from using the command line tools explained here.

# I have tried both of these URLs (and some other options)
url <- "https://drive.google.com/uc?id=1_YtgWP2MAF7c4dW8naugP1RL3I-xB7G2"
url <- "https://drive.google.com/uc?export=download&id=1_YtgWP2MAF7c4dW8naugP1RL3I-xB7G2"

# I tried several options here too
download.file(url, temp, mode = "wb",
              # method = "wget", extra = "--no-check-certificate"
              method = "curl", extra = "--insecure")

When I try the curl/wget options I get errors that the file is too large.

href="https://drive.google.com/open?id=1_YtgWP2MAF7c4dW8naugP1RL3I-xB7G2">test.zip (158M) is too large for Google to scan for viruses. Would you still like to download this file?

Is there a way to force this to download from R like I could from curl or wget? Or is there any good way to download a large zipped file from Google Drive without requiring the googledrive package? Or is there somewhere else I should store large files that are included in my R package?

UPDATE: I have decided to store the files on Dropbox instead. This is working seamlessly. I'm still interested if anyone has a solution for simply downloading big files from Google Drive within R.
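
For reference, a Dropbox download along these lines can be sketched as follows. This is a minimal sketch, not the code actually used in the package: the helper names `dropbox_direct_url` and `download_weights` and the example link are hypothetical. It relies on the Dropbox convention that a shared link with `dl=1` serves the file itself instead of a preview page, and it raises R's download timeout (60 seconds by default), which large files can exceed.

```r
# Hypothetical helper: force a Dropbox shared link to direct-download mode.
dropbox_direct_url <- function(share_url) {
  if (grepl("[?&]dl=[01]", share_url)) {
    # Replace an existing dl=0 / dl=1 parameter with dl=1.
    sub("([?&])dl=[01]", "\\1dl=1", share_url)
  } else {
    # Append dl=1, using "&" if the URL already has a query string.
    sep <- if (grepl("?", share_url, fixed = TRUE)) "&" else "?"
    paste0(share_url, sep, "dl=1")
  }
}

# Hypothetical wrapper: download the file and return its local path.
download_weights <- function(share_url, dest = tempfile(fileext = ".zip")) {
  # Temporarily allow at least 10 minutes for the transfer.
  old <- options(timeout = max(600, getOption("timeout")))
  on.exit(options(old), add = TRUE)
  # mode = "wb" keeps binary files intact on Windows.
  download.file(dropbox_direct_url(share_url), dest, mode = "wb")
  dest
}

# Usage (placeholder link, not a real file):
# path <- download_weights("https://www.dropbox.com/s/abc123/test.zip?dl=0")
```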

mikey
  • The url you gave is not the file itself, but a web page that contains a link to download the file. Fix the url and the problem should be resolved – Hong Ooi Dec 23 '21 at 18:27
  • Thank you, but I don't understand. That is what google drive gives me as the link to the file. How do I get the real link to the file? – mikey Dec 23 '21 at 18:30
  • When you copy that URL into a local browser, does it load the file or does it open a page that offers a link to download it? For me (and apparently HongOoi as well), it opens a page, and R is not going to automagically know to redirect the URL you explicitly passed to some link in the returned HTML. – r2evans Dec 23 '21 at 18:34
  • Replace your `url` with `"https://drive.google.com/u/0/uc?export=download&confirm=EoIm&id=1_YtgWP2MAF7c4dW8naugP1RL3I-xB7G2"` – r2evans Dec 23 '21 at 18:35
  • Thank you @r2evans, but when I try this I also download only a file of 3291 bytes. What magic do you use to get this URL? It is not the link they give you to share the file – mikey Dec 23 '21 at 18:38
  • I would not count on this kind of URL being reliable. The message you're seeing is on Google's servers filtering the request. A way around it might be to get a Drive API key, and then create the URL that will point to: `https://www.googleapis.com/drive/v3/files/FileID?alt=media&key=APIKey` –  Dec 23 '21 at 18:41
  • Or if you are *really* averse to doing it "The Right Way according to The Google (tm)", then you may need to `rvest`-scrape the new URL from your original URL. – r2evans Dec 23 '21 at 18:59
  • When I do it the "right way" I get `insufficient permission`, and I'm logged in as the person who owns the file (and the file is open so that anyone with the link can edit). This makes me skeptical that this solution would work for every user. I'm hoping to find an option that would work for anybody who wants to use this. – mikey Dec 23 '21 at 19:04
  • Easier alternative might be to use any other hosting service that doesn't have this limitation? i.e. Free Dropbox? – Jav Dec 23 '21 at 19:33
  • I'm assuming you mean that when logged into the browser as the file owner, you click on the link in the browser (sorry to be redundant) and when you try to download it in the browser it says `insufficient permission`? If instead you mean *"logged in to the browser"* and "*in R I get `insufficient permissions`"*, the two are different. I've seen other answers that are able to download public datasets from google drive, so there is certainly a sustainable way to go here. – r2evans Dec 23 '21 at 19:55
  • Hi @r2evans, so I'll be in R and it asks for authentication when I run `drive_download`, then takes me to the browser, where I log in as the account that hosts the file. I recognize that others have gotten this to work, but as it is already problematic for me, I don't see this as a good option. My plan is to have this package used by people with no programming experience at all, so I'm hoping that I can do all of the work in R behind the scenes. – mikey Dec 23 '21 at 20:21
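
The API-key suggestion from the comments could be sketched roughly as below. This is an untested sketch under stated assumptions: it only builds the URL pattern quoted in that comment, it assumes a valid Drive API key is available and that the file is shared publicly, and the helper names `drive_api_url` and `download_drive_file` are hypothetical. It has not been checked against the file in the question.

```r
# Hypothetical helper: build the Drive API v3 media URL quoted in the comments.
drive_api_url <- function(file_id, api_key) {
  paste0("https://www.googleapis.com/drive/v3/files/", file_id,
         "?alt=media&key=", api_key)
}

# Hypothetical wrapper: download the file contents to a local path.
download_drive_file <- function(file_id, api_key,
                                dest = tempfile(fileext = ".zip")) {
  # Large files can exceed R's default 60-second timeout.
  old <- options(timeout = max(600, getOption("timeout")))
  on.exit(options(old), add = TRUE)
  # mode = "wb" keeps binary files intact on Windows.
  download.file(drive_api_url(file_id, api_key), dest, mode = "wb")
  dest
}

# Usage (the API key is a placeholder you would have to supply):
# path <- download_drive_file("1_YtgWP2MAF7c4dW8naugP1RL3I-xB7G2", "APIKey")
```

Note that shipping an API key inside a package exposes it to every user, so this may not suit a package aimed at non-programmers.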

0 Answers