download large zipped csv over https, unzip, and load

Question

I'm trying to follow this example to download a zipped file over HTTPS, extract the CSV file (14 GB), and load the data into a data frame. I created a small example (<1 MB):

library(data.table)
temp <- tempfile()
download.file("https://www.dropbox.com/s/h130oe03krthcl0/example.csv.zip",
              temp, method="curl")
data <- fread(unz(temp, "example.csv"))
unlink(temp)

Is my mistake obvious?

`R version 3.1.2 (2014-10-31)`; `Platform: x86_64-apple-darwin13.4.0 (64-bit)` — Eric Green, Sep 24 '15 at 00:16
As far as I know, `fread()` does not currently support the use of `unz()`. See [here](https://github.com/Rdatatable/data.table/issues/717). You will probably need to unzip the file unless there has been a solution posted in that thread — Rich Scriven, Sep 24 '15 at 00:22
You could try `unzip(temp, file = "example.csv"); fread("example.csv")` — Rich Scriven, Sep 24 '15 at 00:26
Thanks. `unzip()` is not working for me. I've come across this error in other variants of what I've tried today. I can't pinpoint the reason. — Eric Green, Sep 24 '15 at 00:28
This might be irrelevant but the comment in this suggested edit of the question points out something interesting about the url you are using: https://stackoverflow.com/review/suggested-edits/9614251 — spenibus, Sep 24 '15 at 00:55
@spenibus That is interesting. I tried with the modification, but I still have the problem. — Eric Green, Sep 24 '15 at 00:59
You're not helping us out much here. Are you getting an error? What does it say? — Rich Scriven, Sep 24 '15 at 02:02
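When `unzip()` fails on a freshly downloaded file, one common cause (in line with the comments about the Dropbox URL) is that the saved file is an HTML page rather than a zip archive. Below is a minimal base-R sketch of the "unzip, then read the file" route suggested in the comments. The helper names `looks_like_zip()` and `read_zipped_csv()` are hypothetical. The check relies on the zip local-file-header signature `PK\003\004`, so an empty archive, which starts with a different signature, is also reported as not-a-zip. The member is extracted into a private temporary directory, read, and then deleted.

```r
# Hypothetical helpers; base R only (utils is attached by default).

# A non-empty zip archive begins with the local-file-header signature
# "PK\003\004". If the download saved an HTML page instead, this is FALSE.
looks_like_zip <- function(path) {
  magic <- readBin(path, what = "raw", n = 4L)
  length(magic) == 4L && identical(magic, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Extract one member into a private temporary directory, read it with
# `reader` (utils::read.csv by default), and delete the extracted copy.
read_zipped_csv <- function(zipfile, member, reader = utils::read.csv, ...) {
  if (!looks_like_zip(zipfile)) {
    stop("'", zipfile, "' does not look like a zip archive; check the URL")
  }
  exdir <- tempfile("unzipped_")
  dir.create(exdir)
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)  # runs after reader() returns
  path <- utils::unzip(zipfile, files = member, exdir = exdir)
  if (length(path) == 0L) stop("nothing extracted for member '", member, "'")
  reader(path, ...)
}
```

With data.table installed, `read_zipped_csv(temp, "example.csv", reader = data.table::fread)` hands `fread()` a plain file path, which avoids the `unz()` limitation mentioned above. Keep in mind that extracting a 14 GB CSV temporarily needs that much free space under `tempdir()`.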

score 2 · Accepted Answer · answered Sep 24 '15 at 01:41

This works fine for me (`download.file` does, too, but I'm on R 3.2.2 on OS X, so this approach is more "portable" given the changes to `download.file` since 3.1.2):

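library(httr)

# "?dl=1" asks Dropbox for the file itself rather than its HTML preview page
response <- GET("https://www.dropbox.com/s/h130oe03krthcl0/example.csv.zip?dl=1",
                write_disk("example.csv.zip", overwrite = TRUE),  # write the body to a file, not memory
                progress())                                       # show a progress bar
stop_for_status(response)  # stop on HTTP errors instead of unzipping an error page

fil <- unzip("example.csv.zip")  # returns the paths of the extracted files
read.csv(fil[1], stringsAsFactors=FALSE)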

##   v1 v2 v3
## 1  1  2  3
## 2  1  2  3
## 3  1  2  3

I didn't try it without the `?dl=1` (and I add that by rote, not because of the edit-queue suggestion).
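A Dropbox share link with `dl=0`, or with no `dl` parameter at all, generally opens Dropbox's preview page, while `dl=1` asks for the file itself. Here is a small sketch of a hypothetical helper, `force_dropbox_download()`, that normalizes a share URL before downloading. It assumes the link controls this through the `dl` query parameter and only rewrites that parameter.

```r
# Hypothetical helper: make a Dropbox share URL request the file itself.
force_dropbox_download <- function(url) {
  if (grepl("[?&]dl=1", url)) {
    url                                    # already a direct-download link
  } else if (grepl("[?&]dl=0", url)) {
    sub("([?&])dl=0", "\\1dl=1", url)      # flip dl=0 to dl=1
  } else {
    sep <- if (grepl("?", url, fixed = TRUE)) "&" else "?"
    paste0(url, sep, "dl=1")               # append the parameter
  }
}

force_dropbox_download("https://www.dropbox.com/s/h130oe03krthcl0/example.csv.zip")
# [1] "https://www.dropbox.com/s/h130oe03krthcl0/example.csv.zip?dl=1"
```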

Honestly, though, for files the size you've indicated I'd probably skip downloading from R and just use curl on the command line in an automated workflow (and I'd do the same if the processing language were Python, et al.).
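If you'd rather keep the pipeline in one R script while letting the command-line tools do the heavy lifting, here is a minimal sketch. It assumes the `curl` and `unzip` executables are on the PATH, and both helper names are hypothetical. `unzip -p` writes the archive member to standard output, so reading it through a `pipe()` connection avoids ever writing the uncompressed CSV to disk.

```r
# Hypothetical helpers that shell out to the curl and unzip executables
# (both assumed to be installed and on the PATH).

download_with_curl <- function(url, dest) {
  # -f: fail on HTTP errors, -L: follow redirects, -sS: quiet but show errors.
  # system2() hands the arguments to the shell, so they are quoted here.
  status <- system2("curl", c("-fL", "-sS", "-o", shQuote(dest), shQuote(url)))
  if (status != 0L) stop("curl exited with status ", status)
  invisible(dest)
}

# Stream one member out of the archive with `unzip -p` (extract to stdout)
# and parse it directly from the pipe, without an extracted copy on disk.
read_zip_member_stream <- function(zipfile, member) {
  cmd <- paste("unzip -p", shQuote(zipfile), shQuote(member))
  utils::read.csv(pipe(cmd), stringsAsFactors = FALSE)
}
```

In recent data.table releases the same streaming idea can be written as `data.table::fread(cmd = paste("unzip -p", shQuote(zipfile), shQuote(member)))`, which is likely to be much faster than `read.csv()` on a file of this size.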

score 0 · Answer 2 · answered Mar 31 '20 at 07:06

In one of my applications, I needed to download a zip file over HTTP and stream it straight into a folder, unzipping it on the fly.

After some searching, I wrote the following code, which does the job.

Here are the steps:

1. Install the `unzipper` package (`npm install unzipper`).
2. Import `unzipper` and `http` in your code file:

import unzipper from 'unzipper';
import http from 'http';

Then download the zip file and pipe the response stream into `unzipper.Extract`. Here is the complete code:

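import unzipper from 'unzipper';
import http from 'http'; // for https:// URLs, import 'https' instead (same get() API)

http.get('http://yoururl.com/file.zip', (res) => {
  if (res.statusCode !== 200) {
    console.error(`Download failed with HTTP ${res.statusCode}`);
    res.resume(); // consume the body so the socket is released
    return;
  }
  res.pipe(unzipper.Extract({ path: 'C:/cmsdata/' })).on('close', () => {
    // Here you can perform any action after the archive has been fully extracted
  });
}).on('error', (err) => console.error('Request failed:', err));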
