92

I have a very large zip file, and I am trying to read it into R without unzipping it, like so:

temp <- tempfile("Sales", fileext=c("zip"))
data <- read.table(unz(temp, "Sales.dat"), nrows=10, header=T, quote="\"", sep=",")

Error in open.connection(file, "rt") : cannot open the connection
In addition: Warning message:
In open.connection(file, "rt") :
  cannot open zip file 'C:\Users\xxx\AppData\Local\Temp\RtmpyAM9jH\Sales13041760345azip'
Gavin Simpson
laiboonh
  • This post may help you: http://stackoverflow.com/questions/3053833/using-r-to-download-zipped-data-file-extract-and-import-data – Sam Sep 17 '12 at 14:58
  • Yes, I did my due diligence and searched before asking. My question is slightly different: I am trying to read from my local filesystem, not from a URL. – laiboonh Sep 18 '12 at 02:17
  • Did you ever solve this problem? – Jon M May 16 '14 at 20:28
  • What version of R are you using? It may be worth trying the latest stable release (from the project, not from a distribution, which can be behind). I have seen this error occur in older releases but not the latest one, when running identical commands using `unz` in both. – gcbenison Feb 13 '13 at 06:10

8 Answers

64

If your zip file is called Sales.zip and contains only a file called Sales.dat, I think you can simply do the following (assuming the file is in your working directory):

data <- read.table(unz("Sales.zip", "Sales.dat"), nrows=10, header=T, quote="\"", sep=",")
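A minimal, self-contained sketch of the same idea, in case it helps to try it on a throwaway archive first. The sample data, the temporary paths, and the variable names are all made up for illustration, and the sketch assumes a command-line `zip` program is available, since `utils::zip()` relies on one.

```r
# Build a small throwaway archive so the example is self-contained.
# Assumption: a command-line `zip` program is on the PATH (utils::zip() needs it).
work_dir <- tempfile("zipdemo")
dir.create(work_dir)
dat_path <- file.path(work_dir, "Sales.dat")
zip_path <- file.path(work_dir, "Sales.zip")

write.csv(data.frame(id = 1:3, amount = c(10.5, 20, 7.25)),
          dat_path, row.names = FALSE)
# -j stores only the file name (no directories), -q keeps zip quiet.
zip(zip_path, dat_path, flags = "-jq")

# List the archive's members without extracting anything.
members <- unzip(zip_path, list = TRUE)

# Read the first member straight from the archive through a unz() connection.
sales <- read.table(unz(zip_path, members$Name[1]),
                    header = TRUE, sep = ",", nrows = 10)
```

Because `read.table` opens and closes the `unz` connection itself, nothing is extracted to disk and no explicit `close()` is needed here.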
plannapus
  • Is there a way to find the filenames inside the "Sales.zip" file without extracting it? – Allen Wang Jul 21 '17 at 15:51
  • @AllenWang Yes, but one has to use the function `unzip` instead: `unzip("Sales.zip", list=TRUE)` – plannapus Jul 22 '17 at 07:12
  • Doing `readLines(unz("test.zip","file.txt"))` does not read the last line for some reason. Any idea how I can fix this? – Omar Wagih May 27 '20 at 14:44
  • @by0 Honestly, no. For me it works as expected. Maybe you should open a new question with your specific problem? Make sure to give a reproducible example. – plannapus May 28 '20 at 06:59
25

The reading functions in the readr package also handle compressed files when the file suffix indicates the compression: files ending in .gz, .bz2, .xz, or .zip are decompressed automatically.

library(readr)
myData <- read_csv("foo.txt.gz")
Holger Brandl
  • I had no idea! Here I've been extracting my zip files first with `unzip` and then using `readr`. Thanks for making my code much more efficient and reducing execution time! – StatsStudent Nov 12 '20 at 00:45
  • I tried this on a zip file that did not have the correct ending, and it seems that read_csv() makes a big effort to figure out the file type for uncompressing, because it all worked on the first try. It even guessed the right file to read from the archive without me specifying it. It was the only CSV file in the archive and also the biggest file; I am not sure what heuristics it used, but it gave me a clear message that it was guessing. – Magnus Jul 20 '22 at 10:29
22

There is no need to use unz, as read.table can now handle the zipped file directly:

data <- read.table("Sales.zip", nrows=10, header=T, quote="\"", sep=",")

See this post

user5496072
8

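This should work just fine if the file inside the archive is Sales.csv. Note that unzip extracts the file to disk and returns its path, which read_csv then reads.

data <- readr::read_csv(unzip("Sales.zip", "Sales.csv"))

To check the file names in the archive without extracting anything, this works:

unzip("Sales.zip", list = TRUE)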
Smart D
  • Thanks. However, this only works for me when the zip file has been downloaded locally (Alt2 below), not when using a URL (Alt1 below). Any clue? Alt1: `data2 <- readr::read_csv(unzip("https://www.bis.org/statistics/full_BIS_DER_csv.zip", "WEBSTATS_DER_DATAFLOW_csv_col.csv"))` Alt2: `data <- readr::read_csv(unzip("C:/Documents/full_BIS_DER_csv.zip", "WEBSTATS_DER_DATAFLOW_csv_col.csv"))` – Dagfinn Rime Apr 15 '21 at 09:14
  • Using `tempfile()` and `download.file("https://www.bis.org/statistics/full_BIS_DER_csv.zip",temp)` before the `read_csv` call fixed it. – Dagfinn Rime Apr 15 '21 at 10:31
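The download-then-read workaround from the comments can be sketched roughly as follows. This is only an illustration: `read_zipped_csv` is a hypothetical helper name, it swaps in base R's `read.csv` with a `unz` connection instead of `readr::read_csv(unzip(...))`, and it assumes the member name inside the archive is already known.

```r
# Hypothetical helper: download a zip archive to a temporary file, then read
# one CSV member from it through a unz() connection (no extraction to disk).
read_zipped_csv <- function(url, member, ...) {
  tmp <- tempfile(fileext = ".zip")
  on.exit(unlink(tmp))                                # remove the download afterwards
  download.file(url, tmp, mode = "wb", quiet = TRUE)  # "wb" keeps binary data intact on Windows
  read.csv(unz(tmp, member), ...)
}
```

With the URL and member name from the comment above, the call would look like `read_zipped_csv("https://www.bis.org/statistics/full_BIS_DER_csv.zip", "WEBSTATS_DER_DATAFLOW_csv_col.csv")`.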
2

If you have zcat installed on your system (which is the case for Linux, macOS, and Cygwin), you could also use:

zipfile <- "test.zip"
myData <- read.delim(pipe(paste("zcat", zipfile)))

This solution also has the advantage that no temporary files are created.

Holger Brandl
2

The gzfile function, combined with read_csv or read.table, can read gzip-compressed files.

library(readr)
df = read_csv(gzfile("file.csv.gz"))

# read.table is part of base R, so no package needs to be loaded
df = read.table(gzfile("file.csv.gz"), header = TRUE, sep = ",")

read_csv from the readr package can read compressed files even without the gzfile function.

library(readr)  
df = read_csv("file.csv.gz")

read_csv is recommended because it is typically faster than read.table.
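For anyone who wants to try the gzfile route without an existing file, here is a small round-trip sketch using base R only. The data frame and the temporary path are invented for the example, and `read.csv` stands in for `read_csv` so that no package is required.

```r
# Write a small gzip-compressed CSV to a temporary path.
path <- tempfile(fileext = ".csv.gz")
con <- gzfile(path, "w")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), con, row.names = FALSE)
close(con)

# Read it back through an explicit gzfile() connection...
via_gzfile <- read.csv(gzfile(path))

# ...or pass the path directly; base R detects gzip compression when reading.
via_path <- read.csv(path)
```

Both calls should return the same three-row data frame, which suggests that the explicit `gzfile` wrapper is optional for gzip files in base R as well.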

Natheer Alabsi
2

The dot is missing from the file extension in this expression:

temp <- tempfile("Sales", fileext=c("zip"))

It should be:

temp <- tempfile("Sales", fileext=c(".zip"))
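A quick way to see the difference, as a sketch (the variable names `bad` and `good` are just for illustration). It also shows that `tempfile` only generates a name and does not create a file, so `unz` still needs a real archive at that path before it can open anything.

```r
# fileext is pasted onto the generated name verbatim, so the dot must be included.
bad  <- tempfile("Sales", fileext = "zip")   # name ends in "...zip" with no dot
good <- tempfile("Sales", fileext = ".zip")  # name ends in ".zip"

ext_bad  <- tools::file_ext(bad)    # no extension is recognised without the dot
ext_good <- tools::file_ext(good)   # "zip"

# tempfile() returns a path only; nothing exists on disk yet.
exists_yet <- file.exists(good)
```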
Stephen Rauch
0

For zipped files at a remote URL (this assumes curl and funzip are available on the system):

samhsa2015 <- data.table::fread(cmd = "curl https://www.opr.princeton.edu/workshops/Downloads/2020Jan_LatentClassAnalysisPratt_samhsa_2015F.zip | funzip")

(Answer from here: https://stackoverflow.com/a/37824192/12387385)

DmitriBolt