79

I have used ?unzip in the past to get at contents of a zipped file using R. This time around, I am having a hard time extracting the files from a .gz file which can be found here.

I have tried ?gzfile and ?gzcon but have not been able to get it to work. Any help you can provide will be greatly appreciated.

Btibert3

6 Answers

77

Here is a worked example that may help illustrate what gzfile() and gzcon() are for:

foo <- data.frame(a=LETTERS[1:3], b=rnorm(3))
foo
#  a        b
#1 A 0.586882
#2 B 0.218608
#3 C 1.290776
write.table(foo, file="/tmp/foo.csv")
system("gzip /tmp/foo.csv")             # being very explicit

Now that the file is written, instead of implicit use of file(), use gzfile():

read.table(gzfile("/tmp/foo.csv.gz"))   
#  a        b
#1 A 0.586882
#2 B 0.218608
#3 C 1.290776
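The example above only exercises gzfile(). As a complement, here is a minimal sketch of gzcon(), which wraps an already opened binary connection and decompresses as it is read. The temporary file path and the variable names (path, bar) are assumptions made for illustration and are not part of the original example.

```r
# Write a small gzip-compressed table to a temporary file (hypothetical path)
path <- tempfile(fileext = ".csv.gz")
foo <- data.frame(a = LETTERS[1:3], b = c(1.5, 2.5, 3.5))
out <- gzfile(path, "w")
write.table(foo, file = out)
close(out)

# gzcon() wraps an existing binary-mode connection and decompresses on the fly
con <- gzcon(file(path, "rb"))
lines <- readLines(con)
close(con)

# Parse the decompressed text exactly as read.table() would parse a plain file
bar <- read.table(text = lines)
```

The same pattern is mainly useful when the compressed data does not come from a local file, for example when wrapping a url() connection, where gzfile() cannot be used.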

The file you point to is a compressed tar archive, and as far as I know, R itself has no interface to tar archives. These are commonly used to distribute source code, for example R packages and the R sources.

stas g
Dirk Eddelbuettel
62

To un-gz a file in R you can do:

library(R.utils)
gunzip("file.gz", remove=FALSE)

or

gunzip("file.gz")

Note that the second form uses the default (remove=TRUE), in which the input file is removed once the output file has been fully created and closed.
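If installing R.utils is not an option, roughly the same result can be obtained with base R connections. The following is only a sketch under assumptions: the temporary file and the names gz_path and out_path are made up for illustration, and unlike gunzip() it always leaves the compressed input in place.

```r
# Create a small .gz file to work with (hypothetical temporary path)
gz_path <- tempfile(fileext = ".txt.gz")
con <- gzfile(gz_path, "w")
writeLines(c("first line", "second line"), con)
close(con)

# Name of the decompressed output: the input name without ".gz"
out_path <- sub("\\.gz$", "", gz_path)

# Stream the decompressed bytes to the output file in chunks
src <- gzfile(gz_path, "rb")
dst <- file(out_path, "wb")
repeat {
  chunk <- readBin(src, what = raw(), n = 65536L)
  if (length(chunk) == 0L) break   # end of compressed stream
  writeBin(chunk, dst)
}
close(src)
close(dst)

result <- readLines(out_path)
```

Reading in fixed-size chunks keeps memory use bounded for large files.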

Robert Hijmans
  • That's what I was looking for. Be aware: `NOTE: The default (remove=TRUE) behavior is that the input file is removed after that the output file is fully created and closed.` - see `?gunzip` – Rentrop Dec 27 '16 at 10:33
  • `gunzip()` is now deprecated – iskandarblue Jun 03 '23 at 17:29
  • @iskandarblue where do you see that? Can't find anywhere saying that gunzip is deprecated. – Taylor H Jul 20 '23 at 00:03
42

If you really want to uncompress the file, just use the untar function which does support gzip. E.g.:

untar('chadwick-0.5.3.tar.gz')
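
As a hedged illustration of how untar() can be used, the sketch below builds a tiny .tar.gz in a temporary directory, lists its contents without extracting, and then extracts into a chosen directory. The archive name demo.tar.gz, the file a.txt, and the directory names are all assumptions made so the example is self-contained; they are unrelated to the chadwick archive.

```r
# Build a tiny .tar.gz in a temporary directory (all names are hypothetical)
work <- tempfile("untar-demo")
dir.create(work)
old <- setwd(work)
writeLines("hello", "a.txt")
tar("demo.tar.gz", files = "a.txt", compression = "gzip")
setwd(old)

archive <- file.path(work, "demo.tar.gz")

# list = TRUE returns the file names in the archive without extracting
contents <- untar(archive, list = TRUE)

# exdir chooses where the files are extracted
out_dir <- file.path(work, "extracted")
dir.create(out_dir)
untar(archive, exdir = out_dir)

extracted <- readLines(file.path(out_dir, "a.txt"))
```

Listing first is a cheap way to check what an unfamiliar archive contains before writing anything to disk.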
daroczig
28

http://blog.revolutionanalytics.com/2009/12/r-tip-save-time-and-space-by-compressing-data-files.html

R added transparent decompression for certain kinds of compressed files in version 2.10. If your files are compressed with bzip2, xz, or gzip, they can be read into R as if they were plain text files. You should use the proper filename extensions.

The command

myData <- read.table('myFile.gz')  # gzip-compressed files have a "gz" extension

will work just as if 'myFile.gz' were the raw text file.

WCC
  • It does work unless you specify the colClasses argument. If you use myData <- read.table('myFile.gz', colClasses=c("character", "integer")) then you will get an error (as of R 3.2.0). – Met Jun 12 '15 at 16:58
2

library(vroom)
columns3 <- c('A', 'B',...)  ## define column names
Data1 <- vroom(".../XXX.tsv", col_names = columns3)

This works fine with .tsv.gz files as well.

iHermes
2

If it's a comma/tab-separated file, you can use data.table's fread(). It can handle zipped (.zip, .gz) files; reading .gz files directly requires the R.utils package to be installed:

fread('myFile.csv.gz')
andschar