
I am on a Windows machine trying to speed up the read.table step. My files are all .gz.

x=paste("gzip -c ",filename,sep="")
phi_raw = fread(x)

Error in fread(x) : 

I cannot understand the error. It's a bit too cryptic for me.

Not a duplicate as suggested by zx8754: I am asking specifically in the context of fread. And while fread does not have native support for gzip, this paradigm should work. See http://www.molpopgen.org/coding/datatable.html

Update

Per the suggestion below, using system yields a longer error message, though I am still stuck.

Error in fread(system(x)) : 

  'input' must be a single character string containing a file name, a command, full path to a file, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or the input data itself

In addition: Warning message:


running command 'gzip -c D:/x_.gz' had status 1

Update

Running with gunzip as pointed out below:

Error in fread(system(x)) : 

  'input' must be a single character string containing a file name, a command, full path to a file, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or the input data itself

In addition: Warning message:

running command 'gunzip -c D:/XX_.gz' had status 127

Note the different status (127 instead of 1).

Andrie
pythOnometrist
  • https://github.com/Rdatatable/data.table/issues/717 – zx8754 Jun 09 '16 at 13:56
  • Possible duplicate of [Decompress gz file using R](http://stackoverflow.com/questions/5764499/decompress-gz-file-using-r) – zx8754 Jun 09 '16 at 13:57
  • Not a duplicate: I am asking specifically in the context of fread. And while fread does not have native support for gzip, this paradigm should work. – pythOnometrist Jun 09 '16 at 14:29
  • Where is the error message? How about `fread(system(x))`? – zx8754 Jun 09 '16 at 14:35
  • Are we assured that your installation of Windoze has access to gzip and gunzip? Also noting that the cited article used gunzip rather than gzip. – IRTFM Jun 09 '16 at 14:39
  • Thanks - That did help in at least yielding the full error. See edit above. – pythOnometrist Jun 09 '16 at 14:39
  • gzip is certainly installed and on the path. However, I am not sure what the status 1 in the error message is about. Same results with gunzip as well. – pythOnometrist Jun 09 '16 at 14:44
  • Are you all using Windows? This is what I get when I try fread(file = "gzip -cd input.gz"): "Provided file 'gzip -cd input.gz' does not exists." – Ricardo Guerreiro Jan 11 '19 at 12:07
  • please have a look at [Read and write csv.gz file in R](https://stackoverflow.com/questions/20609758/read-and-write-csv-gz-file-in-r) – fc9.30 Jul 08 '20 at 11:54

2 Answers


data.table now supports reading .gz files directly with the fread function, provided that the R.utils package is installed.

As suggested in this answer to a similar question, you can simply run the following:

library(data.table)
phi_raw <- fread("filename.gz")
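A minimal round-trip sketch of this direct approach, assuming a recent data.table and the R.utils package are both installed. The file name and example data are made up for illustration, and the compressed file is written through base R's gzfile connection so that no external gzip command is involved.

```r
library(data.table)

# fread relies on R.utils to decompress .gz input, so check for it first
if (!requireNamespace("R.utils", quietly = TRUE)) {
  stop("Install R.utils first: install.packages('R.utils')")
}

# Hypothetical example data, written to a gzip-compressed tab-separated file
df <- data.frame(x = 1:10, y = letters[1:10])
gz_path <- file.path(tempdir(), "df.txt.gz")
con <- gzfile(gz_path, "w")
write.table(df, con, row.names = FALSE, quote = FALSE, sep = "\t")
close(con)

# Read the compressed file directly by passing its path to fread
phi_raw <- fread(gz_path)
print(phi_raw)
```

If R.utils is missing, the check stops early with an install hint instead of failing inside fread.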
Lino Ferreira

I often use gzip with fread on Windows. It reads the files without my having to decompress them to disk first. I would try adding the -d option to the gzip command: without it, gzip -c compresses its input rather than decompressing it. Specifically, in your code, try x=paste("gzip -dc ",filename,sep=""). Here is a reproducible example that works on my machine:

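# Build a small tab-separated example file
df <- data.frame(x = 1:10, y = letters[1:10])
write.table(df, 'df.txt', row.names = FALSE, quote = FALSE, sep = '\t')
system("which gzip")   # show which gzip executable is on the PATH
system("gzip df.txt")  # compresses df.txt in place, producing df.txt.gz
# Decompress to stdout (-d -c) and let fread read the output
data.table::fread("gzip -dc df.txt.gz")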

And here is my sessionInfo().

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] rsconnect_0.4.3  tools_3.3.1      data.table_1.9.6 chron_2.3-47 

I have successfully used gzip on Windows without adding a decompressed file to my hard drive using both Rtools (https://cran.r-project.org/bin/windows/Rtools/) and Gow (https://github.com/bmatzelle/gow/wiki). If my reproducible example above does not work for you, use the which gzip and which gunzip commands to see the exact .exe that is running. If it is not Rtools or Gow, perhaps try installing one of those two and trying the reproducible example again.
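The PATH check described above can also be done from within R. The following is a rough sketch, not a definitive recipe: it assumes a data.table version new enough to have the cmd argument of fread (the 1.9.6 version in the sessionInfo above takes the command as the first argument instead), and the example file and the fallback to base R's gzfile and read.delim are illustrative choices, not part of the original example.

```r
# Hypothetical example file, compressed through base R's gzfile connection
df <- data.frame(x = 1:10, y = letters[1:10])
gz_path <- file.path(tempdir(), "df.txt.gz")
con <- gzfile(gz_path, "w")
write.table(df, con, row.names = FALSE, quote = FALSE, sep = "\t")
close(con)

# Sys.which() returns "" when the executable is not found on the PATH
gzip_exe <- Sys.which("gzip")
print(gzip_exe)

if (nzchar(gzip_exe) && requireNamespace("data.table", quietly = TRUE)) {
  # Let gzip decompress to stdout (-d -c) and hand the output to fread
  cmd <- paste("gzip -dc", shQuote(gz_path))
  dat <- data.table::fread(cmd = cmd)
} else {
  # Fallback (an assumption of this sketch): base R reads .gz via gzfile()
  dat <- read.delim(gzfile(gz_path))
}
print(dat)
```

An empty result from Sys.which("gunzip") would be consistent with the status 127 warning in the question, since R's documentation for system() describes 127 as the value returned when the command could not be run.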

jmuhlenkamp