I would like to unzip a ".gz" file on my network (it opens in WinZip). Inside it is a .cax file that I want to read.

When I run

  unzip("/connect/me/test.gz", files = "test.cax", list = TRUE, overwrite = TRUE, junkpaths = FALSE, exdir = ".", unzip = "internal")

I get this error:

  Error in unzip("/connect/me/test.gz", files = NULL,  : 
    zip file 'connect/me/test.gz' cannot be opened

I have also tried this:

  g <- gzfile("/connect/me/test.gz", open = "", encoding = getOption("encoding"), compression = 6)
  g

and this returns

            description                 class                  mode
  "/connect/me/test.gz"              "gzfile"                  "rb"
                   text                opened              can read
                 "text"              "closed"                 "yes"
              can write
                  "yes"

Can anyone help? It looks like gzfile has potential, but how do I actually read the file?

Thank you.

user3022875

1 Answer

BTW: a *.gz file, though WinZip on Windows may open it, is better described as a gzipped file. gzip compresses only single files; it does not produce an archive of multiple files or directories. That is why unzip(), which expects a zip archive, cannot open it. Once you figure out how to decompress it, the result is the single uncompressed raw file (with or without your .cax suffix; technically it does not matter).
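
If it is unclear which format a file really is, its first two bytes can be checked. The sketch below assumes the usual magic numbers (0x1f 0x8b for gzip, "PK" for zip); sniff_format is a hypothetical helper name, not a base R function, and the demo uses a throwaway temp file rather than your network path.

```r
# Hypothetical helper: guess the container format from the first two bytes.
sniff_format <- function(path) {
  con <- file(path, open = "rb", raw = TRUE)   # raw = TRUE: no transparent decompression
  on.exit(close(con))
  magic <- readBin(con, what = "raw", n = 2L)
  if (length(magic) < 2L) return("unknown")
  if (identical(magic, as.raw(c(0x1f, 0x8b)))) return("gzip")
  if (identical(magic, as.raw(c(0x50, 0x4b)))) return("zip")
  "unknown"
}

# Demo on a temporary gzip file written from R
tmp <- tempfile(fileext = ".gz")
con <- gzfile(tmp, open = "wt")
writeLines("hello", con)
close(con)

fmt <- sniff_format(tmp)
print(fmt)
```

A result of "gzip" would suggest using gzfile() or gunzip; "zip" would suggest unzip().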

gzfile merely returns a connection that other functions, such as read.csv() and readLines(), can read from. The output you are seeing (with "description", etc.) is merely how an object of class "connection" is printed on the console. From help(gzfile):

'file', 'pipe', 'fifo', 'url', 'gzfile', 'bzfile', 'xzfile', 'unz' and 'socketConnection' return a connection object which inherits from class '"connection"' and has a first more specific class.

This should then be used with whatever function you need to read the data. For instance, if you had a gzip'ed CSV, you could do:

dat <- read.csv(gzfile('/path/to/test.csv.gz'), header = FALSE)
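
To try this without an existing file, here is a small self-contained round trip. The temp file and the sample values are made up for illustration; only gzfile(), write.table() and read.csv() come from base R.

```r
# Write a tiny header-less CSV through a gzip connection
tmp <- tempfile(fileext = ".csv.gz")
out <- gzfile(tmp, open = "wt")
write.table(data.frame(1:3, c(2.5, 3.5, 4.5)), out,
            sep = ",", row.names = FALSE, col.names = FALSE)
close(out)

# Read it back; read.csv() opens and closes the connection itself
dat <- read.csv(gzfile(tmp), header = FALSE)
print(dat)
```

Note that header = FALSE is an argument to read.csv(), not to gzfile().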

So, to extend your example:

g <- gzfile("/connect/me/test.gz", open = "",
            encoding = getOption("encoding"), compression = 6)
txt <- readLines(g)
close(g)              # not strictly required but good practice

I'm guessing you have a different function for reading the uncompressed file. Use that in place of readLines() above.
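
If the .cax content turns out to be binary rather than text (an assumption; the format is not described here), readBin() on a connection opened in "rb" mode is one option. A minimal sketch on a temp file with made-up bytes:

```r
# Create a small gzip-compressed binary file to read back
tmp <- tempfile(fileext = ".gz")
out <- gzfile(tmp, open = "wb")
writeBin(as.raw(c(0x01, 0x02, 0x03, 0xff)), out)
close(out)

# Open explicitly in binary mode and read the decompressed bytes
g <- gzfile(tmp, open = "rb")
bytes <- readBin(g, what = "raw", n = 100L)   # n is an upper bound, not an exact count
close(g)
print(bytes)
```

For large files, n would need to be at least the uncompressed size, or the read would have to be repeated in a loop.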

BTW: it is uncompressing it in memory, meaning that no uncompressed file is written to disk. If decompressing in memory continues to cause problems or consternation, you can always decompress the file on the command line with gunzip /connect/me/test.gz or, assuming your gz* commands are in your PATH, you can use the following in R:

system2('gunzip', '/connect/me/test.gz')
txt <- readLines('/connect/me/test')    # notice no ".gz"

(The compressed file is uncompressed into a new file and the compressed file is removed. The "delete when complete" behavior can be changed with -k on the command line. See man gunzip for more info if this is a preferred route.)
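
If the gunzip command is not available (for example on a Windows machine without it on the PATH), the same result can be approximated in base R by copying the decompressed bytes to a new file. This is a sketch, not what gunzip does internally; gunzip_to is a hypothetical helper name, and unlike plain gunzip it leaves the compressed file in place.

```r
# Hypothetical helper: decompress src (gzip) into dest, chunk by chunk
gunzip_to <- function(src, dest, chunk = 65536L) {
  inn <- gzfile(src, open = "rb")
  on.exit(close(inn), add = TRUE)
  out <- file(dest, open = "wb")
  on.exit(close(out), add = TRUE)
  repeat {
    buf <- readBin(inn, what = "raw", n = chunk)
    if (length(buf) == 0L) break      # end of compressed stream
    writeBin(buf, out)
  }
  invisible(dest)
}

# Demo on a temporary file
src <- tempfile(fileext = ".gz")
con <- gzfile(src, open = "wt")
writeLines(c("line one", "line two"), con)
close(con)

dest <- sub("\\.gz$", "", src)        # same name without ".gz"
gunzip_to(src, dest)
txt <- readLines(dest)
print(txt)
```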

r2evans
  • Thank you. It is working now. Any chance you can look at this question: http://stackoverflow.com/questions/26788146/r-download-file-error-cannot-open-url – user3022875 Nov 06 '14 at 21:59
  • The most recent answer (needing three slashes) should solve your problem. – r2evans Nov 06 '14 at 22:26