Using rhdfs, it's possible to read files from HDFS in R. However, my files on HDFS are gzip-compressed. How do I read the plain-text content of gzipped files on HDFS into R?

Things I've tried so far:

`gzcon(hdfs.file(path))` -> `Error in gzcon(...) : 'con' is not a connection`

`memDecompress(hdfs.read(hdfs.file(path)), type="g")` -> `Error in memDecompress(..., type = "g") : internal error -3 in memDecompress(2)`

I also tried `memDecompress` with `type="u"`.

Josh Hansen
  • Does [this post](http://stackoverflow.com/questions/5764499/decompress-gz-file-using-r) help? – Arun Feb 25 '13 at 21:53
  • Unfortunately not, because my files aren't .tar.gz files, just .gz. Using `untar` on a plain .gz file gives `/bin/gtar: This does not look like a tar archive`, `/bin/gtar: Skipping to next header`, `/bin/gtar: Exiting with failure status due to previous errors`. – Josh Hansen Feb 25 '13 at 22:22
  • How about the second answer from Dirk in the same post I've linked? – Arun Feb 25 '13 at 22:24
  • That doesn't work either: `gzfile` takes a path or URL or clipboard or `stdin`. None of these options reasonably allows interfacing with HDFS. – Josh Hansen Feb 25 '13 at 22:48

0 Answers