
I wish to download and open the following tar.gz file in R:

http://s.wordpress.org/resources/survey/wp2011-survey.tar.gz

Is there a command which can accomplish this?

Ben Bolker
Tal Galili
  • @Ramnath, much closer than joran's: maybe worth closing/merging ... – Ben Bolker Aug 22 '11 at 18:45
  • Sorry for the ultra duplicates. I searched a bit before posting - but apparently not enough. My apologies. – Tal Galili Aug 22 '11 at 21:49
  • For me archive_extract("tmp.tar.gz", files="wp2011-survey/anon-data.csv") from library(archive) is quite a bit faster than the in-built base R untar (especially for large archives) and it works very well on all platforms... You can also use it to read a csv directly from an archive without unpacking it using read_csv(archive_read("tmp.tar.gz", file = 3), col_types = cols()). It supports 'tar', 'ZIP', '7-zip', 'RAR', 'CAB', 'gzip', 'bzip2', 'compress', 'lzma' and 'xz' formats. So for me that would be the preferred option. – Tom Wenseleers Jul 11 '22 at 15:29

1 Answer

## download the archive to a local file
fn <- "http://s.wordpress.org/resources/survey/wp2011-survey.tar.gz"
download.file(fn, destfile = "tmp.tar.gz")
untar("tmp.tar.gz", list = TRUE)  ## check contents
untar("tmp.tar.gz")               ## extract everything into the working directory
## or, if you just want to extract the target file:
untar("tmp.tar.gz", files = "wp2011-survey/anon-data.csv")
X <- read.csv("wp2011-survey/anon-data.csv")
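If you would rather not clutter the working directory, the same list-then-extract steps can be pointed at a temporary folder through untar's exdir argument. The sketch below is only an illustration: it builds a small stand-in tarball locally so it runs without a network connection, and the names demo-survey, demo.tar.gz and extracted are made up for the example, not part of the WordPress archive.

```r
# Build a tiny stand-in archive so the example runs offline.
# All file and folder names below are hypothetical.
work <- tempfile("tar-demo-")
dir.create(file.path(work, "demo-survey"), recursive = TRUE)
write.csv(data.frame(id = 1:3, answer = c("a", "b", "c")),
          file.path(work, "demo-survey", "anon-data.csv"), row.names = FALSE)
writeLines("notes", file.path(work, "demo-survey", "README.txt"))

# tar() stores paths relative to the working directory, so switch briefly.
old <- setwd(work)
tar("demo.tar.gz", files = "demo-survey", compression = "gzip")
setwd(old)

tarball <- file.path(work, "demo.tar.gz")

# List the contents and pick out the CSV by name rather than by position.
contents <- untar(tarball, list = TRUE)
target <- grep("anon-data\\.csv$", contents, value = TRUE)

# Extract only that file into a separate folder, then read it.
out_dir <- file.path(work, "extracted")
dir.create(out_dir)
untar(tarball, files = target, exdir = out_dir)
X <- read.csv(file.path(out_dir, target))
```

For the real file, you would replace the stand-in section with the download.file() call and keep the listing and extraction steps as they are.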

Tom Wenseleers points out that the archive package can help with this:

library(archive)
library(readr)
read_csv(archive_read("tmp.tar.gz", file = 3), col_types = cols())

and that archive::archive_extract("tmp.tar.gz", files="wp2011-survey/anon-data.csv") is quite a bit faster than base R's built-in untar(), especially for large archives. The archive package supports the 'tar', 'ZIP', '7-zip', 'RAR', 'CAB', 'gzip', 'bzip2', 'compress', 'lzma' and 'xz' formats.
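Selecting the entry by position (file = 3) depends on the order of files inside the tarball, so it may be safer to look the path up first. Here is a rough sketch, assuming the archive package is installed and that archive::archive() returns a listing with a path column, as I understand its documentation; the stand-in tarball and its file names are invented so the code runs offline.

```r
# Stand-in archive so the sketch runs offline; all file names are hypothetical.
work <- tempfile("archive-demo-")
dir.create(file.path(work, "demo-survey"), recursive = TRUE)
write.csv(data.frame(id = 1:3, answer = c("a", "b", "c")),
          file.path(work, "demo-survey", "anon-data.csv"), row.names = FALSE)
old <- setwd(work)
tar("demo.tar.gz", files = "demo-survey", compression = "gzip")
setwd(old)
tarball <- file.path(work, "demo.tar.gz")

# Only run the archive-based part when the package is available.
if (requireNamespace("archive", quietly = TRUE)) {
  listing <- archive::archive(tarball)   # one row per entry in the tarball
  csv_path <- grep("\\.csv$", listing$path, value = TRUE)[1]
  # Open a connection to that entry and read it without unpacking to disk.
  con <- archive::archive_read(tarball, file = csv_path)
  X2 <- read.csv(con)
}
```

With the real download you would point tarball at "tmp.tar.gz" and drop the stand-in section.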

Ben Bolker
  • Is it also possible to untar only a specific file inside a tarball? I think the `files` argument in `untar` does this, but I am unsure how. Help appreciated. – Ashwin Dec 08 '14 at 10:27