R Reading in a zip data file without unzipping it (loss of information)

Question

From one of the answers in this SO question, I got the lines:

require(readr)
myData <- read_csv("foo.txt.gz")

But this makes me lose data for some reason.

My second column is a time column in this format: 9:30:00.244271971. This code transforms it into 09:30:00, dropping the fractional seconds and hence losing a lot of information.

How can I fix this? Is there a way to avoid losing this information?

Take a closer look into `?read_csv`; `col_types` could be an option — jay.sf, Feb 15 '18 at 19:47

score 2 · Accepted Answer · answered Feb 15 '18 at 21:32


You can also always use fread() from data.table. You can execute arbitrary shell commands from the file argument to handle the unzip, and it won't auto coerce your timestamps by default either, so you shouldn't have the truncation issue. The vignette Convenience features of fread has some great examples.

(Bonus: it's significantly faster than readr, and absolutely blows it out of the water if you install the development v1.10.5 version off GitHub with multi-threading in fread.)

library(data.table)

myData <- fread("gunzip -c foo.txt.gz")

Matt Summersgill


Yes, I had tried that before, but I am not using Linux (I'm on Windows), so I had to do: `library(R.utils)` and then `quote <- fread(gunzip('foo.txt.gz', remove = FALSE), sep = 'auto', header = TRUE)`. But then I get an unzipped file, which I don't want. – python_enthusiast Feb 15 '18 at 21:56
Ahh, I forget some of the friction that comes with R and Windows at times. [This post](https://stackoverflow.com/questions/37727865/how-can-i-use-fread-to-read-gz-files-in-r) might be helpful on formatting. – Matt Summersgill Feb 15 '18 at 22:04
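
For the Windows case raised in the comments, one possible workaround is sketched below. It assumes a data.table version whose `fread()` has the `text` argument, and it reads the whole file into memory, so it may not suit very large files. The sample file, its column names (`sym`, `time`, `side`, `venue`, `price`), and its values are made up for illustration. Base R's `gzfile()` connection decompresses on the fly, so no shell command is needed and no unzipped copy is left next to the original.

```r
library(data.table)

# Hypothetical sample: a small gzipped CSV whose second column holds
# nanosecond-resolution times like the one in the question.
path <- tempfile(fileext = ".txt.gz")
con <- gzfile(path, "w")
writeLines(c("sym,time,side,venue,price",
             "ABC,9:30:00.244271971,B,X,10.5",
             "ABC,9:30:01.000000123,S,Y,10.6"), con)
close(con)

# Read the compressed lines through a gzfile connection (base R, works on Windows).
con <- gzfile(path, "r")
lines <- readLines(con)
close(con)

# Parse the in-memory lines; force the time column to stay character
# so the fractional seconds are kept exactly as written.
myData <- fread(text = lines, header = TRUE,
                colClasses = c(time = "character"))

print(myData$time)
```

The explicit `colClasses` entry is a precaution: it makes the result independent of whatever type detection the installed data.table version applies to the time column.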

score 1 · Answer 2 · answered Feb 15 '18 at 21:13


@jay.sf's comment turned out to work perfectly. So here is the answer:

(I had 5 columns where the first four were characters and the last one was a number.)

myData <- read_csv("foo.txt.gz", col_types = list("c","c","c","c","n"))
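
As a self-contained sketch of the same idea, the snippet below builds a small gzipped file and reads it back with the time column kept as character. The sample data and column names are hypothetical. It uses readr's compact string form of `col_types`, where each letter stands for one column (`c` for character, `n` for number), which should be equivalent to the list form above.

```r
library(readr)

# Hypothetical sample: five columns, the second holding a nanosecond timestamp.
path <- tempfile(fileext = ".txt.gz")
con <- gzfile(path, "w")
writeLines(c("sym,time,side,venue,price",
             "ABC,9:30:00.244271971,B,X,10.5"), con)
close(con)

# "ccccn": four character columns followed by one number column.
# Reading the time as character stops readr from guessing a time type
# and dropping the fractional seconds.
myData <- read_csv(path, col_types = "ccccn")

print(myData$time)
```

The time stays a plain string here, so any later conversion (for example to seconds since midnight) has to be done explicitly.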

python_enthusiast

