2

From one of the answers in this SO question, I got the lines:

require(readr)
myData <- read_csv("foo.txt.gz")

But this makes me lose data for some reason.

My second column is a time column in this format: 9:30:00.244271971. This code turns it into 09:30:00, dropping the fractional seconds and losing a lot of information.

How can I fix this? Is there a way to avoid losing this information?

python_enthusiast

2 Answers

2

You can also always use fread() from data.table. You can execute arbitrary shell commands from the file argument to handle the unzip, and it won't auto coerce your timestamps by default either, so you shouldn't have the truncation issue. The vignette Convenience features of fread has some great examples.

(Bonus: it's significantly faster than readr, and absolutely blows it out of the water if you install the development v1.10.5 version off GitHub with multi-threading in fread.)

library(data.table)

myData <- fread("gunzip -c foo.txt.gz")
Matt Summersgill
  • Yes, I had tried that before, but I am not using Linux. I am on Windows, so I had to do: `library(R.utils)` followed by `quote <- fread(gunzip('foo.txt.gz', remove = FALSE), sep = 'auto', header = TRUE)`. But then I get an unzipped file, which I don't want. – python_enthusiast Feb 15 '18 at 21:56
  • Ahh, I forget some of the friction that comes with R and Windows at times. [This post](https://stackoverflow.com/questions/37727865/how-can-i-use-fread-to-read-gz-files-in-r) might be helpful on formatting. – Matt Summersgill Feb 15 '18 at 22:04
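
For the Windows case raised in the comments above, one option that needs neither a shell `gunzip` nor an unzipped copy on disk is a base R `gzfile()` connection. The following is only a minimal sketch: the temporary file, the column names, and the three-column layout are made up for the demonstration, and the real `foo.txt.gz` may need different `colClasses`.

```r
# Hypothetical demo file standing in for foo.txt.gz (contents are invented).
tmp <- tempfile(fileext = ".csv.gz")
con <- gzfile(tmp, "wt")
writeLines(c("sym,time,px", "ABC,9:30:00.244271971,10.5"), con)
close(con)

# read.csv() reads through the gzfile() connection, so the data is
# decompressed on the fly and no unzipped file is left behind.
# Declaring the time column as "character" keeps every digit.
myData <- read.csv(gzfile(tmp),
                   colClasses = c("character", "character", "numeric"))

print(myData$time)
```

Reading the time column as plain text is the same idea as disabling type guessing in readr or fread: nothing is parsed, so nothing is truncated.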
1

@jaySf's comment turned out to work perfectly. So here is the answer:

(I had 5 columns where the first four were characters and the last one was a number.)

myData <- read_csv("foo.txt.gz", col_types = list("c","c","c","c","n"))
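
Once the time column is kept as character, it still has to be converted before doing arithmetic on it. As a rough sketch (the helper name `time_to_seconds` is hypothetical, and it assumes every value looks like `H:MM:SS.fffffffff`), the strings can be turned into seconds since midnight with base R:

```r
# Hypothetical helper: convert "H:MM:SS.fffffffff" strings to
# seconds since midnight, keeping the fractional part.
time_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    as.numeric(p[1]) * 3600 + as.numeric(p[2]) * 60 + as.numeric(p[3])
  }, numeric(1))
}

secs <- time_to_seconds("9:30:00.244271971")

# R prints 7 significant digits by default, so ask for more
# to see the sub-second part.
print(format(secs, digits = 15))
```

A double has roughly 15 to 16 significant digits, which is enough for nanoseconds within a single day but not for a full date-time at that resolution, so keeping the original string alongside the numeric value may be the safer choice.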
python_enthusiast