
I have the following line in my R code:

# Build the cut command and parse its output into a matrix
pipeline <- sprintf("cut -f %i-%i %s", jcol1, jcol2, fname)
Ys <- as.matrix(read.table(pipe(pipeline)))

which takes about 3 seconds to run. When I run the equivalent cut command directly in my Linux terminal and redirect the output to /dev/null:

time cut -f 2-5000 filename.txt > /dev/null

I find that this takes 0.631 seconds, which tells me that reading the file itself is not the slow part.
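
To narrow down where the time goes on the R side, the parse and the matrix conversion can be timed separately with system.time; a minimal check, using the same hard-coded column range and file name as the cut example above:

# Hard-coded values match the cut example above
pipeline <- sprintf("cut -f %i-%i %s", 2L, 5000L, "filename.txt")
t_read <- system.time(df <- read.table(pipe(pipeline)))  # shell cut + read.table parsing
t_conv <- system.time(Ys <- as.matrix(df))               # data.frame to matrix conversion
t_read
t_conv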

What part of my R code is taking so long, and how can I speed it up?

drjrm3
  • I am not a Linux expert, but my guess would be that read.table and as.matrix add some overhead, since R does a lot of processing to "understand" the data, determine the right fields, read it into memory, etc. – Rohit Das Oct 01 '16 at 01:31
  • For starters you might try `readr::read_table`, which will be faster than `read.table`. That being said, I think Rohit Das's comment is the reason you're seeing a relative slowdown. – Jacob H Oct 01 '16 at 01:32
  • Try comparing with: `library(data.table); fread(pipeline)` – G. Grothendieck Oct 01 '16 at 02:29
  • Or the system cache could be affecting the read rates. Try running each 2-3 times in both environments. Good luck. – shellter Oct 01 '16 at 04:33
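
For anyone trying the data.table suggestion above, a minimal sketch, assuming the same jcol1, jcol2 and fname variables as in the question:

library(data.table)

# fread runs the cut command itself and parses its output in C
pipeline <- sprintf("cut -f %i-%i %s", jcol1, jcol2, fname)
DT <- fread(cmd = pipeline)   # older data.table versions accept fread(pipeline)
Ys <- as.matrix(DT)

Depending on the file, letting fread read it directly and pick the columns via its select argument (e.g. fread(fname, select = jcol1:jcol2)) may perform comparably and avoids spawning cut at all.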

0 Answers