
I have the following line in my R code:

# Build the cut command and parse its output into a matrix
pipeline <- sprintf("cut -f %i-%i %s", jcol1, jcol2, fname)
Ys <- as.matrix(read.table(pipe(pipeline)))

which takes about 3 seconds to run. When I run the equivalent cut command directly in my Linux terminal and redirect the output to /dev/null:

time cut -f 2-5000 filename.txt > /dev/null

I find that this takes 0.631 seconds, which tells me that reading the file itself is not the slow part.
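
To narrow down where the time goes on the R side, the parse and the matrix conversion can be timed separately with system.time; a minimal check, using the same hard-coded column range and file name as the cut example above:

# Hard-coded values match the cut example above
pipeline <- sprintf("cut -f %i-%i %s", 2L, 5000L, "filename.txt")
t_read <- system.time(df <- read.table(pipe(pipeline)))  # shell cut + read.table parsing
t_conv <- system.time(Ys <- as.matrix(df))               # data.frame to matrix conversion
t_read
t_conv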

What part of my R code is taking so long, and how can I speed it up?

drjrm3
  • I am not a Linux expert, but my guess would be that read.table and as.matrix add some overhead, since R does a lot of processing to "understand" the data, determine the right fields, read it into memory, etc. – Rohit Das Oct 01 '16 at 01:31
  • For starters you might try `readr::read_table`, which will be faster than `read.table`. That being said, I think Rohit Das's comment is the reason you're seeing a relative slowdown. – Jacob H Oct 01 '16 at 01:32
  • Try comparing with: `library(data.table); fread(pipeline)` – G. Grothendieck Oct 01 '16 at 02:29
  • Or the system cache could be affecting the read rates. Try running each 2-3 times in both environments. Good luck. – shellter Oct 01 '16 at 04:33
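
For anyone trying the data.table suggestion above, a minimal sketch, assuming the same jcol1, jcol2 and fname variables as in the question:

library(data.table)

# fread runs the cut command itself and parses its output in C
pipeline <- sprintf("cut -f %i-%i %s", jcol1, jcol2, fname)
DT <- fread(cmd = pipeline)   # older data.table versions accept fread(pipeline)
Ys <- as.matrix(DT)

Depending on the file, letting fread read it directly and pick the columns via its select argument (e.g. fread(fname, select = jcol1:jcol2)) may perform comparably and avoids spawning cut at all.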

0 Answers