I am writing a simple command-line Rscript that reads some binary data and outputs it as a stream of numeric characters. The data is in a specific format and R has a very quick library for dealing with the binary files in question. The file (of 7 million characters) is read quickly - in less than a second:
library(affyio)
system.time(CEL <- read.celfile("testCEL.CEL"))
user system elapsed
0.462 0.035 0.498
I want to write part of the data that was read to stdout:
str(CEL$INTENSITY$MEAN)
num [1:6553600] 6955 225 7173 182 148 ...
As you can see, it's numeric data with ~6.5 million integers.
And the writing is terribly slow:
system.time(write(CEL$INTENSITY$MEAN, file="TEST.out"))
user system elapsed
8.953 10.739 19.694
(Here the writing is done to a file, but writing to standard output from Rscript, e.g. with write(CEL$INTENSITY$MEAN, stdout()), takes the same amount of time.) Using
cat(CEL$INTENSITY$MEAN)
instead does not improve the speed at all. One improvement I found is this:
system.time(writeLines(as.character(CEL$INTENSITY$MEAN), "TEST.out"))
user system elapsed
6.282 0.016 6.298
It is still a far cry from the speed of reading the data in (and that read five times more data than this particular vector). Moreover, I have the overhead of transforming the entire vector to character before I can write anything. Plus, when sinking to stdout, I cannot terminate the stream with CTRL+C if I accidentally fail to redirect it to a file.
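For the last problem, converting and writing in chunks at least keeps the peak memory of the character conversion down and gives CTRL+C a chance to interrupt between chunks, though I would not expect it to change the total time much. A rough sketch (the chunk size of 1e5 is arbitrary):

x <- CEL$INTENSITY$MEAN
con <- stdout()
chunk <- 1e5  # arbitrary chunk size
for (i in seq(1, length(x), by = chunk)) {
    j <- min(i + chunk - 1, length(x))
    # convert and write one slice at a time instead of the whole vector
    writeLines(as.character(x[i:j]), con)
}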
So my question is - is there a faster way to simply output a numeric vector from R to stdout?
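(For reference, dumping the raw bytes, e.g. with writeBin(CEL$INTENSITY$MEAN, "TEST.bin"), skips the character conversion entirely, but a binary dump is not the textual stream I am after.)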
Also, why is reading the data in so much faster than writing it out? And this is not only for binary files; even reading the plain-text file back with scan is fast:
system.time(tmp <- scan("TEST.out"))
Read 6553600 items
user system elapsed
1.216 0.028 1.245
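My suspicion is that formatting the numbers as text, rather than the I/O itself, is the expensive part; timing just the conversion step on its own should show whether that is where the time goes:

system.time(tmp <- as.character(CEL$INTENSITY$MEAN))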