The data frame is 15k rows by 200k columns. This is the first time I have tried to write it out to a TSV file, and I am surprised by how slow my code is: it has been running for three days and is still not finished, which is unacceptable. What techniques can I use to reduce the writing time?
I know it is quick to save the data as R objects, but I have to send it to another person who does not use R, so the common format we can share is a plain text file.
Confirmation
I confirm that write_csv from the readr package does write my files much faster than the base write.table function. However, it does not let me specify the separator I want, so it is not suitable for my case. I ended up using this trick: first I preprocess my huge data frame into a character vector, one tab-joined string per row, like this:
# collapse each row into one tab-separated string
forwriteout <- apply(mydf, 1, function(x) paste(x, collapse = "\t"))
And then I write out forwriteout with the base write function, which for a character vector defaults to one element per line. This is almost as fast as write_csv. See the sketch and the benchmark below.
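Put together, a minimal sketch of the whole write-out (the file name out.tsv and the explicit header line are my own additions, and this assumes the data contain no embedded tabs that would need quoting):

# write the column names first, then the pre-pasted rows;
# write() uses ncolumns = 1 for character vectors, so each
# element lands on its own line in the output file
header <- paste(colnames(mydf), collapse = "\t")
forwriteout <- apply(mydf, 1, function(x) paste(x, collapse = "\t"))
write(c(header, forwriteout), file = "out.tsv")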
expr                       min        lq      mean    median        uq       max neval
pasteandwrite         281.8968  283.4586  288.5968  289.2780  292.2049  295.6102    10
normalwritetable     1973.7250 1981.6122 1999.1016 1997.5792 2014.2397 2028.3227    10
usewritecsvfromreadr  201.6592  202.6115  215.2030  216.4946  226.1103  229.3069    10
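For reference, here is a sketch of how a benchmark like this could be set up with the microbenchmark package. The expression names mirror the table; the toy data frame size, the temp-file targets, and the write.table options are my assumptions, and the time unit of the original run was not recorded above.

library(microbenchmark)
library(readr)

# small stand-in data frame; the real one is 15k x 200k
mydf <- as.data.frame(matrix(rnorm(100 * 1000), nrow = 100))

microbenchmark(
  pasteandwrite = {
    out <- apply(mydf, 1, function(x) paste(x, collapse = "\t"))
    write(out, file = tempfile(fileext = ".tsv"))
  },
  normalwritetable = write.table(mydf, file = tempfile(fileext = ".tsv"),
                                 sep = "\t", quote = FALSE, row.names = FALSE),
  usewritecsvfromreadr = write_csv(mydf, tempfile(fileext = ".csv")),
  times = 10
)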