I have a program that outputs lines of CSV data that I want to load into a data frame. I currently load the data like so:
tmpFilename <- "tmp_file"
system(paste(procName, ">", tmpFilename), wait=TRUE)
myData <- read.csv(tmpFilename) # (I also pass in colClasses and nrows for efficiency)
However, I thought redirecting the output to a file just to read from it was inefficient (the program spits out about 30MB, so I want to handle it with optimal performance). I thought textConnection would solve this, so I tried:
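con <- textConnection(system(procName, intern=TRUE)) # capture stdout as a character vector, one element per line
myData <- read.csv(con)
close(con) # textConnection is opened on creation, so read.csv does not close it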
This runs a lot slower, though. Whereas the first solution's run time grows linearly with input size, the textConnection solution's run time seems to grow exponentially. The slowest part is creating the textConnection; read.csv itself actually completes quicker here than in the first solution, since it's reading from memory.
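To make the scaling claim reproducible, here is a minimal timing sketch. It does not use the real program: make_lines is a hypothetical helper that fabricates CSV-like lines, and the sizes are arbitrary, so it only isolates the cost of creating the textConnection from a character vector of growing length.

```r
# Hypothetical stand-in for the program's output: n CSV-like lines.
make_lines <- function(n) sprintf("%d,%f,%s", seq_len(n), runif(n), "abc")

# Arbitrary sizes chosen for illustration; increase them to see the trend.
sizes <- c(1000, 2000, 4000, 8000)

# Time only the creation (and closing) of the textConnection.
timings <- sapply(sizes, function(n) {
  lines <- make_lines(n)
  system.time({
    con <- textConnection(lines)
    close(con)
  })[["elapsed"]]
})

print(data.frame(lines = sizes, seconds = timings))
```

If the elapsed time roughly doubles when the number of lines doubles, the growth is linear; anything noticeably steeper would support the observation above. Results may differ between R versions.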
My question, then: is creating a file just to run read.csv on it my best option with respect to speed? Is there a way to speed up the creation of a textConnection? Bonus: why is creating a textConnection so slow?
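For comparison, one option that avoids both the temporary file and the textConnection is base R's pipe(), which opens a connection to a command's standard output. This is only a sketch and has not been benchmarked against the 30MB case. It assumes procName is a shell command string that writes CSV to stdout; the printf command below is a stand-in so the example runs on a Unix-like shell.

```r
# Assumption: procName is a shell command that writes CSV to stdout.
# The printf command is a placeholder for the real program.
procName <- "printf 'a,b\\n1,2\\n3,4\\n'"

# pipe() creates an unopened connection to the command's stdout;
# read.csv() opens it, reads all rows, and closes it when done.
myData <- read.csv(pipe(procName))

print(myData)
```

The colClasses and nrows arguments can be passed to read.csv here just as with a file name, since the data is streamed from the process rather than first collected into a character vector.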