I have a collection of large time series files (tick data) in .txt format, which I access through R for statistical analysis. I am exploring moving the files to a database (e.g. MySQL, InfluxDB). The SO answer here lays out the benefits of using a database instead of flat files, but I fail to see much value in the points raised there, perhaps due to ignorance.
For those working with time series, what kind of requirements would justify the additional complexity of setting up a database, instead of simply importing the flat files directly into R/Python/Matlab?
Specific circumstances: each file has no more than 10 columns (the exact number depends on the asset class) and in many cases millions of rows (sometimes fewer, because I split some files into smaller chunks). A few files are updated/rewritten in real time. I load the .txt files into R (either one at a time, or several at once) at the beginning of a script, then create a number of binary vectors/signals and perform statistical analyses/modelling. I do this on a computer with 128 GB of RAM, yet I sometimes run out of memory.
Thank you.