I have some time series to work on. In particular, I have one univariate time series, saved in a .csv file, consisting of just a single column and containing more than 1M rows. When I try to open that csv with Excel, I get the "cannot display all records" popup, and I can only view 1048576 records. I use R and RStudio for analytics, so I tried to import this dataset into the RStudio environment. Fun fact: I can only view exactly the same number of rows as I did with programs like Excel.
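For reference, the plain import attempt looked roughly like this (a minimal sketch; I'm assuming base read.csv, a single column with no header, and the file name bigdata.csv):

    # sketch of the plain import (file name and header setting are assumptions)
    raw <- read.csv("bigdata.csv", header = FALSE)
    nrow(raw)   # check how many rows actually arrived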
One simple workaround I found was to split the original csv file using the split bash command. So:
split -l 500000 bigdata.csv
produced 4 smaller csv files (the first 3 containing 500k records each), which I easily managed to import as 4 separate time series in RStudio and finally merged, obtaining the desired result (sketched below).
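The merge step looked roughly like this (a sketch; the chunk file names are the defaults produced by split, and I assume the original file has no header row):

    library(data.table)

    # the four chunks produced by split (default output names assumed)
    chunk_files <- c("xaa", "xab", "xac", "xad")

    # read each chunk and stack them back into one table
    chunks <- lapply(chunk_files, fread, header = FALSE)
    full_data <- rbindlist(chunks)

    # rebuild a single time series from the single column
    full_ts <- ts(full_data[[1]])
    length(full_ts)   # all rows recovered, > 1M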
My question is: is there anything I can do to avoid this whole process and directly load such a dataset without losing the final rows?
I already tried the data.table library with the fread() function to load the dataset, but there was no benefit: the same number of rows was loaded.
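That attempt was essentially the following (file name assumed):

    library(data.table)

    big <- fread("bigdata.csv")
    nrow(big)   # still reports the same 1048576 rows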
I am using RStudio on a Windows 10 machine, with 6 GB of RAM.
EDIT: I tried the memory.limit() command to check the amount of memory available to RStudio. The result is "6072", corresponding to my 6 GB of RAM.
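For completeness, the check was simply:

    memory.limit()
    # [1] 6072   (limit in MB, matching the 6 GB of physical RAM)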