I am new to Python 3, coming over from R.
I have a very large time-series file (10 GB) that spans 6 months. It is a CSV file in which each row contains 6 fields: Date, Time, Data1, Data2, Data3, Data4. The "Data" fields are numeric. I would like to iterate through the file and create and write individual files, each containing only one day of data. The only way to tell where one day ends and the next begins is that the date field changes. The data excludes weekends, certain holidays, and random closures due to unforeseen events, so I cannot generate the vector of unique dates ahead of time. The number of lines per day is also variable and unknown.
I envision reading each line into a buffer and comparing the date to the previous date.
If the next date equals the previous date, I append that line to the buffer. I repeat this until the next date differs from the previous date. At that point I write the buffer to a new CSV file containing only that day's data (00:00:00 to 23:59:59).
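Here is a minimal sketch of the buffer-and-compare idea using only the standard library. It assumes the file has no header row, Date is the first column, and rows are already in chronological order. These are assumptions about the real file. `itertools.groupby` starts a new group every time the key changes, so it performs the "compare to the previous date" step for you. The function name `split_by_day` and the date-based output file names are hypothetical.

```python
import csv
import itertools
import os
from typing import List


def split_by_day(in_path: str, out_dir: str) -> List[str]:
    """Write each run of consecutive rows sharing a Date to its own CSV.

    Hypothetical assumptions (adjust to the real file): no header row,
    Date is the first column, and rows are in chronological order.
    """
    os.makedirs(out_dir, exist_ok=True)
    written: List[str] = []
    with open(in_path, newline="") as src:
        rows = (row for row in csv.reader(src) if row)  # skip blank lines
        # groupby starts a new group whenever row[0] (the Date) changes,
        # which is exactly the "compare to the previous date" test.
        for date, day_rows in itertools.groupby(rows, key=lambda row: row[0]):
            # "/" is not allowed in file names, so replace it with "-".
            out_path = os.path.join(out_dir, f"{date.replace('/', '-')}.csv")
            with open(out_path, "w", newline="") as dst:
                # Rows are streamed straight through to the output file;
                # a whole day is never collected into a list.
                csv.writer(dst).writerows(day_rows)
            written.append(out_path)
    return written
```

Each group is written as soon as it is read, so memory use stays small regardless of the 10 GB input size. If a date could reappear later in the file (that is, the file is not strictly chronological), opening with `"w"` would overwrite the earlier output for that date. In that case, `"a"` is the safer mode.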
I had trouble appending new lines to pandas DataFrames, and reading lines into a list with readline() got too messy for me. I'm looking for Pythonic advice.
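If you prefer to stay with pandas, one option that avoids growing a DataFrame row by row is to read the file in chunks with `pd.read_csv(..., chunksize=...)` and append each date's rows to its output file with `to_csv(mode="a")`. Below is a sketch under stated assumptions. The column names come from the question. The function `split_by_day_pandas`, the default chunk size, and the file naming are hypothetical. The sketch also assumes there is no header row and that the output directory starts empty, because otherwise rows would be appended to leftover files from an earlier run.

```python
import os
from typing import Set

import pandas as pd

# Column names as described in the question; the file is assumed to have no header.
COLUMNS = ["Date", "Time", "Data1", "Data2", "Data3", "Data4"]


def split_by_day_pandas(in_path: str, out_dir: str, chunksize: int = 1_000_000) -> Set[str]:
    """Split a large CSV into one file per Date using chunked reads.

    Assumes out_dir starts empty, since rows are appended to the output files.
    """
    os.makedirs(out_dir, exist_ok=True)
    written: Set[str] = set()
    # dtype=str keeps every field exactly as it appears in the source text.
    reader = pd.read_csv(in_path, header=None, names=COLUMNS, dtype=str, chunksize=chunksize)
    for chunk in reader:
        # sort=False keeps dates in file order. A day that is split across
        # two chunks is simply appended to the same output file.
        for date, day_df in chunk.groupby("Date", sort=False):
            out_path = os.path.join(out_dir, f"{date.replace('/', '-')}.csv")
            day_df.to_csv(out_path, mode="a", header=False, index=False)
            written.add(out_path)
    return written
```

Only one chunk is held in memory at a time, and the chunk size trades memory for speed.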