I'm using a pandas.DataFrame to store 3 hours of sensor data sampled once per second. So each second, I'm adding a row and dropping the rows older than 3 hours.
Currently, I'm doing it very inefficiently:
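# Wrap the incoming record (a dict with a 'Date' key) in a one-row DataFrame.
record = pd.DataFrame.from_records([record], index='Date')
if self.data.empty:
    #logger.debug('Creating data log')
    self.data = record
else:
    #logger.debug('Appending new record')
    # Originally self.data.append(record); DataFrame.append was removed in
    # pandas 2.0, and pd.concat is the equivalent call.
    self.data = pd.concat([self.data, record])
# Keep only the rows newer than the retention window (3 hours).
start = now - self.keepInMemory
self.data = self.data[self.data.index > start]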
That is, every second a new one-row DataFrame is created, appended to the existing one, and then the old records are filtered out. Each of these steps builds a new DataFrame, so it is slow and certainly does a lot of memory reallocation.
What I'm looking for is:
- Pre-allocated DataFrame
- Remove old records (without reallocation)
- Add new record
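One way to picture these three requirements is a fixed-size circular buffer. The sketch below is only an illustration under assumptions: the class name RingLog, its methods, and the column name are hypothetical, all columns are assumed to be floats, and "older than 3 hours" is approximated by a fixed capacity (10800 rows at one sample per second). Writing a record overwrites the oldest slot in place; building the DataFrame view on demand still copies.

```python
import numpy as np
import pandas as pd


class RingLog:
    """Hypothetical fixed-capacity log backed by preallocated NumPy arrays."""

    def __init__(self, capacity, columns):
        self.capacity = capacity
        self.columns = list(columns)
        # Preallocate storage once; unused slots hold NaT / NaN.
        self.times = np.full(capacity, np.datetime64("NaT"), dtype="datetime64[ns]")
        self.values = np.full((capacity, len(self.columns)), np.nan)
        self.pos = 0    # slot that the next record will overwrite
        self.count = 0  # number of valid rows, at most capacity

    def add(self, timestamp, row):
        # Overwrite the oldest slot in place; nothing is reallocated here.
        self.times[self.pos] = pd.Timestamp(timestamp).to_datetime64()
        self.values[self.pos] = row
        self.pos = (self.pos + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)

    def frame(self):
        # Build a chronologically ordered DataFrame on demand (this step copies).
        if self.count < self.capacity:
            order = np.arange(self.count)
        else:
            order = np.arange(self.pos, self.pos + self.capacity) % self.capacity
        index = pd.DatetimeIndex(self.times[order], name="Date")
        return pd.DataFrame(self.values[order], index=index, columns=self.columns)


# Usage: a capacity of 3 keeps only the three most recent records.
log = RingLog(capacity=3, columns=["temperature"])
start = pd.Timestamp("2024-01-01")
for i in range(5):
    log.add(start + pd.Timedelta(seconds=i), [20.0 + i])
df = log.frame()
print(df)
```

The catch is that frame() still allocates a new DataFrame on every call, so this only moves the cost from the write path to the read path.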
What is the most pandas-ish way to accomplish that?
Thank you.
P.S. The only relevant question on SO I managed to find was "deque in python pandas", but it didn't help.
Update: Using a DataFrame and not a deque is a requirement, since other modules use self.data as a service for computing generic conditions (e.g. "does the std() of the last 15 minutes differ from that of the first 15 minutes?" and similar). To stress, it's not just for recording data; it's for giving other modules the ability to compute various generic conditions efficiently and conveniently.
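To illustrate the kind of condition meant here, a minimal sketch follows. The function name std_changed, the column name "value", the ratio threshold, and the synthetic data are all assumptions made up for the example; the only thing it relies on is a DataFrame with a time index, like self.data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for self.data: 3 hours of once-per-second readings.
index = pd.date_range("2024-01-01", periods=3 * 3600,
                      freq=pd.Timedelta(seconds=1), name="Date")
rng = np.random.default_rng(0)
data = pd.DataFrame({"value": rng.normal(size=len(index))}, index=index)


def std_changed(df, column, window=pd.Timedelta(minutes=15), ratio=1.5):
    """Return True if std() over the last `window` exceeds `ratio` times
    the std() over the first `window` of the log."""
    first = df.loc[df.index < df.index[0] + window, column]
    last = df.loc[df.index > df.index[-1] - window, column]
    return bool(last.std() > ratio * first.std())


print(std_changed(data, "value"))
```

Conditions like this need label-based slicing on the time index, which is why a plain deque is not a convenient replacement.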
I suspect there might be a clever play with indices (e.g. data.index = np.roll(data.index, 1)) followed by replacing the last row in place, but so far I have not figured out how to do that efficiently. The new record has the same format as the rest, so it should be possible.
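A naive version of that idea might look like the sketch below. It is only an illustration: the helper name push and the column name are hypothetical, all columns are assumed to be floats, and it shifts rows up with np.roll(..., -1) so the newest record lands in the last row. Note that np.roll returns a shifted copy, so this still copies the whole value block on every call, which is exactly the inefficiency in question.

```python
import numpy as np
import pandas as pd


def push(data, timestamp, row):
    """Shift every row up by one and overwrite the last row with the new record.

    The DataFrame keeps its length, so the oldest row is dropped implicitly.
    """
    values = np.roll(data.to_numpy(), -1, axis=0)  # shifted copy, oldest row wraps to the end
    values[-1] = row                               # replace the wrapped row with the new record
    data.iloc[:, :] = values                       # write back into the existing frame
    newest = pd.DatetimeIndex([timestamp], name=data.index.name)
    data.index = data.index[1:].append(newest)     # drop oldest label, add newest


# Usage on a tiny preallocated frame of four rows.
index = pd.date_range("2024-01-01", periods=4,
                      freq=pd.Timedelta(seconds=1), name="Date")
data = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=index)
push(data, index[-1] + pd.Timedelta(seconds=1), [5.0])
print(data)
```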