I read data from a CSV file. It takes roughly 5 GB of RAM (judging by the Jupyter notebook memory-usage figure and by Linux htop).
import pandas as pd

df = pd.read_csv('~/data/a.txt', usecols=[0, 1, 5, 15, 16])
Then I group it, modify the resulting dataframes, and delete df:
y = df.groupby('Date')
days = [(key, group) for key, group in y]  # each group is a copy of its slice of df
del df
for day in days:
    day[1].set_index('Time', inplace=True)  # inplace, otherwise the result is discarded
    del day[1]['Date']
At this point I would expect groupby to roughly double the memory usage, and del df
to then release half of it. But in fact the process is using 9 GB.
How can I split a dataframe by date without duplicating its memory usage?
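If the per-day frames only need to be consumed once, a sketch that avoids holding all the copies at the same time is to process each group as it comes off the groupby iterator and let it go out of scope, instead of materializing every copy into a list alongside df (the synthetic frame and the `Price` sum below are made-up stand-ins for the real data and the real per-day work):

```python
import pandas as pd

# Small synthetic stand-in for the real CSV (hypothetical columns/values).
df = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-01', '2024-01-02'],
    'Time': ['09:00', '09:01', '09:00'],
    'Price': [1.0, 2.0, 3.0],
})

results = {}
for key, group in df.groupby('Date'):
    # Each group is a fresh copy; handle it here and drop it,
    # rather than accumulating every copy in a list next to df.
    group = group.set_index('Time')
    del group['Date']
    results[key] = group['Price'].sum()  # hypothetical per-day reduction

del df
```

With this pattern only df plus a single group copy are alive at any moment, so peak usage stays near the original frame's size instead of doubling.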
EDIT: since it turned out that Python does not release freed memory back to the OS, I used memory_profiler
to find the actual memory use:
print(memory_profiler.memory_usage()[0])
407 << baseline mem use, MiB
df = pd.read_csv
4362 << after read_csv
groupby and create days list
6351 << after groupby
df = None
gc.collect()
6351 << unchanged after releasing df
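As a cross-check independent of the OS-level figure, `DataFrame.memory_usage(deep=True)` reports the bytes each column holds; comparing the original frame against the sum over the group copies shows whether the groups really duplicate the data (small synthetic frame for illustration, column names are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2024-01-01'] * 1000 + ['2024-01-02'] * 1000,
    'Price': range(2000),
})

df_bytes = df.memory_usage(deep=True).sum()
days = [(key, group) for key, group in df.groupby('Date')]
# Note: deep=True counts each string element per frame, even where the
# copies still share the underlying Python string objects, so this is
# an upper bound on the extra allocation.
group_bytes = sum(g.memory_usage(deep=True).sum() for _, g in days)

print(df_bytes, group_bytes)
```

The group copies together report at least as many bytes as df itself (they also carry their own index), which is consistent with the roughly doubled figure in the trace above.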