I am trying to implement an algorithm to process temporal data.
def calculate_frequency(T, Wth):
    k = len(T)
    df = pd.concat(T).sort_values('time')
    frequency = 0
    # some operations commented out that calculate the frequency from T and Wth
    del df
    gc.collect()
    return frequency
T is a dictionary of time series. A time series is represented as a pandas DataFrame with columns 'alert' and 'time'.
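For illustration, a T passed to the function looks roughly like this (the values are made up, only the structure matters):

    import pandas as pd

    # toy example of T: one DataFrame per alert id, each with 'alert' and 'time' columns
    T = {
        0: pd.DataFrame({'alert': [0, 0, 0], 'time': [0.1, 0.4, 0.9]}),
        1: pd.DataFrame({'alert': [1, 1], 'time': [0.2, 0.7]}),
    }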
I call this function repeatedly for different T. To my surprise, the program occupies more and more memory over time. Any idea how to cope with that?
My attempts so far: deleting df and calling the garbage collector. The frequency calculation itself is commented out, so it has no impact on the memory. Here is a minimal example that reproduces the behaviour:
import os
import psutil
import numpy as np
import pandas as pd
import gc

process = psutil.Process(os.getpid())

n_data = 10000
n_alert = 40
alert = np.random.randint(0, n_alert, size=n_data).tolist()
time = np.random.rand(n_data).tolist()
df = pd.DataFrame(dict(alert=alert, time=time))

t_all = {}
for a in range(n_alert):
    t_all[a] = df[df['alert'] == a]

def calculate_frequency(T):
    df = pd.concat(T).sort_values('time')
    frequency = 0
    # some operations commented out that calculate the frequency from T and Wth
    del df
    gc.collect()
    return frequency

for a in range(n_alert):
    for a2 in range(a):
        T = [t_all[a], t_all[a2]]
        calculate_frequency(T)
        print(process.memory_info().rss)