I am trying to implement an algorithm to process temporal data.
def calculate_frequency(T, Wth):
    k = len(T)
    df = pd.concat(T).sort_values('time')
    frequency = 0
    # some operations commented out that calculate the frequency from T and Wth
    del df
    gc.collect()
    return frequency
T is a dictionary of time series. A time series is represented as a pandas DataFrame with columns 'alert' and 'time'.
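For illustration, a T passed to the function looks roughly like this (the values are made up, only the structure matters):

    import pandas as pd

    # toy example of T: one DataFrame per alert id, each with 'alert' and 'time' columns
    T = {
        0: pd.DataFrame({'alert': [0, 0, 0], 'time': [0.1, 0.4, 0.9]}),
        1: pd.DataFrame({'alert': [1, 1], 'time': [0.2, 0.7]}),
    }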
I call this function repeatedly for different T. To my surprise, the program occupies more and more memory over time. Any idea how to cope with that?
My attempts so far: deleting df and calling the garbage collector. The frequency calculation itself is commented out, so it has no impact on the memory. Here is a minimal example that reproduces the behaviour:
import os
import psutil
import numpy as np
import pandas as pd
import gc

process = psutil.Process(os.getpid())

n_data = 10000
n_alert = 40
alert = np.random.randint(0, n_alert, size=n_data).tolist()
time = np.random.rand(n_data).tolist()
df = pd.DataFrame(dict(alert=alert, time=time))

t_all = {}
for a in range(n_alert):
    t_all[a] = df[df['alert'] == a]

def calculate_frequency(T):
    df = pd.concat(T).sort_values('time')
    frequency = 0
    # some operations commented out that calculate the frequency from T and Wth
    del df
    gc.collect()
    return frequency

for a in range(n_alert):
    for a2 in range(a):
        T = [t_all[a], t_all[a2]]
        calculate_frequency(T)
        print(process.memory_info().rss)