
I'm running a Python script that handles and processes data with pandas functions inside an infinite loop, but the program seems to be leaking memory over time.

This is the graph produced by the memory-profiler package:

[memory-profiler plot: memory usage climbs steadily over time, with spikes and a sudden drop roughly every fourth cycle]

Sadly, I cannot identify the source of the increasing memory usage. To my knowledge, all data (pandas time series) are stored in the object Obj, and I track the memory usage of that object with the pandas method .memory_usage() and the objsize function get_deep_size(). According to their output, memory usage should be stable at around 90-100 MB. Other than this object, I don't see where memory could ramp up.

It may be useful to know that the python program is running inside a docker container.

Below is a simplified version of the script, which should illustrate the basic working principle.

from datetime import datetime
from time import sleep
import logging
import objsize
import requests
from dateutil import relativedelta

logger = logging.getLogger(__name__)

def update_data(Obj, now_utctime):
    # fetch the newest timeseries data since the last stored timestamp
    # (placeholder call; in the real script this returns a pandas Series indexed by timestamps)
    new_data = requests.get(host, start=Obj.data.index[-1], end=now_utctime)

    # Series.append() returns a new Series, so the result has to be assigned back
    Obj.data = Obj.data.append(new_data)

    # cut off data older than 1 day (truncate() also returns a new Series)
    Obj.data = Obj.data.truncate(before=now_utctime - relativedelta.relativedelta(days=1))

class ExampleClass():
    def __init__(self):
        # host is the data endpoint used in the real script (omitted here)
        now_utctime = datetime.utcnow()
        self.data = requests.get(host, start=now_utctime - relativedelta.relativedelta(days=1), end=now_utctime)

Obj = ExampleClass()

while True:
    update_data(Obj, datetime.utcnow())
    logger.info(f"Average at {datetime.utcnow()} is at {Obj.data.mean()}")
    logger.info(f"Stored timeseries memory usage at {Obj.data.memory_usage(deep=True) * 10 ** -6} MB")
    logger.info(f"Stored Object memory usage at {objsize.get_deep_size(Obj) * 10 ** -6} MB")
    sleep(60)

Any advice on where memory could ramp up, or on how to investigate this further, would be appreciated.
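In case it helps, this is a sketch (not part of the real script) of what I could additionally log per iteration to narrow things down, using only the standard library's tracemalloc module:

import tracemalloc

tracemalloc.start()
previous = tracemalloc.take_snapshot()

while True:
    update_data(Obj, datetime.utcnow())

    # log the five source lines whose allocations grew the most since the last iteration
    current = tracemalloc.take_snapshot()
    for stat in current.compare_to(previous, "lineno")[:5]:
        logger.info(stat)
    previous = current

    sleep(60)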

EDIT: Looking at the chart, it makes sense that there are spikes just before I truncate, but since the data ingress is steady I don't understand why the usage doesn't settle back down afterwards instead of levelling off at a higher point. There is also a sudden drop after every 4th cycle, even though the process has no other, broader cycle that could explain it ...
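One thing I could check (just a guess on my part) is whether those drops coincide with the interpreter's own generational garbage collection, e.g. by logging the collector's counters on every iteration with the standard gc module:

import gc

# inside the main loop, after update_data():
logger.info(f"gc counts per generation: {gc.get_count()}")  # current collection counts
logger.info(f"gc stats per generation: {gc.get_stats()}")   # collections / collected / uncollectable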

  • It seems that `Obj.data` should be a list of some kind, and not a `requests.Response` object, right? – moooeeeep Dec 18 '19 at 11:12
  • the request call returns a pandas series – cheesus Dec 18 '19 at 11:15
  • Then the issue is potentially related to pandas leaking memory, and you should highlight this more clearly in your post and tags. You should probably also leave docker out of the picture, unless you can only reproduce the issue when running it via docker. – moooeeeep Dec 18 '19 at 11:25
  • Related: https://stackoverflow.com/q/14224068/1025391 – moooeeeep Dec 18 '19 at 11:32
  • Does that mean the problem is definitely part of the Python script, and not related to docker in any way? – cheesus Dec 18 '19 at 15:54
  • Did you try? Surely you can run this outside docker. Currently only you can reproduce this issue, so you need to find this out yourself. – moooeeeep Dec 19 '19 at 08:14
  • Memory usage did also ramp up when the script was executed outside the container. I was able to resolve the issue by calling gc.collect() after every loop. – cheesus Dec 19 '19 at 08:20
  • Nice, I suggest you post and accept a self-answer in this case. – moooeeeep Dec 19 '19 at 08:28

1 Answer


As suggested by moooeeeep, the increase in memory usage was related to a memory leak, the exact source of which remains to be identified. However, I was able to resolve the issue by manually calling the garbage collector after every loop, via gc.collect().
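For reference, a minimal sketch of where the call goes, based on the simplified loop from the question (the rest of the loop body is unchanged):

import gc

while True:
    update_data(Obj, datetime.utcnow())
    # ... logging as before ...

    # force a full garbage collection at the end of every iteration
    gc.collect()

    sleep(60)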

[memory-profiler plot after adding gc.collect() to the loop]

  • Probably it's the same issue as described [here](https://stackoverflow.com/q/14224068/1025391) and [here](https://github.com/pandas-dev/pandas/issues/2659) (pandas leaking memory somehow). – moooeeeep Dec 19 '19 at 09:02