
Consider the following program:

import pandas as pd
import datetime
import time
import psutil
import os
import gc

# Construct a trivial pandas time series with five timestamped points
data = []
indexes = []
for i in range(5):
    data.append(i)
    indexes.append(datetime.datetime.now())
    time.sleep(1)
s = pd.Series(data, index=indexes)

for _ in range(100000):
    # Removing the next line prevents the memory leak
    foo = datetime.datetime.now() - s.index[-1]

    # These lines on their own do not leak
    foo_dt = datetime.datetime.now()
    foo_idx = s.index[-1]
    # gc.collect()  # This mitigates but does not eliminate the problem

    # Get memory (resident set size, in bytes) per https://stackoverflow.com/a/21632554/939259
    process = psutil.Process(os.getpid())
    print(process.memory_info().rss)

This gives the following output when the gc.collect() line is included:

$ python ./test_leak.py | uniq
60502016
60547072
60755968
<snip>

Without the gc.collect() call, the output is similar:

$ python ./test_leak.py | uniq
60518400
60588032
60776448
<snip>
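A minimal sketch for narrowing this down, assuming Python 3 (the standard-library tracemalloc module does not exist in Python 2, which the original xrange suggests). It reports how many bytes allocated through Python's own allocators are still alive after calling a function many times. The helper net_growth and the demo functions leaky and not_leaky are hypothetical names for illustration; to probe the case above, one would pass something like lambda: datetime.datetime.now() - s.index[-1]. Note that tracemalloc cannot see memory a C extension obtains with plain malloc, so a near-zero result would point away from leaked Python objects without ruling out a leak entirely.

```python
import datetime
import gc
import tracemalloc


def net_growth(fn, iterations=100000):
    """Return bytes still allocated (as traced by tracemalloc) after calling fn repeatedly.

    Hypothetical helper: a result that grows with `iterations` suggests fn keeps
    objects alive; a small, roughly constant result is just allocator noise.
    """
    gc.collect()  # clear pre-existing garbage so it does not skew the baseline
    tracemalloc.start()
    try:
        before = tracemalloc.get_traced_memory()[0]
        for _ in range(iterations):
            fn()
        gc.collect()  # free anything only reachable through reference cycles
        after = tracemalloc.get_traced_memory()[0]
    finally:
        tracemalloc.stop()
    return after - before


_leaked = []


def leaky():
    # Deliberately keeps every result alive, so traced memory grows per call.
    _leaked.append(datetime.datetime.now())


def not_leaky():
    # The temporary timedelta is discarded as soon as the call returns.
    datetime.datetime.now() - datetime.datetime(2000, 1, 1)


if __name__ == "__main__":
    print("not_leaky:", net_growth(not_leaky, 10000))
    print("leaky:    ", net_growth(leaky, 10000))
```

Comparing the pandas expression against its two halves (datetime.datetime.now() and s.index[-1]) with this helper would show whether the subtraction itself leaves Python objects behind.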

What's going on here? Why does memory keep increasing when all I'm doing is assigning a temporary?

Thomas Johnson
