
Is this a memory leak in the third-party library I'm using, or is there something about Python garbage collection and memory management that I do not understand?

At the end, I would expect the memory usage to be close to what it was at the beginning (33 MB), because I no longer hold any references to the objects created inside do_griddly_work(). However, the memory usage is far higher (1600 MB) and does not drop after the function returns or after collecting garbage.

This is printed:

Before any work:  33.6953125 MB
After griddly work:  1601.60546875 MB
0
After garbage collect:  1601.60546875 MB

by the following code:

import gc
import os

import psutil
from griddly import GymWrapper, gd

def do_griddly_work():
    current_path = os.path.dirname(os.path.realpath(__file__))
    env = GymWrapper(current_path + '/griddly_descriptions/testbed1.yaml',
                     player_observer_type=gd.ObserverType.VECTOR,
                     global_observer_type=gd.ObserverType.SPRITE_2D,
                     level=0)
    env.reset()
    for _ in range(10000):
        c_env = env.clone()
    # Print memory usage after work
    print('After griddly work: ', psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2, 'MB')


if __name__ == '__main__':
    process = psutil.Process(os.getpid())

    # Memory usage before work = ~33MB
    print('Before any work: ', process.memory_info().rss / 1024 ** 2, 'MB')
    
    # Do work that clones an environment a lot
    do_griddly_work()
    
    # Collect garbage
    print(gc.collect())
    
    # Memory usage after work = ~1600 MB
    print('After garbage collect: ', process.memory_info().rss / 1024 ** 2, 'MB')
Antti_M
  • Garbage collection in Python is not like that in most languages. CPython relies mainly on reference counting, so an object is normally deleted as soon as there are no more references to it; the separate cycle collector (what gc.collect() triggers) only handles reference cycles. In pure Python code, the usual way to leak is to keep around a reference you forgot about, in a list or dictionary for example. – Mark Ransom Dec 03 '21 at 19:26
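
A minimal sketch of the behaviour described in this comment, assuming CPython. The `Node` class is hypothetical and exists only for the demonstration. It shows that an object with no remaining references is freed immediately, while objects in a reference cycle wait for the cycle collector. It also suggests how to read the `0` printed by `gc.collect()` in the question: the return value is the number of unreachable objects the collector found, so `0` means there was no cyclic garbage on the Python side.

```python
import gc
import weakref


class Node:
    """Tiny hypothetical class used only for this demonstration."""

    def __init__(self):
        self.other = None


# Case 1: no cycle. Dropping the last reference frees the object at once.
a = Node()
a_ref = weakref.ref(a)    # a weak reference does not keep the object alive
del a
freed_immediately = a_ref() is None

# Case 2: a reference cycle. The reference counts never reach zero on their
# own, so the objects survive until the cycle collector runs.
gc.disable()              # keep automatic collection from interfering
b, c = Node(), Node()
b.other, c.other = c, b
b_ref = weakref.ref(b)
del b, c
alive_before_collect = b_ref() is not None
collected = gc.collect()  # number of unreachable objects found
gc.enable()
alive_after_collect = b_ref() is not None

print(freed_immediately, alive_before_collect, alive_after_collect, collected)
```

Neither case covers memory allocated by native code behind a Python wrapper; the collector has no view of that.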

2 Answers

1

Most language runtimes, including Python's, are under no obligation to return memory to the OS once an object is destroyed. In fact, because memory is generally requested from the OS in blocks much larger than a single object, each block holds many objects, and if any of them is still alive the block cannot be returned to the OS.

memory_info().rss reports memory usage from the OS's point of view, not the Python runtime's.
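
To see the Python runtime's side of that distinction, one option is the standard-library `tracemalloc` module, which tracks allocations made through Python's memory allocators. The following is a rough sketch, not a measurement of Griddly itself: the list of `bytes` objects is an arbitrary stand-in for "lots of objects created in a loop".

```python
import tracemalloc

tracemalloc.start()

# Allocate many small objects, similar in spirit to cloning in a loop.
data = [bytes(1000) for _ in range(10_000)]   # roughly 10 MB of payload
current_with_data, _ = tracemalloc.get_traced_memory()

del data                                      # last reference gone: objects are freed
current_after_del, peak = tracemalloc.get_traced_memory()

tracemalloc.stop()

print(f"traced while alive: {current_with_data / 1024**2:.1f} MB")
print(f"traced after del:   {current_after_del / 1024**2:.1f} MB")
print(f"traced peak:        {peak / 1024**2:.1f} MB")
```

The traced figure drops as soon as the objects are released, whereas the RSS reported by the OS may or may not follow, for the reasons given above.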

Mark Ransom
  • I got an answer from the library developer, and the problem is being fixed by him. – Antti_M Dec 07 '21 at 13:20
  • @Antti_M that's fantastic! Please come back and leave an answer once it's fixed. But do keep in mind what I said here, I think it will still be relevant. – Mark Ransom Dec 07 '21 at 16:53
  • @Antti_M also see [How do I profile memory usage in Python?](https://stackoverflow.com/q/552744/5987) for other methods of seeing memory usage. – Mark Ransom Dec 07 '21 at 17:00
1

The problem was solved by the author of the Griddly library that I am using. The cloned environments were not reachable by Python's garbage collector, and there was a memory leak in the underlying C++ implementation.
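
For anyone trying to tell this kind of native leak apart from a Python-level one, here is a hedged sketch of one possible check. The helper `measure_growth` and its `read_rss` parameter are hypothetical names introduced for illustration, not part of Griddly or psutil. The idea is to compare how much the process RSS grows with how much memory `tracemalloc` sees. Large RSS growth with little traced growth suggests the memory was allocated outside Python's allocators, for example inside a C++ extension.

```python
import tracemalloc


def measure_growth(work, read_rss):
    """Run work() and return (rss_growth, python_growth) in bytes.

    read_rss is any zero-argument callable returning the process RSS in
    bytes (hypothetical parameter; see the psutil example below).
    """
    tracemalloc.start()
    rss_before = read_rss()
    py_before, _ = tracemalloc.get_traced_memory()

    work()

    py_after, _ = tracemalloc.get_traced_memory()
    rss_after = read_rss()
    tracemalloc.stop()
    return rss_after - rss_before, py_after - py_before


# Possible use with the code from the question (assumes psutil is installed):
#   import psutil
#   rss, py = measure_growth(do_griddly_work,
#                            lambda: psutil.Process().memory_info().rss)
```

Note that `tracemalloc` adds some overhead of its own, so the numbers are only indicative.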

Antti_M