
The following minimal example of calling a Python function from C++ has a memory leak on my system:

script.py:

import tensorflow
def foo(param):
    return "something"

main.cpp:

#include "python3.5/Python.h"

#include <iostream>
#include <string>

int main()
{
    Py_Initialize();

    PyRun_SimpleString("import sys");
    PyRun_SimpleString("if not hasattr(sys,'argv'): sys.argv = ['']");
    PyRun_SimpleString("sys.path.append('./')");

    PyObject* moduleName = PyUnicode_FromString("script");
    PyObject* pModule = PyImport_Import(moduleName);
    PyObject* fooFunc = PyObject_GetAttrString(pModule, "foo");
    PyObject* param = PyUnicode_FromString("dummy");
    PyObject* args = PyTuple_Pack(1, param);
    PyObject* result = PyObject_CallObject(fooFunc, args);

    Py_CLEAR(result);
    Py_CLEAR(args);
    Py_CLEAR(param);
    Py_CLEAR(fooFunc);
    Py_CLEAR(pModule);
    Py_CLEAR(moduleName);

    Py_Finalize();
}

compiled with

g++ -std=c++11 main.cpp $(python3-config --cflags) $(python3-config --ldflags) -o main

and run with valgrind

valgrind --leak-check=yes ./main

produces the following summary

LEAK SUMMARY:
==24155==    definitely lost: 161,840 bytes in 103 blocks
==24155==    indirectly lost: 33 bytes in 2 blocks
==24155==      possibly lost: 184,791 bytes in 132 blocks
==24155==    still reachable: 14,067,324 bytes in 130,118 blocks
==24155==                       of which reachable via heuristic:
==24155==                         stdstring          : 2,273,096 bytes in 43,865 blocks
==24155==         suppressed: 0 bytes in 0 blocks

I'm using Linux Mint 18.2 Sonya, g++ 5.4.0, Python 3.5.2 and TensorFlow 1.4.1.

Removing `import tensorflow` makes the leak disappear. Is this a bug in TensorFlow, or did I do something wrong? (I expect the latter to be true.)


Additionally, when I create a Keras layer in Python

#script.py
from keras.layers import Input
def foo(param):
    a = Input(shape=(32,))
    return "str"

and run the call to Python from C++ repeatedly

//main.cpp

#include "python3.5/Python.h"

#include <iostream>
#include <string>

int main()
{
    Py_Initialize();

    PyRun_SimpleString("import sys");
    PyRun_SimpleString("if not hasattr(sys,'argv'): sys.argv = ['']");
    PyRun_SimpleString("sys.path.append('./')");

    PyObject* moduleName = PyUnicode_FromString("script");
    PyObject* pModule = PyImport_Import(moduleName);

    for (int i = 0; i < 10000000; ++i)
    {
        std::cout << i << std::endl;
        PyObject* fooFunc = PyObject_GetAttrString(pModule, "foo");
        PyObject* param = PyUnicode_FromString("dummy");
        PyObject* args = PyTuple_Pack(1, param);
        PyObject* result = PyObject_CallObject(fooFunc, args);

        Py_CLEAR(result);
        Py_CLEAR(args);
        Py_CLEAR(param);
        Py_CLEAR(fooFunc);
    }

    Py_CLEAR(pModule);
    Py_CLEAR(moduleName);

    Py_Finalize();
}

the memory consumption of the application grows continuously and without bound at runtime.

So I guess there is something fundamentally wrong with the way I call the Python function from C++, but what is it?

Tobias Hermann
  • You could use `--track-origins=yes` to see where the leaked memory is allocated. There are often globals in Python modules which are initialized when loaded and which stay forever - but it isn't a big issue, because every module is loaded only once. – ead Jan 11 '18 at 21:28
  • Your second memory leak is stranger; maybe there is a leak in Keras? Do you see a memory leak if Keras isn't imported or if `Input` isn't created? – ead Jan 11 '18 at 21:30
  • @ead The growing memory leak in the loop only occurs if I create `Input`. Merely importing Keras results only in the memory leak of the first example. – Tobias Hermann Jan 12 '18 at 10:18
  • @ead Memory leak traces always show the origin; `--track-origins` is for uninitialized memory. To quote the manual: "To see information on the sources of uninitialised data in your program, use the --track-origins=yes option. This makes Memcheck run more slowly, but can make it much easier to track down the root causes of uninitialised value errors." from http://valgrind.org/docs/manual/mc-manual.html – Paul Floyd Jan 16 '18 at 09:56

1 Answer


There are two different types of "memory leaks" in your question.

Valgrind is telling you about the first type of memory leak. However, it is pretty usual for Python modules to "leak" memory: it is mostly globals which are allocated/initialized when the module is loaded. And because each module is loaded only once in Python, it's not a big problem.

A well-known example is numpy's `PyArray_API`: it must be initialized via `_import_array`, is then never deleted, and stays in memory until the Python interpreter is shut down.
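
To illustrate the pattern (a hypothetical module, not taken from numpy or tensorflow): memory that is allocated once at import time, is never explicitly freed, and lives until the interpreter shuts down:

#heavy_module.py - hypothetical example of an import-time global
# The table is built the first time the module is imported and is never
# explicitly freed; it stays alive until the interpreter shuts down.
# A leak checker may report it, but it is allocated only once per process.
_LOOKUP_TABLE = [i * i for i in range(100000)]

def lookup(i):
    return _LOOKUP_TABLE[i]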

So it is a "memory leak" by design; you can argue whether that is good design or not, but at the end of the day there is nothing you can do about it.

I don't have enough insight into the tensorflow module to pinpoint the places where such memory leaks happen, but I'm pretty sure it's nothing you should worry about.


The second "memory leak" is more subtle.

You can get a lead by comparing the valgrind output for 10^4 and 10^5 iterations of the loop: there will be almost no difference! There is, however, a difference in peak memory consumption.

Unlike C++, Python has a garbage collector, so you cannot know exactly when an object is destroyed. CPython uses reference counting, so when a reference count reaches 0, the object is destroyed. However, when there is a cycle of references (e.g. object A holds a reference to object B and object B holds a reference to object A) it is not so simple: the garbage collector needs to iterate through all objects to find such no-longer-used cycles.
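
As a small, self-contained illustration (not part of the original question), here is such a cycle, which pure reference counting can never reclaim and which only the cycle detector frees:

#cycle.py - a reference cycle that only gc can collect
import gc

class Node:
    def __init__(self):
        self.other = None

def make_cycle():
    a, b = Node(), Node()
    a.other = b   # A references B
    b.other = a   # B references A -> a reference cycle
    # both names go out of scope here, but the refcounts never drop to 0

gc.disable()          # rule out an automatic collection during the demo
make_cycle()
print(gc.collect())   # the collector finds and frees the cycle; prints a count > 0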

One could think that keras.layers.Input has such a cycle somewhere (and this is true), but this is not the reason for this "memory leak", which can also be observed in pure Python.

We use the objgraph package to inspect the references; let's run the following Python script:

#pure.py
from keras.layers import Input
import gc
import sys
import objgraph


def foo(param):
    a = Input(shape=(1280,))
    return "str"

### MAIN:

print("Counts at the beginning:")
objgraph.show_most_common_types()
objgraph.show_growth(limit=7)

for i in range(int(sys.argv[1])):
    foo(" ")

gc.collect()  # just to be sure

print("\n\n\n Counts at the end")
objgraph.show_most_common_types()
objgraph.show_growth(limit=7)

import random
objgraph.show_chain(
    objgraph.find_backref_chain(
        random.choice(objgraph.by_type('Tensor')),  # take some random tensor
        objgraph.is_proper_module),
    filename='chain.png')

and run it:

python pure.py 1000

We can see the following: at the end there are exactly 1000 Tensors, which means none of our created objects got disposed of!

If we take a look at the chain which keeps a tensor object alive (it was created with objgraph.show_chain), we see:

(image: the back-reference chain written by objgraph.show_chain to chain.png, leading from a Tensor to a tensorflow Graph object)

that there is a tensorflow Graph object in which all tensors are registered, and they stay there until the session is closed.
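
A quick way to see this accumulation directly (a sketch, not from the original post, assuming the TF 1.x API where Keras creates its tensors in the default graph):

#growth.py - watch the default graph grow with every call
import tensorflow as tf
from keras.layers import Input

for i in range(3):
    Input(shape=(32,))
    # every call registers new operations/tensors in the default graph,
    # so this count keeps increasing
    print(len(tf.get_default_graph().get_operations()))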

So far the theory. However, neither:

#close session and free resources:
import keras
keras.backend.get_session().close()  # free all resources

print("\n\n\n Counts after session.close():")
objgraph.show_most_common_types()

nor the proposed solution of using an explicit graph and session:

import tensorflow as tf

with tf.Graph().as_default(), tf.Session() as sess:
    for step in range(int(sys.argv[1])):
        foo(" ")

has worked for the current tensorflow version, which is probably a bug.


In a nutshell: you are doing nothing wrong in your C++ code; there are no memory leaks you are responsible for. In fact, you would see exactly the same memory consumption if you called the function foo from a pure Python script over and over again.

All created Tensors are registered in a Graph object and aren't released automatically; you have to release them by closing the backend session, which however doesn't work due to a bug in the current tensorflow version 1.4.0.
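
For completeness, the workaround attempted in the comment discussion below was to drop the graph after each call via keras.backend.clear_session(); a minimal sketch of that variant (which, per the discussion, still did not stop the growth with this TensorFlow/Keras combination):

#script.py - hypothetical variant of the original script
import keras
from keras.layers import Input
import keras.backend as K

def foo(param):
    a = Input(shape=(32,))
    K.clear_session()  # discard the current graph/session so the registered tensors can be freed
    return "str"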

ead
  • Awesome analysis! Thank you so much. – Tobias Hermann Jan 16 '18 at 14:12
  • @TobiasHermann It's a little bit embarrassing, but my analysis was pretty wrong. But now I hope I got it right... – ead Jan 16 '18 at 20:48
  • Thanks again. I just tested your approach, but was unable to remove the growing leak with it. The code now is this: [main.cpp](http://codepad.org/zHvo7Vtj), [script.cpp](http://codepad.org/NiXt4HdZ). Do you have an idea what's wrong? – Tobias Hermann Jan 17 '18 at 07:23
  • @TobiasHermann I think in your script you need `import keras`, otherwise `clear_session` throws, but you don't handle that properly in your cpp-code. I would first check that your cpp-program works as a pure Python script. – ead Jan 17 '18 at 08:05
  • Oh, sorry. I should have checked that. OK, now I have [this `script.py`](http://codepad.org/G3FkIxxM). Running it simply by using `python3 script.py` still makes the memory consumption grow. Any ideas? – Tobias Hermann Jan 17 '18 at 08:29
  • I don't have access to my PC, but will take a look later. Can you use `objgraph` (`pip install objgraph`) in your script to see which objects are alive and why (chain)? – ead Jan 17 '18 at 08:36
  • Sure. [This script](http://codepad.org/eJSDrWbj) produces [that output](http://codepad.org/1fnwo4zd) and [those images](https://imgur.com/a/LOid9). – Tobias Hermann Jan 17 '18 at 08:47
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/163308/discussion-between-ead-and-tobias-hermann). – ead Jan 17 '18 at 09:06