I wrote Python code that calls C code through ctypes. The C function is called many times inside a for loop.
The C code is as follows:
- test.h
#include <Python.h>
PyObject *getFeature(wchar_t *text);
// in the full code getFeature also receives a unigram set (a 'PySetObject'); it is omitted here to keep the example simple
- test.c
#include "test.h"

PyObject *getFeature(wchar_t *text)
{
    int ret = -1;
    // new list that will hold the extracted features (new reference)
    PyObject *featureList = PyList_New(0);
    // build a Python string from the first 2 wide characters (new reference)
    PyObject *curString = PyUnicode_FromWideChar(text, 2);
    // PyList_Append increments the refcount of curString, so drop our own reference afterwards
    ret = PyList_Append(featureList, curString);
    Py_DECREF(curString);
    return featureList;
}
I then compiled it into a shared library called libtest.so, and I load that .so into the Python code with ctypes as shown in test.py below.
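For reference, the build step is roughly the following one-liner (the exact flags and the python3-config include lookup are assumptions about my setup and may differ elsewhere):

    gcc -shared -fPIC $(python3-config --includes) test.c -o libtest.so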
- test.py
import os
import ctypes

dir_path = 'path/to/the'  # directory that contains libtest.so
feature_extractor = ctypes.PyDLL(os.path.join(dir_path, 'libtest.so'))

get_feature_c = feature_extractor.getFeature
# the full code also passes a unigram set (ctypes.py_object) as a second argument,
# but that is omitted here for simplicity
get_feature_c.argtypes = [ctypes.c_wchar_p]
get_feature_c.restype = ctypes.py_object

# pure-Python equivalent of the C function
def get_feature(text):
    return [text[:2]]

times = 100000

for i in range(times):
    res = get_feature_c('ncd')  # memory usage keeps growing with the loop count

for i in range(times):
    res = get_feature('ncd')  # memory usage stays at a fixed size
I monitor the memory usage of the program with the top command and find that the memory grows in proportion to the number of loop iterations. But when I call the pure-Python function instead, the memory stays at a steady size.
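A rough way to see the same growth from inside the script, after the definitions in test.py above, is a sketch like this (resource.getrusage is from the standard library; ru_maxrss is reported in kilobytes on Linux, and the numbers only show the trend):

    import resource

    def rss_kb():
        # peak resident set size of this process, in KB on Linux
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    before = rss_kb()
    for i in range(times):
        res = get_feature_c('ncd')
    print('RSS grew by about', rss_kb() - before, 'KB')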
I assume that after every call of the C function, the memory is not released correctly. How can I release and control that memory after each call?
BTW: this is a simplified version of the question; the real feature extraction is implemented entirely in C, and there is no memory leak inside the C code itself.