Pybind11 and std::vector -- How to free data using capsules?

Question

I have a C++ function that returns a std::vector and, using Pybind11, I would like to return the contents of that vector as a Numpy array without having to copy the underlying data of the vector into a raw data array.

Current Attempt

In this well-written SO answer the author demonstrates how to ensure that a raw data array created in C++ is appropriately freed when the Numpy array's reference count drops to zero. I tried to write a version of this using std::vector instead:

// Aside: wrapper is a template, and I register specific instantiations
// of it in the PYBIND11_MODULE definition:
//
//     m.def("my_func", &wrapper<int>, ...)
//     m.def("my_func", &wrapper<float>, ...)
//
template <typename T>
py::array_t<T> wrapper(py::array_t<T> input) {
    auto proxy = input.template unchecked<1>();
    std::vector<T> result = compute_something_returns_vector(proxy);

    // give memory cleanup responsibility to the Numpy array
    py::capsule free_when_done(result.data(), [](void *f) {
        auto foo = reinterpret_cast<T  *>(f);
        delete[] foo;
    });

    return py::array_t<T>({result.size()}, // shape
                          {sizeof(T)},     // stride
                          result.data(),   // data pointer
                          free_when_done);
}

Observed Issues

However, if I call this from Python I observe two things: (1) the data in the output array is garbage and (2) when I manually delete the Numpy array I receive the following error (SIGABRT):

python3(91198,0x7fff9f2c73c0) malloc: *** error for object 0x7f8816561550: pointer being freed was not allocated

My guess is that this issue has to do with the line "delete[] foo", which presumably is being called with foo set to result.data(). This is not the way to deallocate a std::vector.

Possible Solutions

One possible solution is to create a T *ptr = new T[result.size()] and copy the contents of result to this raw data array. However, I have cases where the results might be large and I want to avoid taking all of that time to allocate and copy. (But perhaps it's not as long as I think it would be.)
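
For reference, the copying fallback could look roughly like the sketch below. It is plain C++ with no pybind11 calls, and the name copy_to_raw_array is made up for illustration. An array obtained this way comes from new T[], so it is the kind of pointer that the delete[] in the capsule above can legitimately free.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy the vector's elements into a freshly allocated
// raw array. The caller owns the returned pointer and must release it with
// delete[] (for example from a capsule destructor).
template <typename T>
T* copy_to_raw_array(const std::vector<T>& result) {
    T* ptr = new T[result.size()];                  // one allocation
    std::copy(result.begin(), result.end(), ptr);   // one linear copy
    return ptr;
}
```

The cost is one extra allocation plus a linear pass over the elements, which is the overhead I would like to avoid for large results.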

Also, I don't know much about std::allocator but perhaps there is a way to allocate the raw data array needed by the output vector outside the compute_something_returns_vector() function call and then discard the std::vector afterwards, retaining the underlying raw data array?

The final option is to rewrite compute_something_returns_vector.

score 11 · Answer 1 · answered Feb 26 '19 at 17:40

After an offline discussion with a colleague I resolved my problem. I do not want to commit an SO faux pas so I won't accept my own answer. However, for the sake of using SO as a catalog of information I want to provide the answer here for others.

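The problem was simple: result was a local variable with automatic storage, so its destructor freed the underlying buffer as soon as wrapper() returned. The Numpy array was left pointing at freed memory (hence the garbage values), and the capsule later called delete[] on a pointer that was never allocated with new[] (hence the malloc error). The vector itself needs to be heap-allocated so that free_when_done can take ownership of it. Below is an example fix: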

{
    // ... snip ...

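    // Heap-allocate the vector so that it outlives this function; the
    // returned temporary is moved (not copied) into the new object.
    auto *result = new std::vector<T>(compute_something_returns_vector(proxy));

    // The capsule owns the vector object itself, not its data pointer, so
    // deleting it releases the buffer through the vector's own destructor.
    py::capsule free_when_done(result, [](void *f) {
      delete static_cast<std::vector<T> *>(f);
    });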

    return py::array_t<T>({result->size()}, // shape
                          {sizeof(T)},      // stride
                          result->data(),   // data pointer
                          free_when_done);
}
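
To make the ownership hand-off easier to see in isolation, here is a minimal sketch in plain C++ without pybind11. OwnedBuffer, delete_vector and hand_off are hypothetical names, not pybind11 API; the destroy member simply has the same void(*)(void*) shape as the callback passed to py::capsule above.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical bundle of what a Numpy wrapper would need: the data pointer
// and length for the array, plus an owner pointer and destructor for a capsule.
template <typename T>
struct OwnedBuffer {
    T* data;                 // first element, what the array would point at
    std::size_t size;        // number of elements (the 1-D shape)
    void* owner;             // the heap-allocated std::vector<T>, type-erased
    void (*destroy)(void*);  // same signature as a capsule destructor
};

// Deletes the vector object, which in turn releases the element buffer
// through the vector's own allocator.
template <typename T>
void delete_vector(void* p) {
    delete static_cast<std::vector<T>*>(p);
}

// Move the result onto the heap so that it outlives the calling function.
template <typename T>
OwnedBuffer<T> hand_off(std::vector<T>&& v) {
    auto* heap = new std::vector<T>(std::move(v));
    return OwnedBuffer<T>{heap->data(), heap->size(), heap, &delete_vector<T>};
}
```

In a wrapper like mine, data and size would feed the array's data pointer and shape, while owner and destroy would be the two arguments to the capsule.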

I was also able to implement a solution using std::unique_ptr that doesn't require the use of a free_when_done function. However, I wasn't able to run Valgrind with either solution so I'm not 100% sure that the memory held by the vector was appropriately freed. (Valgrind + Python is a mystery to me.) For completeness, below is the std::unique_ptr approach:

{
    // ... snip ...

    std::unique_ptr<std::vector<T>> result =
        std::make_unique<std::vector<T>>(compute_something_returns_vector(proxy));

    return py::array_t<T>({result->size()}, // shape
                          {sizeof(T)},      // stride
                          result->data());  // data pointer
}

I was, however, able to inspect the addresses of the vectors allocated in both the Python and C++ code and confirmed that no copies of the output of compute_something_returns_vector() were made.
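
An address check of this kind can be reproduced on the C++ side alone. The sketch below uses a hypothetical helper, move_keeps_buffer, to compare the data pointer before and after a vector is moved into a heap-allocated one. It assumes the default std::allocator and says nothing about what happens once the pointer reaches Python.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper: returns true when moving `source` into a
// heap-allocated vector reuses the same element buffer, i.e. no
// element-wise copy took place.
template <typename T>
bool move_keeps_buffer(std::vector<T> source) {
    const T* before = source.data();                     // address before the move
    auto* heap = new std::vector<T>(std::move(source));  // steal the buffer
    const bool same = (heap->data() == before);          // compare before freeing
    delete heap;
    return same;
}
```

If the two addresses match, the heap-allocated vector took over the existing buffer rather than duplicating it.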

No worries, accepting own answers is fine. An example similar to the first one of yours is also here: https://github.com/pybind/pybind11/issues/1042#issuecomment-325941022 The second one (with unique_ptr) doesn't look correct to me, the data will be deallocated when you return from this function. — marcin, Feb 27 '19 at 11:17
Thanks for the GitHub reference. Regarding data deallocation, the Numpy array seemed to retain the data delivered by the C++ code when using `unique_ptr`. But as I mentioned, it's difficult to tell if it's copied or if there is a memory leak. — Chris Swierczewski, Feb 27 '19 at 17:54
Note: I'd probably avoid the second example with `unique_ptr`. There appears to be no transfer of ownership of `result`, so the memory might be freed too early when that variable goes out of scope. The same reasoning applies when using `shared_ptr` instead of `unique_ptr`: where is the copy of the smart pointer saved on the Python side? If memory is freed too early, this could lead to data corruption if some other heap allocation happens to reuse that same memory range. I think the first answer looks entirely correct and solid, that's what I am using. Ref: https://stackoverflow.com/a/26737405/107409 — Contango, Jun 21 '20 at 14:59
@marcin You are correct. First example excellent, second bad as memory deallocated too early. — Contango, Jun 21 '20 at 15:23
