Using pybind11 C++ API and python3, how can we properly create a numpy array of objects (i.e. unicode strings) in the C++ implementation and return it back to python? What is the exact memory layout of the underlying data array passed into pybind11::array()? How exactly do we need to manage memory, i.e. delete/free?

Note that this is necessary because we want to use that array of strings, in conjunction with other POD arrays, in pandas DataFrame creation.

joe-ts
  • I can't help with `pybind11`, but `numpy` arrays can be created with a function from the `numpy C API`. Array layouts are all basically the same: attributes plus a data buffer. For an object dtype the data buffer contains pointers to objects elsewhere in memory. With a `unicode` dtype that buffer actually contains the strings (padded to a specified length). You may have to study the `numpy` docs. – hpaulj Jul 16 '17 at 04:00
  • thanks - it seems like the best way is to manually create the buffer with objects and then figure out how pybind11 can manage memory associated with the underlying array as well as all the objects... – joe-ts Jul 18 '17 at 01:53

1 Answer

It turns out that two steps are necessary:

  1. Create an array of PyObject pointers and fill it, i.e.

    auto* pbuf = new PyObject*[arraySize]; // or create via pybind11 API...
    pbuf[0] = <new object...>;  // each slot holds an owned (new) reference
    pbuf[1] = <new object...>;
    // etc.
    
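  2. Create an "object" py::array() with a capsule that owns the buffer:

    py::capsule freeWhenDone(pbuf, [](void* pp) {
            // the stored objects must each be Py_DECREF'ed before the buffer goes away
            delete [] static_cast<PyObject**>(pp); // or else properly free the pbuf memory
        });

    py::array arr = py::array(py::dtype("object"),
        shape, strides, pbuf, freeWhenDone);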
    
joe-ts
  • Do you also know a solution which does not need the `freeWhenDone`? I suppose if the memory is malloc'd by pybind11, py::array can use the normal free'ing. – olq_plo Nov 22 '19 at 08:37
  • The way I was able to get it to work without leaking is with a capsule. Moreover, when inner objects are python strings, I had to free each individually allocated object. There may be other solutions with the newer version of pybind11. – joe-ts Nov 24 '19 at 17:41