I'm writing a Python extension module in C++ with Boost Python, and I want to return NumPy arrays from the module to Python. This works well for numeric data types like double, but at one point I need to create a string array from existing data.
For numeric arrays I used PyArray_SimpleNewFromData, which worked well. Since strings are not fixed length, I used PyArray_New instead, where I can pass in the itemsize, which in my case is 4. Here's a minimal example:
bool initNumpy()
{
    Py_Initialize();
    import_array();
    return true;
}

class Foo {
public:
    Foo() {
        initNumpy();
        data.reserve(10);
        data = {"Rx", "Rx", "Rx", "RxTx", "Tx", "Tx", "Tx", "RxTx", "Rx", "Tx"};
    }

    PyObject* getArray() {
        npy_intp dims[] = { data.size() };
        return (PyObject*)PyArray_New(&PyArray_Type, 1, dims, NPY_STRING, NULL, &data[0], 4, NPY_ARRAY_OWNDATA, NULL);
    }

private:
    std::vector<std::string> data;
};
I expect the output of getArray() to be equal to the output of numpy.array(["Rx", "Rx" ...], dtype="S4"), which is:
array([b'Rx', b'Rx', b'Rx', b'RxTx', b'Tx', b'Tx', b'Tx', b'RxTx', b'Rx',
b'Tx'], dtype='|S4')
but it looks like this:
array([b'Rx', b'', b'\xcc\xb3b\xd9', b'\xfe\x07', b'\x02', b'', b'\x0f',
b'', b'Rx\x00\x03', b''], dtype='|S4')
I tried playing around with the npy_intp const* strides argument because I think the issue is the memory layout of the underlying data. Unfortunately, it didn't achieve the desired result.
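The suspicion about the memory layout can be illustrated without NumPy. The pointer &data[0] refers to an array of std::string objects, not to packed characters, and sizeof(std::string) is generally much larger than 4, so reading that memory in 4-byte steps would walk through the string objects' internals. Below is a minimal sketch of the contiguous, zero-padded layout that an S4 array expects. The helper name packFixedWidth is hypothetical (it is not a NumPy or Boost Python function), and truncating over-long strings is an assumption made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of NumPy or Boost Python.
// Copies each string into its own slot of `itemsize` bytes. Short strings are
// zero-padded on the right; longer ones are truncated (an assumption here).
std::vector<char> packFixedWidth(const std::vector<std::string>& items,
                                 std::size_t itemsize) {
    // One zero-initialized slot per string, laid out back to back.
    std::vector<char> buffer(items.size() * itemsize, '\0');
    for (std::size_t i = 0; i < items.size(); ++i) {
        const std::size_t n = std::min(items[i].size(), itemsize);
        std::copy_n(items[i].data(), n, buffer.begin() + i * itemsize);
    }
    return buffer;
}
```

With an item size of 4, {"Rx", "RxTx", "Tx"} becomes the 12 bytes Rx\0\0 RxTx Tx\0\0. One possible way to use this idea in getArray() would be to pass NULL as the data argument of PyArray_New so that NumPy allocates the buffer itself, and then copy the characters into the memory returned by PyArray_DATA in the same slot-by-slot fashion. That is a sketch of an approach, not a tested fix for this build setup.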
I'm using Microsoft Build Tools 2017, Boost Python, distutils and Python 3.7 to build the extension.