I'm writing a Python extension module in C++ with PyObject and arrayobject. My question is based on "How to create fixed-width ndarray of strings", which provided a solution for creating a fixed-width ndarray of strings such as list = {"Rx", "Rx", "Rx", "RxTx", "Tx", "Tx", "Tx", "RxTx", "Rx", "Tx"}. However, in my case the strings do not all have the same length, like this:
list = {"DataDate", "ukey", "OrderRef", "ticktime", "sign", "side", "orderType", "orderSize", "limitPrice", "Status"}
The list is a vector of strings: std::vector<std::string>. If I detect the longest item and use the solution from "How to create fixed-width ndarray of strings":
static PyObject* string_vector_to_nparray(const std::vector<std::string>& vec, size_t itemsize)
{
    if( !vec.empty() )
    {
        size_t mem_size = vec.size()*itemsize;
        void * mem = PyDataMem_NEW(mem_size);
        size_t cur_index=0;
        for(const auto& val : vec){
            for(size_t i=0;i<itemsize;i++){
                char ch = i < val.size() ? val[i] : 0; // fill with NUL if string too short
                reinterpret_cast<char*>(mem)[cur_index] = ch;
                cur_index++;
            }
        }
        npy_intp dims = static_cast<npy_intp>(vec.size());
        PyObject* PyArray = PyArray_New(&PyArray_Type, 1, &dims, NPY_STRING, NULL, mem, 4, NPY_ARRAY_OWNDATA, NULL);
        return PyArray;
    }
    else
    {
        npy_intp dims[1] = {0};
        return (PyObject*) PyArray_ZEROS(1, dims, PyArray_FLOAT, 0);
    }
}
std::vector<std::string> col_list;
col_list.push_back("...");
col_list.push_back("...");
...
auto it = std::max_element(std::begin(col_list), std::end(col_list),
    [](const std::string& lhs, const std::string& rhs){ return lhs.size() < rhs.size(); });
auto num = it->size(); // here is your max size
std::cout << "Longest: [" << *it << "] of size: " << num << std::endl;
size_t itemsize = num;
PyObject *PyArray = string_vector_to_nparray(col_list, itemsize);
return PyArray;
the exported array looks like this in Python:
np.array([b'Data', b'Date', b'\x00\x00uk', b'ey', b'', b'Orde', b'rRef', b'\x00\x00ti', b'ckti', b'me'], dtype='|S4')
How can I create a non-fixed-width ndarray of strings from an existing string vector?
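One possible reading of that output, offered as a sketch rather than a confirmed diagnosis: the loop packs each string into 10 bytes (the length of "limitPrice"), but the itemsize argument passed to PyArray_New is the literal 4, so the same bytes would be sliced every 4 bytes. The plain C++ below involves no NumPy at all and only simulates that byte layout. The helpers max_width, pack_fixed_width and read_items are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: length of the longest string, or 0 for an empty vector
// (avoids dereferencing the end iterator that std::max_element returns there).
std::size_t max_width(const std::vector<std::string>& vec)
{
    std::size_t width = 0;
    for (const auto& s : vec) {
        width = std::max(width, s.size());
    }
    return width;
}

// Hypothetical helper: pack every string into `itemsize` bytes, NUL-padded
// (and truncated if longer), mirroring the copy loop in string_vector_to_nparray.
std::vector<char> pack_fixed_width(const std::vector<std::string>& vec,
                                   std::size_t itemsize)
{
    std::vector<char> buf(vec.size() * itemsize, '\0');
    for (std::size_t n = 0; n < vec.size(); ++n) {
        const std::size_t len = std::min(vec[n].size(), itemsize);
        std::copy_n(vec[n].data(), len, buf.data() + n * itemsize);
    }
    return buf;
}

// Hypothetical helper: read `count` items of `read_width` bytes each from the
// start of the buffer, dropping trailing NULs the way an 'S' item is displayed.
std::vector<std::string> read_items(const std::vector<char>& buf,
                                    std::size_t count,
                                    std::size_t read_width)
{
    assert(count * read_width <= buf.size());
    std::vector<std::string> out;
    for (std::size_t n = 0; n < count; ++n) {
        std::string item(buf.data() + n * read_width, read_width);
        while (!item.empty() && item.back() == '\0') {
            item.pop_back();
        }
        out.push_back(item);
    }
    return out;
}
```

With the ten column names above packed at width 10, read_items(buf, 10, 4) gives "Data", "Date", "\0\0uk", "ey", "", "Orde", "rRef", "\0\0ti", "ckti", "me", which matches the array shown, while read_items(buf, 10, 10) gives back the original names. If this reading is right, passing itemsize instead of 4 to PyArray_New may be all that is needed. Note that the result would still be a fixed-width 'S' array, padded to the longest string, rather than a truly variable-width one.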