I'm writing a Python extension module in C++ using PyObject and arrayobject. My question builds on "How to create fixed-width ndarray of strings", which shows how to create a fixed-width NumPy array of strings from a list such as list = {"Rx", "Rx", "Rx", "RxTx", "Tx", "Tx", "Tx", "RxTx", "Rx", "Tx"}. In my case, however, the strings have varying, unequal lengths, like this:

list = {"DataDate", "ukey", "OrderRef", "ticktime", "sign", "side", "orderType", "orderSize", "limitPrice", "Status"}

The list is stored as a vector of strings: std::vector<std::string>. If I find the longest item and apply the solution from "How to create fixed-width ndarray of strings":

static PyObject* string_vector_to_nparray(const std::vector<std::string>& vec, size_t itemsize)
{
    if( !vec.empty() )
    {
        size_t mem_size = vec.size()*itemsize;
        void * mem = PyDataMem_NEW(mem_size);
        size_t cur_index=0;
        for(const auto& val : vec){
            for(size_t i=0;i<itemsize;i++){
                char ch = i < val.size() ? val[i] : 0; // fill with NUL if string too short
                reinterpret_cast<char*>(mem)[cur_index] = ch;
                cur_index++;
            }
        }
        npy_intp dims = static_cast<npy_intp>(vec.size());         
        PyObject* PyArray = PyArray_New(&PyArray_Type, 1, &dims, NPY_STRING, NULL, mem, 4, NPY_ARRAY_OWNDATA, NULL);   
        return PyArray;     
    } 
    else 
    {
        npy_intp dims[1] = {0};
        return (PyObject*) PyArray_ZEROS(1, dims, NPY_FLOAT, 0); // NPY_FLOAT replaces the deprecated PyArray_FLOAT
    }
}

std::vector<std::string> col_list;
col_list.push_back("...");
col_list.push_back("...");
...
// Find the longest string; its length becomes the fixed item width.
auto it = std::max_element(std::begin(col_list), std::end(col_list),
    [](const std::string& lhs, const std::string& rhs){ return lhs.size() < rhs.size(); });
auto num = it->size(); // length of the longest string
std::cout << "Longest: [" << *it << "] of size: " << num << std::endl;

size_t itemsize = num;
PyObject *PyArray = string_vector_to_nparray(col_list, itemsize);

return PyArray;

the exported array looks like this in Python:

np.array([b'Data', b'Date', b'\x00\x00uk', b'ey', b'', b'Orde', b'rRef', b'\x00\x00ti', b'ckti', b'me'], dtype='|S4')

How can I create a NumPy array of strings with unequal widths from an existing vector of strings?
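One possible reading of that output, offered as a sketch rather than a confirmed diagnosis: the `PyArray_New` call above passes a literal `4` as its itemsize argument, so the buffer, which was packed with 10 bytes per string ("limitPrice" is the longest name), would be read back in 4-byte elements. The standalone C++ below does not use NumPy at all. `pack_fixed_width` and `view_as_items` are hypothetical helpers, not part of any library; they mimic the packing loop and then split the buffer at a given element width, dropping trailing NULs the way NumPy does when it displays an `S` dtype.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: copy every string into a NUL-padded slot of
// `itemsize` bytes, mirroring the loop in string_vector_to_nparray.
std::vector<char> pack_fixed_width(const std::vector<std::string>& vec,
                                   std::size_t itemsize)
{
    std::vector<char> buf(vec.size() * itemsize, '\0');
    for (std::size_t i = 0; i < vec.size(); ++i) {
        // Strings longer than itemsize are truncated, shorter ones stay padded.
        const std::size_t n = vec[i].size() < itemsize ? vec[i].size() : itemsize;
        for (std::size_t j = 0; j < n; ++j) {
            buf[i * itemsize + j] = vec[i][j];
        }
    }
    return buf;
}

// Hypothetical helper: read `count` elements of `width` bytes each from the
// start of the buffer and strip trailing NULs from every element.
std::vector<std::string> view_as_items(const std::vector<char>& buf,
                                       std::size_t count, std::size_t width)
{
    std::vector<std::string> out;
    for (std::size_t i = 0; i < count && (i + 1) * width <= buf.size(); ++i) {
        std::string item(buf.data() + i * width, width);
        while (!item.empty() && item.back() == '\0') {
            item.pop_back();
        }
        out.push_back(item);
    }
    return out;
}
```

Packing the ten column names with an item size of 10 and viewing the buffer with a width of 4 gives "Data", "Date", "\0\0uk", "ey", "", "Orde", "rRef", "\0\0ti", "ckti", "me", which matches the array shown above. Viewing the same buffer with a width of 10 gives the names back intact.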

Cyan
  • You'll have to find the length of the longest string, and create the NumPy array with elements of that size, then copy the items from the C++ vector to the NumPy array. In the answer that you linked to, this corresponds to computing `itemsize` as the length of the longest string in the vector. – Warren Weckesser Apr 21 '22 at 02:56
  • @WarrenWeckesser It gave `array([b'Data', b'Date', b'\x00\x00uk', b'ey', b'', b'Orde', b'rRef',b'\x00\x00ti', b'ckti', b'me'], dtype='|S4')` – Cyan Apr 21 '22 at 16:27
  • @WarrenWeckesser I updated my question with more code – Cyan Apr 21 '22 at 16:32
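The first step of the suggestion in the comments, computing the item size as the length of the longest string, could be sketched as below. `longest_length` is a hypothetical helper name, and returning 0 for an empty vector is an assumption made here so that dereferencing the end iterator is avoided. The NumPy call appears only in a comment because the sketch is meant to compile without the Python headers; the point it illustrates is that the computed width, not a fixed number, would go into the itemsize parameter of `PyArray_New`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: length of the longest string, or 0 for an empty vector.
std::size_t longest_length(const std::vector<std::string>& vec)
{
    if (vec.empty()) {
        return 0; // assumption: nothing to measure, so report width 0
    }
    const auto it = std::max_element(
        vec.begin(), vec.end(),
        [](const std::string& lhs, const std::string& rhs) {
            return lhs.size() < rhs.size();
        });
    return it->size();
}

// Inside the extension module, the width would then be forwarded, e.g.:
//
//   std::size_t itemsize = longest_length(col_list);
//   ... fill `mem` with vec.size() * itemsize bytes as in the question ...
//   PyArray_New(&PyArray_Type, 1, &dims, NPY_STRING, NULL, mem,
//               static_cast<int>(itemsize), NPY_ARRAY_OWNDATA, NULL);
```

For the column names in the question this returns 10, so the resulting dtype would be `|S10` rather than `|S4`.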

0 Answers