The Goal: Get a Memoryview from a 2D C++ char array using Cython.
A little background:
I have a native C++ library which generates some data and returns it to the Cython world via a char**.
The array is initialized and filled in the library roughly like this:
struct Result_buffer {
    char** data_pointer;
    int length = 0;

    Result_buffer(int row_capacity) {
        // allocate the table of row pointers (the rows themselves live elsewhere)
        data_pointer = new char*[row_capacity];
    }

    // the actual data is appended row by row
    void append_row(char* row_data) {
        data_pointer[length] = row_data;
        length++;
    }
};
So we basically get an array of nested sub-arrays.
Side Notes:
- each row has the same count of columns
- rows can share memory, i.e. point to the same row_data
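To make the second side note concrete, here is a minimal C++ sketch of the sharing. The destructor, the capacity field and the rows_share_memory helper are my own additions for illustration and are not part of the real library:

```cpp
#include <cassert>

// Trimmed-down copy of the Result_buffer above; destructor and capacity are additions.
struct Result_buffer {
    char** data_pointer;
    int length = 0;
    int capacity;  // hypothetical field, only used for the bounds check below

    explicit Result_buffer(int row_capacity)
        : data_pointer(new char*[row_capacity]), capacity(row_capacity) {}
    ~Result_buffer() { delete[] data_pointer; }  // frees the pointer table, not the rows
    Result_buffer(const Result_buffer&) = delete;
    Result_buffer& operator=(const Result_buffer&) = delete;

    void append_row(char* row_data) {
        assert(length < capacity);
        data_pointer[length] = row_data;
        length++;
    }
};

// Hypothetical helper: true if rows i and j point at the same row_data.
bool rows_share_memory(const Result_buffer& buf, int i, int j) {
    return buf.data_pointer[i] == buf.data_pointer[j];
}
```

If the same row_data is appended twice, both entries alias one block of memory, so a write through one row is visible through the other. Any zero-copy view on top of this buffer would inherit that behaviour.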
The goal is to use this array with a memoryview, preferably without expensive memory copying.
First Approach (not working):
Using Cython arrays and memoryviews:
Here's the .pyx-file which should consume the generated data
from cython cimport view
cimport numpy as np
import numpy as np
[...]
def raw_data_to_numpy(self):
    # Dimensions of the source array
    cdef int ROWS = self._row_count
    cdef int COLS = self._col_count
    # This is the array from the C++ library and is created by 'create_buffer()'
    cdef char** raw_data_pointer = self._raw_data
    # It only works with a pointer to the first nested array
    cdef char* pointer_to_0 = raw_data_pointer[0]
    # Now create a 2D Cython array
    cdef view.array cy_array = <char[:ROWS, :COLS]> pointer_to_0
    # With this we can finally create our NumPy array:
    return np.asarray(cy_array)
This actually compiles fine and runs without crashing, but the result isn't quite what I expected. If I print out the values of the NumPy array I get this:
000: [1, 2, 3, 4, 5, 6, 7, 8, 9]
001: [1, 0, 0, 0, 0, 0, 0, 113, 6]
002: [32, 32, 32, 32, 96, 96, 91, 91, 97]
[...]
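It turns out that the first row was mapped correctly, but the other rows look rather like uninitialized memory. So there's probably a mismatch between the memory layout of a char** (a table of pointers to separately stored rows) and the default mode of 2D memoryviews, which expects all rows to lie in one block starting at the given pointer.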
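One way to check this suspicion on the C++ side is to test whether the rows happen to sit back to back in memory. This is only a sketch; rows_are_contiguous is a hypothetical helper, not something the library provides:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true only if every row starts exactly `cols` bytes
// after the previous one, i.e. all rows form a single contiguous block.
bool rows_are_contiguous(char* const* rows, std::size_t row_count, std::size_t cols) {
    for (std::size_t i = 1; i < row_count; ++i) {
        // Compare addresses as integers to avoid pointer arithmetic
        // across unrelated allocations.
        auto prev = reinterpret_cast<std::uintptr_t>(rows[i - 1]);
        auto curr = reinterpret_cast<std::uintptr_t>(rows[i]);
        if (curr != prev + cols) {
            return false;
        }
    }
    return true;
}
```

Only when this returns true would a cast of the first row pointer to char[:ROWS, :COLS] see the real data for every row. With rows that are allocated separately or shared, it returns false, which would match the garbage in rows 001 and 002 above.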
Edit #1: What I've learned from my other question is that the built-in Cython arrays don't support indirect memory layouts, so I have to create a Cython wrapper for the unsigned char**
which exposes the buffer protocol.
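As far as I understand it, such a wrapper would have to describe the data with suboffsets, the buffer protocol's mechanism for dimensions that store pointers. The sketch below follows the element lookup rule from the buffer protocol documentation (PEP 3118), written as plain C++ so it can be tried without Python. The names item_pointer and read_cell are hypothetical, and the strides and suboffsets are my assumption of what a ROWS x COLS char** would need:

```cpp
#include <cstddef>

// Element lookup for a buffer with suboffsets: a suboffset >= 0 means
// "this dimension holds pointers, dereference after striding".
char* item_pointer(void* buf, int ndim,
                   const std::ptrdiff_t* strides,
                   const std::ptrdiff_t* suboffsets,
                   const std::ptrdiff_t* indices) {
    char* p = static_cast<char*>(buf);
    for (int i = 0; i < ndim; ++i) {
        p += strides[i] * indices[i];
        if (suboffsets[i] >= 0) {
            // Follow the stored pointer, then apply the offset inside the target.
            p = *reinterpret_cast<char**>(p) + suboffsets[i];
        }
    }
    return p;
}

// Hypothetical convenience wrapper for a 2D char** buffer.
char read_cell(char** rows, std::ptrdiff_t r, std::ptrdiff_t c) {
    // Dimension 0 steps through the pointer table, dimension 1 through the bytes of a row.
    const std::ptrdiff_t strides[2] = {
        static_cast<std::ptrdiff_t>(sizeof(char*)),
        static_cast<std::ptrdiff_t>(sizeof(char))
    };
    // 0: dereference the row pointer; -1: no indirection in the column dimension.
    const std::ptrdiff_t suboffsets[2] = {0, -1};
    const std::ptrdiff_t indices[2] = {r, c};
    return *item_pointer(rows, 2, strides, suboffsets, indices);
}
```

With this description no row data is copied and shared rows keep working, because every access goes through the pointer table. Consumers that expect a plain strided buffer cannot read such a layout directly, though.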