
Building on this answer to one of my previous questions, I'd like to make arrays of memoryviews.

Problem 1

Build a 2D array of memoryviews with fixed lengths, e.g.:

mv1 = memoryview(b'1234')
mv2 = memoryview(b'abcd')
cdef const unsigned char[:,:] tmv = (mv1, mv2)

With this I get:

TypeError: a bytes-like object is required, not 'tuple'

I tried using C arrays of pointers:

ctypedef const unsigned char[:] k_t
cdef unsigned char* mva[2]
mv1 = memoryview(b'1234')
mv2 = memoryview(b'abcd')
cdef k_t mvk1 = mv1
cdef k_t mvk2 = mv2
mva = (&mvk1, &mvk2)

But that didn't work either:

Cannot take address of memoryview slice

Problem 2

Build an arbitrarily long 3D array, basically a list of the above 2D array objects. This other answer to a similar question and the Cython docs on allocating memory got me a bit closer (I believe I should be using malloc and pointers; I would rather not introduce C++ unless necessary), but I am still stuck on problem #1. Any suggestions are welcome!


Edit (problem #1): Even throwing a Cython array into the mix gives me the same error:

from cython cimport view
mv1 = memoryview(b'1234')
mv2 = memoryview(b'abcd')
cvarr = view.array(shape=(2,1), itemsize=sizeof(char), format='B')
cvarr = (mv1, mv2)
print(cvarr[0][1])
# So far so good... this prints `50` as expected.
cdef const unsigned char[:,:] cvw = cvarr
# Adding this last line throws `a bytes-like object is required, not 'tuple'`

Now I'm really confused. Why would the tuple be good for the Cython array but not for the memoryview?
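A plain-Python sketch of what the two assignments to `cvarr` do. Here a `bytearray` stands in for the `view.array` allocation (an assumption made so the snippet runs without Cython). Because `cvarr` has no `cdef` type, the second assignment simply rebinds the name to a tuple instead of filling the allocated array, and a tuple does not expose the buffer protocol that a typed memoryview needs:

```python
mv1 = memoryview(b'1234')
mv2 = memoryview(b'abcd')

# Stand-in for view.array(shape=(2, 1), itemsize=1, format='B')
cvarr = bytearray(2)

# This does NOT copy anything into the array: it rebinds the name to a tuple
cvarr = (mv1, mv2)
print(type(cvarr).__name__)  # tuple
print(cvarr[0][1])           # 50, i.e. ord('2'), read from mv1

# Requesting a buffer from the tuple fails, just like the typed memoryview does
try:
    memoryview(cvarr)
except TypeError as exc:
    print(exc)
```

So `print(cvarr[0][1])` works only because it indexes the tuple and then `mv1`; the `view.array` is never involved.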

user3758232
  • Sorry in advance if this is nonsense, but maybe something like `cdef unsigned char[::view.indirect_contiguous,:] tmv = (mv1, mv2)` could make sense for you? It at least cythonizes without error. – Paul Panzer May 20 '18 at 22:46
  • @PaulPanzer It compiles, but throws an error at runtime: `a bytes-like object is required, not 'tuple'` – user3758232 May 20 '18 at 22:59
  • Ok, I'll throw in one more clueless comment and then I'll shut up. Might [this](https://stackoverflow.com/q/10465091/7207392) be similar enough to be useful? – Paul Panzer May 20 '18 at 23:09
  • Are all your memoryviews the same length? And are you prepared to accept data being copied rather than viewed? If the answer to either question is "no" then this isn't going to work. – DavidW May 21 '18 at 06:20
  • @DavidW All my elements have the same length but no, I want to avoid copying them. Which approach is not going to work? Any of them? – user3758232 May 21 '18 at 11:51
  • The 2D memoryview that you're trying to make is a view on a single region of memory (interpreted as a 2d array). Since you are trying to combine multiple separate regions of memory, it definitely won't work. I can't think of any built-in Cython structure that would be suitable. It's possible that you could make something from the indirect_contiguous that Paul Panzer suggested, but it'd need a lot of wrapper code and I'm not sure I know how to do it. – DavidW May 21 '18 at 12:10
  • The memoryviews in my real program are only 3 x 5 bytes for each set, so I may as well copy them--it's just that there could be zero to millions of those sets in my top level (arbitrarily long) array. – user3758232 May 21 '18 at 12:51
  • @user3758232 Are your memoryviews all just 5-long sequences taken from a single huge array? If so, you might be better off just storing an array of offsets instead of memoryviews – DavidW May 21 '18 at 16:44
  • @DavidW No, they are 5-byte memoryviews from database (LMDB) lookups. They are not contiguous. I have the option to get straight bytes from LMDB if I need to copy the data anyways. Which leads to another big question I have: are Cython memoryviews pointers, or a beast of its own kind? – user3758232 May 21 '18 at 17:11
  • Scratch my previous comment. They *may* be contiguous on some predictable lookup methods. By array of offsets you mean an array of pointers? – user3758232 May 21 '18 at 17:16
  • 1) By offsets I meant "number of elements from the start of the array". You could also use pointers though. 2) Cython memoryviews have pointers in their implementation but are more complicated - for example, they can keep track of shape (for 2D arrays), and handle Python reference counting. 3) It sounds like the data structure you want doesn't exist directly in Cython (and may not be easy to build). I'd suggest either copying the data, or putting memoryviews in a Python list (may not be as slow as you think) – DavidW May 21 '18 at 18:26
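DavidW's offsets idea can be sketched in plain Python: keep one buffer and store, for each record, the number of bytes from the start of that buffer, then hand out zero-copy slices on demand. The names `big`, `offsets`, `RECORD_LEN` and `record` are hypothetical, and the sketch assumes the records already sit back to back in one contiguous buffer (which, per the comments, only holds for some lookup methods):

```python
from array import array

RECORD_LEN = 5  # hypothetical fixed record size (the comments mention 5-byte values)

# Hypothetical single contiguous buffer holding all records back to back
big = memoryview(b'12345' b'67890' b'abcde' b'!@#$%')

# Offsets: number of bytes from the start of `big` for each wanted record
offsets = array('q', [0, 10, 15])

def record(i):
    """Return a zero-copy view of the i-th selected record."""
    start = offsets[i]
    return big[start:start + RECORD_LEN]

for i in range(len(offsets)):
    print(i, record(i).tobytes())
```

Storing 8-byte integers instead of memoryview objects keeps the per-record overhead small, and slicing `big` never copies the underlying bytes.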

2 Answers


Note: Not even close to a complete solution (at least at the moment!)

I agree with @DavidW that it is likely better if a single contiguous Cython typed memoryview owns all of the data and the data is copied into it from your Python memoryviews. This is especially true if you plan to create the giant Cython typed memoryview only once but iterate over it many times.

However, you could get a pointer to the contents of your Python memoryview by using the `PyMemoryView_GET_BUFFER` macro to get the underlying buffer of that memoryview. Then you could either `memcpy` the data into a larger data structure (for fast copying) or just keep an array of pointers, with each element pointing to the data of one memoryview (which is slower to iterate over, since every access jumps to a different buffer in memory).
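The "copy into a larger data structure" option can be sketched in plain Python (which is also valid Cython): append every fixed-size record to one growable `bytearray`, then reinterpret the result as an `(n_sets, 3, 5)` block with `memoryview.cast`. The 3 × 5-byte set size is taken from the comments; `pack_sets` and `SET_SHAPE` are hypothetical names, and `bytearray +=` plays the role of `memcpy` here:

```python
SET_SHAPE = (3, 5)  # 3 records of 5 bytes per set, as described in the comments
SET_BYTES = SET_SHAPE[0] * SET_SHAPE[1]

def pack_sets(sets):
    """Copy each set of three 5-byte buffers into one contiguous bytearray.

    Returns a 3-D memoryview of shape (n_sets, 3, 5) over the copy.
    """
    buf = bytearray()
    for records in sets:
        for rec in records:
            rec = memoryview(rec)
            if rec.nbytes != SET_SHAPE[1]:
                raise ValueError("every record must be exactly 5 bytes")
            buf += rec  # copies the bytes (the memcpy step)
    n_sets = len(buf) // SET_BYTES
    # Reinterpret the flat byte buffer as n_sets x 3 x 5 without copying again
    return memoryview(buf).cast('B', (n_sets,) + SET_SHAPE)

sets = [
    (b'12345', b'67890', b'abcde'),
    (b'!@#$%', b'ABCDE', b'fghij'),
]
block = pack_sets(sets)
print(block.shape)          # (2, 3, 5)
print(chr(block[1, 2, 0]))  # f
```

Because the result is a single contiguous buffer, it is the kind of object a 3-D typed memoryview can be taken on, unlike the tuple of separate views in the question.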

Here is a way to get a pointer to the underlying data of a Python memoryview object. At the time of writing, the `cpython` declarations in Cython's GitHub repository made no mention of `PyMemoryView`, so I had to declare it manually:

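from cpython.object cimport PyObject

cdef extern from "Python.h":
    Py_buffer* PyMemoryView_GET_BUFFER(PyObject *mview)

cdef object mv1 = memoryview(b'1234')
# The pointer is only valid while mv1 (and the object it views) is alive
cdef Py_buffer* buf = PyMemoryView_GET_BUFFER(<PyObject*>mv1)
cdef char* buf_ptr = <char*>buf.buf
# The data is not guaranteed to be NUL-terminated, so slice by buf.len
print(buf_ptr[:buf.len])  # prints b'1234'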
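A `char*` into a buffer's data is not NUL-terminated in general: code that treats it as a C string (for example passing a bare `char*` to `print`, which Cython converts to `bytes` with `strlen`) reads until the first zero byte rather than `buf.len` bytes. CPython `bytes` objects happen to carry a trailing NUL, which hides the problem for `memoryview(b'1234')`, but a view into the middle of a larger buffer (such as an LMDB value) has no terminator. A pure-Python sketch using `ctypes` (not part of the original answer; a writable `bytearray` is used because `from_buffer` needs a writable buffer) makes the difference visible:

```python
import ctypes

data = bytearray(b'123456')
view = memoryview(data)[1:5]   # a 4-byte view into the middle: b'2345'

# Wrap the view's memory in a ctypes array to obtain its start address
c_buf = (ctypes.c_char * view.nbytes).from_buffer(view)
addr = ctypes.addressof(c_buf)

bounded = ctypes.string_at(addr, view.nbytes)  # read exactly nbytes
unbounded = ctypes.string_at(addr)             # read until the first zero byte

print(bounded)    # b'2345'
print(unbounded)  # runs past the end of the view
```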

Update 1:

I wasn't 100% sure what the 3D array structure was supposed to look like, so I am only handling the 2D case. Since you said you did not want to introduce C++, I created this `array_t` data type, which behaves like a `std::vector` (really just a growable block of `void*` pointers). Lots of ugly boilerplate, but here goes:

from cpython.object cimport PyObject
from libc.stdlib cimport realloc, free
from libc.string cimport memmove

cdef extern from "Python.h":
    Py_buffer* PyMemoryView_GET_BUFFER(PyObject *mview)

cdef char* get_view_ptr(object view):
    # Borrowed pointer: valid only while `view` (and its exporter) is alive
    cdef Py_buffer* py_buf = PyMemoryView_GET_BUFFER(<PyObject*>view)
    return <char*>py_buf.buf

ctypedef struct array_t:
    void** data      # block of max_items pointer slots
    int max_items    # capacity
    int num_items    # slots actually in use

cdef void array_init(array_t* array):
    array.data = NULL
    array.max_items = 0
    array.num_items = 0

cdef int array_add(array_t* array, void* item) except -1:
    cdef void** new_data
    cdef int new_max
    if array.num_items == array.max_items:
        # Start with 10 slots, then double; realloc(NULL, n) acts like malloc
        new_max = 10 if array.max_items == 0 else array.max_items * 2
        new_data = <void**>realloc(array.data, new_max * sizeof(void*))
        if new_data == NULL:
            raise MemoryError()  # the old block is still valid and owned by array
        array.data = new_data
        array.max_items = new_max
    array.data[array.num_items] = item
    array.num_items += 1
    return 0

cdef void array_set(array_t* array, int index, void* item):
    # Only slots 0 .. num_items-1 hold valid items
    if index < 0 or index >= array.num_items:
        return
    array.data[index] = item

cdef void* array_get(array_t* array, int index):
    if index < 0 or index >= array.num_items:
        return NULL
    return array.data[index]

cdef void array_remove(array_t* array, int index):
    if index < 0 or index >= array.num_items:
        return
    # Shift the num_items - index - 1 items after `index` one slot to the left
    memmove(&array.data[index], &array.data[index + 1],
            (array.num_items - index - 1) * sizeof(void*))
    array.num_items -= 1

cdef void array_free(array_t* array):
    free(array.data)
    array_init(array)  # leave the struct in a safe, empty state

cdef int i
cdef array_t a
cdef object mv1 = memoryview(b'12345')
cdef object mv2 = memoryview(b'67890')
cdef object mv3 = memoryview(b'abcde')
cdef object mv4 = memoryview(b'!@#$%')

array_init(&a)
array_add(&a, get_view_ptr(mv1))
array_add(&a, get_view_ptr(mv2))
array_add(&a, get_view_ptr(mv3))
array_add(&a, get_view_ptr(mv4))

for i in range(a.num_items):
    # Each buffer is 5 bytes and not necessarily NUL-terminated, so slice it
    print(i, (<char*>array_get(&a, i))[:5])

array_free(&a)
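For reference, the same growable-array logic can be sketched in plain Python, with a list of fixed length standing in for the `void**` block. The class name `PtrArray` is hypothetical; it mirrors the growth policy of `array_add` (10 slots first, then doubling) and the shift performed by `array_remove`, moving only the `num_items - index - 1` items that follow the removed one, and it bounds-checks indices against the number of stored items rather than the capacity:

```python
class PtrArray:
    """Growable array mirroring array_t: `data` has max_items slots, num_items in use."""

    def __init__(self):
        self.data = []        # stands in for the void** block (NULL when empty)
        self.max_items = 0
        self.num_items = 0

    def add(self, item):
        if self.num_items == self.max_items:
            # Same growth policy as array_add: 10 slots first, then double
            new_max = 10 if self.max_items == 0 else self.max_items * 2
            self.data.extend([None] * (new_max - self.max_items))  # like realloc
            self.max_items = new_max
        self.data[self.num_items] = item
        self.num_items += 1

    def get(self, index):
        # Valid items are 0 .. num_items-1; slots beyond that are unused
        if index < 0 or index >= self.num_items:
            return None
        return self.data[index]

    def remove(self, index):
        if index < 0 or index >= self.num_items:
            return
        # Equivalent of memmove(&data[index], &data[index + 1], count * sizeof(void*))
        count = self.num_items - index - 1
        self.data[index:index + count] = self.data[index + 1:index + 1 + count]
        self.num_items -= 1
        self.data[self.num_items] = None  # clear the now-unused last slot


a = PtrArray()
for chunk in (b'12345', b'67890', b'abcde', b'!@#$%'):
    a.add(memoryview(chunk))
a.remove(1)
print([a.get(i).tobytes() for i in range(a.num_items)])
# [b'12345', b'abcde', b'!@#$%']
```

Unlike the C version, the Python list keeps the memoryview objects alive, so there is no risk of dangling pointers; in the Cython code that responsibility stays with the caller.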
CodeSurgeon

This seems to resolve problem #1:

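mv1 = memoryview(b'1234')
mv2 = memoryview(b'abcd')
cdef unsigned char mva[2][4]               # C array: 2 rows x 4 bytes of its own storage
mva = (mv1, mv2)                           # fill the rows from the two views
cdef const unsigned char[:,:] cvw = mva    # 2D typed memoryview over mva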

However, it issues two warnings on line 4 (`mva = (mv1, mv2)`) about

Obtaining 'unsigned char [4]' from externally modifiable global Python value

I think I can ignore those warnings because I am only using `cvw`, which is declared `const`.
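Because `mva` is a C array with its own storage, the assignment `mva = (mv1, mv2)` copies the bytes, so `cvw` views that copy rather than the original buffers (which matters for the copy-versus-view question raised in the comments). The same semantics can be sketched in plain Python, with a `bytearray` standing in for `mva` (an assumption; it is not the C array itself) and writable sources so the difference is observable:

```python
src1 = bytearray(b'1234')
src2 = bytearray(b'abcd')
mv1, mv2 = memoryview(src1), memoryview(src2)

# Stand-in for `cdef unsigned char mva[2][4]`: separate, contiguous storage
mva = bytearray(8)
mva[0:4] = mv1   # copies the bytes, like the tuple-to-C-array assignment
mva[4:8] = mv2
cvw = memoryview(mva).cast('B', (2, 4))  # 2-D view on the copy

src1[0] = ord('X')              # modify an original buffer after the copy
print(mv1.tobytes())            # b'X234': the original view sees the change
print(bytes(cvw.tolist()[0]))   # b'1234': the 2-D view does not
```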

user3758232