
I have a Python memoryview pointing to a bytes object, on which I would like to perform some processing in Cython.

My problem is:

  • because the bytes object is not writable, Cython does not allow constructing a typed (Cython) memoryview from it
  • I cannot use pointers either, because I cannot get a pointer to the start of the memoryview

Example:

In python:

array = memoryview(b'abcdef')[3:]

In cython:

  • `cdef char * my_ptr = &array[0]` fails to compile with the message: `Cannot take address of Python variable`
  • `cdef char[:] my_view = array` fails at runtime with the message: `BufferError: memoryview: underlying buffer is not writable`
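Both facts behind these failures can be confirmed from plain Python: a memoryview over a `bytes` object is read-only, and the slice exposes only part of the underlying object. A minimal sketch using only built-in `memoryview` attributes:

```python
# Same setup as in the question: a sliced memoryview over an immutable bytes object
array = memoryview(b'abcdef')[3:]

# bytes exports a read-only buffer, which is why a writable char[:] cannot be built from it
print(array.readonly)   # True

# The slice exposes only the last three bytes of the underlying object
print(bytes(array))     # b'def'
print(len(array))       # 3

# Writes are rejected at the Python level as well
try:
    array[0] = ord('x')
except TypeError as exc:
    print('write rejected:', exc)
```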

How does one solve this?

ARF
  • First question: how do you declare the `array` argument in your Cython function? – Pierre de Buyl Jan 20 '17 at 14:17
  • @PierredeBuyl I pass it in as a Python object, like so (Cython): `def myfunc(arr): pass` – ARF Jan 20 '17 at 16:14
  • Hi, after some documentation reading and googling: if all you receive is a memoryview, it seems hard to obtain read-write access. You should mention how the memoryview is created in the first place. If you can get a `Py_buffer` struct instead, this might help. https://docs.python.org/3.5/c-api/buffer.html – Pierre de Buyl Jan 20 '17 at 17:37
  • @PierredeBuyl Many thanks for the `Py_buffer` struct hint! I came to the same solution. See my answer to my own question below... – ARF Jan 20 '17 at 18:59

4 Answers


OK, after digging through the Python C API, I found a way to get a pointer to the bytes object's buffer behind a memoryview (here called `bytes_view`, e.g. `bytes_view = memoryview(bytes())`). Maybe this helps somebody else:

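from cpython.buffer cimport PyObject_GetBuffer, PyBuffer_Release, PyBUF_ANY_CONTIGUOUS, PyBUF_SIMPLE


def myfunc(bytes_view):
    cdef Py_buffer buffer
    cdef char * my_ptr

    # Ask the memoryview for a plain contiguous buffer; read-only access is accepted
    PyObject_GetBuffer(bytes_view, &buffer, PyBUF_SIMPLE | PyBUF_ANY_CONTIGUOUS)
    try:
        # buffer.buf points at the first byte of the view (slice offset included),
        # buffer.len is the number of bytes available
        my_ptr = <char *>buffer.buf
        # use my_ptr
    finally:
        # Always hand the buffer back to the exporter
        PyBuffer_Release(&buffer)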
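At the Python level the same acquire/use/release discipline is available: constructing a `memoryview` requests a buffer from the exporter, and leaving a `with` block releases it. The sketch below is only a pure-Python comparison, not a replacement for the Cython code; `checksum` is a hypothetical helper name.

```python
def checksum(obj):
    """Hypothetical helper: sum all bytes exposed by a buffer-protocol object."""
    # Entering the with-block holds a buffer on obj; leaving it releases the buffer,
    # much like the PyObject_GetBuffer / PyBuffer_Release pair in the Cython code
    with memoryview(obj) as view:
        # Only reads are performed, so a read-only (bytes-backed) buffer is fine;
        # the view starts at the slice offset, not at the start of the bytes object
        return sum(view)


bytes_view = memoryview(b'abcdef')[3:]
print(checksum(bytes_view))  # 303, i.e. ord('d') + ord('e') + ord('f')
```

Releasing promptly matters: while a buffer is held, an exporter such as `bytearray` refuses to be resized and raises `BufferError`.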
ARF
  • Exactly what I am looking for... but what is `buffer_view` in your code? Do you mean `buffer`? – Udai F.mHd Apr 21 '17 at 19:23
  • I just tried this, and yes `buffer_view` should be `buffer`. – Conrad Parker Oct 23 '17 at 05:05
  • @ConradParker Thanks for the note. I copied the snippet from my code and in simplifying it, I messed with the variable names to make it clearer. I obviously missed one... Is the code snippet working for you as it is now? – ARF Oct 23 '17 at 09:41
  • @UdaiF.mHd Sorry I missed your message. I fixed the code as you suggested. – ARF Oct 23 '17 at 09:42

Using a bytearray (as per @CheeseLover's answer) is probably the right way of doing things. My advice would be to work entirely in bytearrays, thereby avoiding temporary conversions. However:

A `char*` can be created directly from a Python string (or `bytes`); see the end of the linked section:

# `array` must be a bytes object here; a memoryview is rejected with TypeError
cdef char * my_ptr = array
# you can then convert to a memoryview as normal in Cython
cdef char[:] mview = <char[:len(array)]>my_ptr

A couple of warnings:

  1. Remember that `bytes` is immutable; attempting to modify the data through that memoryview is likely to cause issues.
  2. `my_ptr` (and thus `mview`) are only valid as long as `array` is valid, so be sure to keep a reference to `array` for as long as you need access to the data.
DavidW
  • Thanks. My problem is, I am given a `bytes` object. Thus I cannot switch to `bytearray` without incurring the cost of instantiation. `char *` can be created from `bytes()` but not from `memoryview(bytes())`. Your suggestion fails with `TypeError: expected bytes, memoryview found` – ARF Jan 20 '17 at 17:28
  • Ah sorry - I misunderstood slightly. You can get the `bytes` back from the memoryview (no copying involved) with `array.obj` (Python >=3.3 only though) and then cast that to a `char*` – DavidW Jan 20 '17 at 21:45
  • I tried that as well: it loses the offset of the memoryview. For `test = memoryview(b'abcdef')[3:]`, `bytes(test) == b'def'` while `test.obj == b'abcdef'`. The only solution I have found so far is the very verbose procedure in my answer below, though I would love to get rid of that horrible code clutter. – ARF Jan 21 '17 at 09:58

You can use a bytearray to create a mutable memoryview. Note that this won't change the original string, only the bytearray:

data = bytearray(b'python')
view = memoryview(data)
view[0] = ord('c')
print(data)  # bytearray(b'cython')
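Applied to the question's sliced view, the `bytearray` route looks roughly like the sketch below (variable names follow the question). Note that `bytearray(...)` copies the visible bytes, which is the instantiation cost discussed in the comments.

```python
original = b'abcdef'
array = memoryview(original)[3:]

# bytearray(...) copies only the bytes visible through the view
data = bytearray(array)
view = memoryview(data)
print(view.readonly)  # False: bytearray exports a writable buffer

# Writes go to the copy; the original bytes object is untouched
view[0] = ord('x')
print(data)      # bytearray(b'xef')
print(original)  # b'abcdef'
```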
Cheese Lover
  • Indeed. That's what I am doing right now. But instantiating a temporary `bytearray` object defeats the whole purpose of using cython to speed up my algorithm. – ARF Jan 20 '17 at 16:17

If you don't want the Cython memoryview to fail with 'underlying buffer is not writable', simply don't ask for a writable buffer: declare the memoryview `const`. Once you're in the C domain, you can deal with writability however you see fit. So this works:

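# const: only a read-only buffer is requested, so bytes-backed views are accepted
cdef const unsigned char[:] my_view = array
# the cast discards const; never write through my_ptr, since bytes is immutable
cdef char* my_ptr = <char*>&my_view[0]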
panda-34