
I have a C++ library which performs analysis on audio data, and a C API to it. One of the C API functions takes const int16_t* pointers to the data and returns the results of the analysis.

I'm trying to build a Python interface to this API, and most of it is working, but I'm having trouble getting ctypes pointers to use as arguments for this function. Since the pointers on the C side are to const, it feels to me like it ought to be possible to make this work fine with any contiguous data. However, the following does not work:

import ctypes
import wave

_native_lib = ctypes.cdll.LoadLibrary('libsound.so')
_native_function = _native_lib.process_sound_data
_native_function.argtypes = [ctypes.POINTER(ctypes.c_int16),
                             ctypes.c_size_t]
_native_function.restype = ctypes.c_int

wav_path = 'hello.wav'

with wave.open(wav_path, mode='rb') as wav_file:
    wav_bytes = wav_file.readframes(wav_file.getnframes())

data_start = ctypes.POINTER(ctypes.c_int16).from_buffer(wav_bytes) # ERROR: data is immutable
_native_function(data_start, len(wav_bytes)//2)

Manually copying wav_bytes to a bytearray allows the pointer to be constructed but causes the native code to segfault, indicating that the address it receives is wrong (it passes unit tests with data read in from C++). Fixing this by getting the address right would technically solve the problem but I feel like there's a better way.

Surely it's possible to just get the address of some data and promise that it's the right format and won't be altered? I'd prefer not to have to deep copy all my Pythonically-stored audio data to a ctypes format, since presumably the bytes are in there somewhere if I can just get a pointer to them!

Ideally, I'd like to be able to do something like this

data_start = cast_to(address_of(data[0]), c_int16_pointer)
_native_function(data_start, len(data))

which would then work with anything that has a [0] and a len. Is there a way to do something like this in ctypes? If not, is there a technical reason why it's impossible, and is there something else I should be using instead?

  • Specify `argtypes` and `restype` for `_native_function`. https://stackoverflow.com/questions/52268294/python-ctypes-cdll-loadlibrary-instantiate-an-object-execute-its-method-priva/52272969#52272969 https://stackoverflow.com/questions/53182796/python-ctypes-issue-on-different-oses/53185316#53185316. – CristiFati Jul 02 '19 at 17:21
  • @CristiFati I edited in the code loading `_native_function`. My question is not how to declare this function, but rather how to convert Python data into a suitable argument for it. – Baryons for Breakfast Jul 03 '19 at 01:45
  • Your comment about the segfaulting is interesting. Did you see this on Windows? Did you try it under macOS or Linux? I feel a [question I asked](https://stackoverflow.com/q/64421004/562930) could be related to what you saw. – Matthew Walker Oct 30 '20 at 06:10
  • @MatthewWalker Unfortunately I only tested this on Linux so I have no idea what would happen on Windows. Actually, I don't think I ever fixed that dodgy copy on Linux either since I ended up just casting it instead. The whole ctypes API is a bit of a mess to be honest. And I'm afraid I don't know how you could debug your issue short of compiling your own CPython with -g and running GDB on it. – Baryons for Breakfast Nov 05 '20 at 09:51

2 Answers


This should work for you. Use array for a writable buffer and create a ctypes array that references the buffer.

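import array
from ctypes import c_short

data = array.array('h', wav_bytes)   # copies wav_bytes into a writable buffer
addr, size = data.buffer_info()      # (address, number of 'h' items)
arr = (c_short * size).from_address(addr)
_native_function(arr, size)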
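
If the samples already live in a writable buffer such as a `bytearray`, a similar view can be built without copying. The following is a minimal sketch under that assumption; the sample values are made up, and `struct` is used only to produce test data in native byte order. Note that `from_buffer` is called on an array type here. Calling it on a pointer type, as in the question, makes ctypes treat the first bytes of the buffer as the pointer value itself, which would be consistent with the segfault described there.

```python
import ctypes
import struct

# Made-up stand-in for audio data: four int16 samples in native byte order.
raw = bytearray(struct.pack('4h', 1, -2, 300, -400))
n_samples = len(raw) // ctypes.sizeof(ctypes.c_int16)

# from_buffer on an ARRAY type shares the bytearray's memory (no copy).
samples = (ctypes.c_int16 * n_samples).from_buffer(raw)

# An explicit pointer to the first element, if one is needed.
ptr = ctypes.cast(samples, ctypes.POINTER(ctypes.c_int16))
values = [ptr[i] for i in range(n_samples)]

# A write through the ctypes view is visible in the original bytearray.
samples[0] = 7
shared = struct.unpack('4h', raw)[0]
```

With the question's declaration, `samples` could then be passed as the first argument of `_native_function`, since ctypes accepts an array instance where a pointer to its element type is expected.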

Alternatively, to skip the copy of wav_bytes into the data array, you could lie about the pointer type in argtypes. ctypes knows how to convert a byte string to a c_char_p. A pointer is just an address, so _native_function will receive the address but use it as an int16_t* internally:

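from ctypes import c_char_p, c_size_t
# Declare the first parameter as char* so a bytes object is accepted directly.
_native_function.argtypes = c_char_p, c_size_t
_native_function(wav_bytes, len(wav_bytes) // 2)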

Another way to work around the "underlying buffer is not writable" error is to leverage c_char_p, which allows an immutable byte string to be used, and then explicitly cast it to the pointer type you want:

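from ctypes import POINTER, c_char_p, c_short, c_size_t, cast

_native_function.argtypes = POINTER(c_short), c_size_t
# c_char_p accepts immutable bytes; cast reinterprets the same address.
p = cast(c_char_p(wav_bytes), POINTER(c_short))
_native_function(p, len(wav_bytes) // 2)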

In these latter cases you must ensure you don't actually write to the buffer as it will corrupt the immutable Python object holding the data.
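
The cast approach can be wrapped in a small helper. This is only a sketch: `int16_pointer_from_bytes` is a hypothetical name, the sample values are made up, and the samples are read back through the pointer in Python so the example runs without the native library.

```python
import ctypes
import struct


def int16_pointer_from_bytes(data):
    """Hypothetical helper: view a bytes object as a read-only int16 buffer."""
    sample_size = ctypes.sizeof(ctypes.c_int16)
    if len(data) % sample_size:
        raise ValueError('byte length is not a multiple of the sample size')
    # c_char_p accepts immutable bytes; cast reinterprets the same address.
    ptr = ctypes.cast(ctypes.c_char_p(data), ctypes.POINTER(ctypes.c_int16))
    return ptr, len(data) // sample_size


# Made-up samples in native byte order, standing in for wav_bytes.
wav_bytes = struct.pack('4h', 10, -20, 0, 30)
ptr, n_samples = int16_pointer_from_bytes(wav_bytes)

# Read only; never write through ptr, since the bytes object is immutable.
read_back = [ptr[i] for i in range(n_samples)]
```

In real use the call would be `_native_function(ptr, n_samples)`. Keeping `wav_bytes` referenced for as long as the pointer is in use is the conservative choice.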

Mark Tolonen
  • From looking at the documentation this seems to deep copy the data into an array. I agree that it should work, but I'm looking for a way to pass this existing data into the C functions without copying all of it (or an explanation of why it's fundamentally impossible to do this). – Baryons for Breakfast Jul 03 '19 at 05:21
  • @BaryonsforBreakfast. There is a way. See update above. – Mark Tolonen Jul 03 '19 at 05:54
  • @BaryonsforBreakfast Added another option as well. – Mark Tolonen Jul 03 '19 at 16:40
  • I just had a chance to test the third method and it does seem to work (as it obviously should). It's a bit less flexible and a bit less safe than I'd like but it's probably the best that can be done as long as ctypes doesn't have `const`. Thanks! – Baryons for Breakfast Jul 04 '19 at 02:48
  • For the third example, does `c_char_p` not [assume the string is NULL terminated](https://docs.python.org/3/library/ctypes.html#ctypes.c_char_p)? So for the likes of audio data, which may contain NULLs throughout, would that not potentially create an issue? Further, it's plausible that audio data is not NULL terminated--it would depend on the audio itself. Do you still feel `c_char_p` is an option here? – Matthew Walker Oct 30 '20 at 05:49
  • @MatthewWalker No it just passes an address to C. The buffer can contain anything. – Mark Tolonen Oct 30 '20 at 05:56
  • Thanks for your reply @MarkTolonen. It worries me that the documentation says "For a general character pointer that may also point to binary data, POINTER(c_char) must be used." – Matthew Walker Oct 30 '20 at 06:00
  • @MatthewWalker `ctypes` does have special handling if you use `c_char_p` as a return type. It gets converted to a Python `bytes` string only up to the first null byte and loses access to the pointer value. If you call a C function that allocates memory and still need access to the pointer, or the returned pointer is arbitrary binary data, *then* you need `POINTER(c_char)`. It's also possible to suppress the `c_char_p` default behavior by deriving a subclass, e.g. `class LPCHAR(c_char_p): pass`. – Mark Tolonen Oct 30 '20 at 06:15
  • @MatthewWalker It talks about the behavior in my comment above in the [Fundamental Data Types](https://docs.python.org/3/library/ctypes.html#ctypes-fundamental-data-types-2) paragraphs. – Mark Tolonen Oct 30 '20 at 06:19
  • @MarkTolonen, that's very clear, thanks. It's a pity the official documentation is not as specific. – Matthew Walker Oct 30 '20 at 20:38
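
The distinction discussed in these comments can be checked in pure Python. A minimal sketch with a made-up byte string: reading `.value` on a `c_char_p` treats the memory as a C string, while a `POINTER(c_char)` sliced with an explicit length returns every byte.

```python
from ctypes import POINTER, c_char, c_char_p, cast

# Made-up binary data containing embedded NUL bytes.
data = b'ab\x00cd\x00ef'

p = c_char_p(data)
# .value reads a C string, so it stops at the first NUL byte.
as_c_string = p.value

# POINTER(c_char) gives raw access; slicing with an explicit length
# returns all bytes, NULs included.
raw = cast(p, POINTER(c_char))[:len(data)]
```

When a `c_char_p` is only passed as an argument, the C side receives an address, so embedded NUL bytes matter only if something reads the memory as a C string.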

I had a look around at the CPython bug tracker to see if this had come up before, and it seems it was raised as an issue in 2011. I agree with the poster that it's a serious mis-design, but it seems the developers at that time did not.

Eryk Sun's comment on that thread revealed that it's actually possible to just use ctypes.cast directly. Here is part of the comment:

cast calls ctypes._cast(obj, obj, typ). _cast is a ctypes function pointer defined as follows:

   _cast = PYFUNCTYPE(py_object, 
                      c_void_p, py_object, py_object)(_cast_addr)

Since cast makes an FFI call that converts the first arg to c_void_p, you can directly cast bytes to a pointer type:

   >>> from ctypes import *
   >>> data = b'123\x00abc'

   >>> ptr = cast(data, c_void_p)

It's a bit unclear to me if this is actually required by the standard or if it's just a CPython implementation detail, but the following works for me in CPython:

import ctypes
data = b'imagine this string is 16-bit sound data'
data_ptr = ctypes.cast(data, ctypes.POINTER(ctypes.c_int16))

The documentation on cast says the following:

ctypes.cast(obj, type)

This function is similar to the cast operator in C. It returns a new instance of type which points to the same memory block as obj. type must be a pointer type, and obj must be an object that can be interpreted as a pointer.

so it seems that CPython is of the opinion that bytes 'can be interpreted as a pointer'. This seems fishy to me, but these modern pointer-hiding languages have a way of messing with my intuition.

  • However, at least under Python 3.8, it seems that a `memoryview` cannot be cast. If we continue on with your example above, `mv = memoryview(data); data_ptr = ctypes.cast(mv, ctypes.POINTER(ctypes.c_int16))` generates the error `ctypes.ArgumentError: argument 1: <class 'TypeError'>: wrong type`. Any ideas for working around this issue? – Matthew Walker Oct 30 '20 at 05:58
  • @MatthewWalker I've never used `memoryview` before but looking at the documentation it seems that even though it *sounds* like a simple `char*` pointer and a length, the data they point to doesn't actually have to be contiguous, so it's actually more like a pair of C++ iterators. I would guess that you can't cast it due to some combination of (a) the data may not be contiguous [though they could check and raise an exception], and (b) no one has bothered to write a conversion for it. – Baryons for Breakfast Nov 05 '20 at 09:40
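
One possible workaround for the `memoryview` case, sketched under stated assumptions: ctypes array types provide `from_buffer`, which shares memory with a writable, contiguous buffer, and `from_buffer_copy`, which also accepts read-only buffers but copies. The helper name `int16_array_from_view` is hypothetical, and the sample values are made up.

```python
import ctypes
import struct


def int16_array_from_view(mv):
    """Hypothetical helper: build a ctypes int16 array from a memoryview."""
    if not mv.contiguous:
        raise ValueError('memoryview must be contiguous')
    n_samples = mv.nbytes // ctypes.sizeof(ctypes.c_int16)
    array_type = ctypes.c_int16 * n_samples
    if mv.readonly:
        # Read-only buffers (e.g. a view of bytes) are copied.
        return array_type.from_buffer_copy(mv)
    # Writable buffers (e.g. a view of a bytearray) are shared, not copied.
    return array_type.from_buffer(mv)


# Made-up samples in native byte order.
writable = bytearray(struct.pack('3h', 5, -6, 7))
shared = int16_array_from_view(memoryview(writable))
shared_values = list(shared)

shared[0] = 9  # visible in the bytearray because the memory is shared
first_after_write = struct.unpack('3h', writable)[0]

copied = int16_array_from_view(memoryview(bytes(writable)))
copied_values = list(copied)
```

The resulting array can be passed where a `POINTER(ctypes.c_int16)` argument is declared. This does not give zero-copy access to a read-only view; for that case the direct `cast` on the underlying `bytes` object shown in the answer remains the option.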