I am interested in passing binary data between Python, NumPy, and Cython using the buffer protocol. Looking at PEP 3118, there appear to be some additions to the struct format-string syntax that add support for useful features such as named fields and nested structs.
However, it appears that support for the full range of buffer syntax is limited in all three of those places. For example, say I have the following Cython struct:
from libc.stdint cimport uint32_t, uint8_t

ctypedef packed struct ImageComp:
    uint32_t width
    uint32_t height
    uint8_t *pixels

# Here is the appropriate struct format string representation
IMAGE_FORMAT = b'T{L:width:L:height:&B:pixels:}'
Attempting to extract the PEP 3118-compliant bytes string as follows:
from libc.stdlib cimport malloc, free
import numpy as np

cdef void *image_temp = malloc(sizeof(ImageComp))
IMAGE_SIZE = sizeof(ImageComp)
IMAGE_FORMAT = (<ImageComp[:1]>image_temp)._format
IMAGE_DTYPE = np.asarray(<ImageComp[:1]>image_temp).dtype
free(image_temp)
fails with this error message:
Invalid base type for memoryview slice: ImageComp
since typed memoryviews cannot be created if their element type contains pointers.
Similarly, creating a view.array using my custom string, or using the Python struct module's calcsize function, raises an error like struct.error: bad char in struct format.
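For reference, the struct failure can be reproduced without Cython. This is a minimal sketch, not a definitive test: the helper name calcsize_or_error is made up for illustration, and the second call uses a plain "=II" format (my own choice, two standard-size 32-bit unsigned integers) only to show that the classic struct syntax is accepted while the PEP 3118 extensions are not.

```python
import struct

# The extended PEP 3118 format string from the question.
PEP3118_FORMAT = "T{L:width:L:height:&B:pixels:}"


def calcsize_or_error(fmt):
    """Return struct.calcsize(fmt), or the struct.error it raises.

    Hypothetical helper, used only to make the comparison easy to print.
    """
    try:
        return struct.calcsize(fmt)
    except struct.error as exc:
        return exc


# The struct module does not know 'T{', ':name:' or '&', so this is an error.
extended = calcsize_or_error(PEP3118_FORMAT)

# Classic syntax only: two standard-size 32-bit unsigned ints, no pointer.
plain = calcsize_or_error("=II")

print(type(extended).__name__, extended)
print(plain)
```

The plain format has to drop the field names and the pointer entirely, which is exactly the information I would like to keep.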
I can manually create and fill a Py_buffer object as described here, but attempting to convert this to a numpy array with np.asarray yields ValueError: 'T{L:width:L:height:&B:pixels:}' is not a valid PEP 3118 buffer format string.
With all of this in mind, I have the following questions:
- Is there any module in the standard python library that takes advantage of the complete PEP 3118 specification?
- Is this struct format syntax defined formally anywhere (i.e. with a PEG grammar)?
- Is there a way to force cython or numpy to automatically generate a valid format string if it contains pointers?
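As one data point for the first question, ctypes seems to emit the extended syntax for its own types when they are viewed through the buffer protocol. The sketch below is only an observation under assumptions: the ImageComp class is a hypothetical ctypes mirror of the Cython struct (deliberately without _pack_), and the exact format string it prints may differ between platforms and Python versions, so no particular output is claimed here.

```python
import ctypes
import struct


class ImageComp(ctypes.Structure):
    # Hypothetical ctypes mirror of the Cython struct above; _pack_ is not set.
    _fields_ = [
        ("width", ctypes.c_uint32),
        ("height", ctypes.c_uint32),
        ("pixels", ctypes.POINTER(ctypes.c_uint8)),
    ]


# ctypes instances export a buffer, so memoryview exposes the format string
# that ctypes generated for the structure.
ctypes_format = memoryview(ImageComp()).format
print(ctypes_format)

# Check whether the struct module can parse what ctypes produced.
try:
    struct.calcsize(ctypes_format)
    struct_accepts_it = True
except struct.error:
    struct_accepts_it = False
print(struct_accepts_it)
```

If this holds in general, it suggests that one standard-library module can produce the named-field syntax while another cannot consume it, which is part of why I am asking whether the syntax is specified formally anywhere.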