Consider a C buffer of N elements created with:
from ctypes import byref, c_double
N = 3
buffer = (c_double * N)()
# C++ function that fills the buffer byref
pull_function(byref(buffer))
# load buffer in numpy
data = np.frombuffer(buffer, dtype=c_double)
Works great. But my issue is that the dtype
may be numerical (float, double, int8, ...) or string.
from ctypes import byref, c_char_p
N = 3
buffer = (c_char_p * N)()
# C++ function that fills the buffer byref
pull_function(byref(buffer))
# Load in.. a list?
data = [v.decode("utf-8") for v in buffer]
How can I load those UTF-8 encoded string directly in a numpy array? np.char.decode
seems to be a good candidate, but I can't figure out how to use it. np.char.decode(np.frombuffer(buffer, dtype=np.bytes_))
is failing with ValueError: itemsize cannot be zero in type
.
EDIT: The buffer can be filled from the Python API. The corresponding lines are:
x = [list of strings]
x = [v.encode("utf-8") for v in x]
buffer = (c_char_p * N)(*x)
push_function(byref(buffer))
Note that this is a different buffer from the one above. push_function
pushes the data in x
on the network while pull_function
retrieves the data from the network. Both are part of the LabStreamingLayer C++ library.
Edit 2: I suspect I can get this to work if I can reload the 'push' buffer into a numpy array before sending it to the network. The 'pull' buffer is probably the same. In that sense, here is a MWE demonstrating the ValueError
described above.
from ctypes import c_char_p
import numpy as np
x = ["1", "23"]
x = [elt.encode("utf-8") for elt in x]
buffer = (c_char_p * 2)(*x)
np.frombuffer(buffer, dtype=np.bytes_) # fails
[elt.decode("utf-8") for elt in buffer] # works