I want to eliminate a memory-copy step in my data-processing pipeline. The goal is to:

- Generate data with a custom C library.
- Feed the generated data into an MXNet model running on the GPU.
Currently the pipeline does the following:

1. Create a C-contiguous numpy array via `np.empty(...)`.
2. Get the raw data pointer of the numpy array via `np.ndarray.__array_interface__`.
3. Call the C library from Python (via ctypes) to fill the numpy array.
4. Convert the numpy array into an mxnet `NDArray`; this copies the underlying memory buffer.
5. Pack the `NDArray`s into an `mx.io.DataBatch` instance, then feed it into the model.
Note that all arrays stay in CPU memory until they are fed into the model.
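For reference, the first three steps look roughly like this; `ctypes.memset` stands in for the custom C library call (the `mylib.fill_buffer` signature in the comment is hypothetical):

```python
import ctypes
import numpy as np

# 1. Allocate an uninitialized, C-contiguous buffer.
arr = np.empty((4, 3), dtype=np.float32)

# 2. Extract the raw data pointer from the array interface.
ptr, read_only = arr.__array_interface__['data']
assert not read_only and arr.flags['C_CONTIGUOUS']

# 3. Let the C library fill the buffer in place. ctypes.memset stands in
# for the real call, which might look like:
#   mylib.fill_buffer(ctypes.c_void_p(ptr), arr.size)   # hypothetical
ctypes.memset(ptr, 0, arr.nbytes)

print(arr.sum())  # the whole buffer was zeroed in place, no copy
```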
I noticed that `mx.io.DataBatch` only accepts lists of `mx.ndarray.NDArray`s for its `data` and `label` parameters, not numpy arrays (passing numpy arrays appears to work until the batch is actually fed into a model). On the other hand, my C library can write directly into any C-contiguous array.
I would like to avoid the memory copy in the numpy-to-`NDArray` conversion step. One possibility would be to obtain a raw pointer to the `NDArray`'s buffer and bypass numpy entirely, but any approach that works is fine.