1

I'm trying to make a simple Python/CUDA interface using CFFI. However, I'm consistently getting an "invalid argument" error during the data retrieval stage in the cudaMemcpyDeviceToHost. Can anyone spot the issue? I'd really appreciate any help anyone can offer.

The code I'm wanting to wrap is just called "array.cu":

// array.cu
#include "array.h"
using namespace std;

void allocate(
    float* host_array,
    float* device_array,
    int length
) {
    cout << "Allocating h_ptr (" << host_array << ") ";
    cout << "on device using d_ptr (" << device_array << ") ";
    cout << "of length=" << length << endl;
    CUCHK(cudaMalloc((void**) &device_array, length*sizeof(float)));
    CUCHK(cudaMemcpy(device_array,  host_array, length*sizeof(float), cudaMemcpyHostToDevice));
}

void retrieve(
    float* device_array,
    float* host_array,
    int length
) {
    cout << "Retrieving h_ptr (" << host_array << ") ";
    cout << "from device using d_ptr (" << device_array << ") ";
    cout << "of length=" << length << endl;
    CUCHK(cudaMemcpy(host_array, device_array, length*sizeof(float), cudaMemcpyDeviceToHost));
}

And I've written a wrapper script in the form of a library I call "cupid":

# cupid.py
import numpy as np
from cffi import FFI

ffi = FFI()
lib = ffi.dlopen("./cupid/src/libAlg.so")

class cupid:

    def __init__(self, numpy_array):
        self._numpy_array  = numpy_array
        self._host_array   = ffi.cast("float *", np.ascontiguousarray(numpy_array, np.float32).ctypes.data)
        self._device_array = ffi.new("float *")
        self._length       = numpy_array.size
        self._shape        = numpy_array.shape
        self._dtype        = numpy_array.dtype
        self.allocate()
        return

    def allocate(self):
        ffi.cdef(
        """
        void allocate(
        float *host_array,
        float *device_array,
        int length
        );
        """)
        lib.allocate(self._host_array, self._device_array, self._length)
        return

    def retrieve(self):
        ffi.cdef(
        """
        void retrieve(
        float* device_array,
        float* host_array,
        int length
        );
        """)
        lib.retrieve(self._device_array, self._host_array, self._length)

        self._numpy_array = np.frombuffer(ffi.buffer(self._host_array, 
        self._length*ffi.sizeof("float")), dtype=np.float32)
        self._numpy_array = np.ctypeslib.as_array(self._numpy_array, shape=tuple(self._shape))
        self._numpy_array = self._numpy_array.reshape(tuple(self._shape))
        return self._numpy_array

which is being called as:

# test.py
import numpy as np
from cupid import cupid
from pprint import pprint

numpy_array = np.zeros((5,6), dtype=np.float32)
cupid_array = cupid(numpy_array)
numpy_array = cupid_array.retrieve()

Producing the output:

Allocating h_ptr (0x559552b81fa0) on device using d_ptr (0x559552b80fe0) of length=30
Retrieving h_ptr (0x559552b81fa0) from device using d_ptr (0x559552b80fe0) of length=30
Cuda error in file 'array.cu' in line 25 : invalid argument. (<- the cudaMemcpyDeviceToHost)

Looking at the memory address, I see the memory being handled is the same, and it's not being deleted in between stages - it just seems that that device array loses its CUDA credentials and isn't recognised as a device array anymore.

1 Answers1

1

if you look into cffi documentation

x = ffi.new("int *")      # allocate one int, and return a pointer to it

the returned object is a pointer to some data allocated somewhere, now when you pass this pointer to a function.

void allocate(
float *device_array
)

this pointer is passed by value, (copied to the stack of the new function), so when you do

cudaMalloc((void**) &device_array ...

you are essentially getting an address to something on the stack, to which the address of the newly created cuda array is written to ... and lost.

in order to fix this problem you have to allocate a void ** in cffi (or float **).

# creates void* and stores a pointer to it in self._device_array
self._device_array = ffi.new("void **")

your function has to be modified to accept this now.

void allocate(
    float* host_array,
    void** device_array,
    int length
) {
    cout << "Allocating h_ptr (" << host_array << ") ";
    cout << "of length=" << length << endl;
    CUCHK(cudaMalloc(device_array, length*sizeof(float)));
    cout << "on device using d_ptr (" << *device_array << ") ";
    CUCHK(cudaMemcpy(*device_array,  host_array, length*sizeof(float), cudaMemcpyHostToDevice));
}

your other functions also have to also accept this void ** pointer and dereference it whenever you want the address to the device_array.

Ahmed AEK
  • 8,584
  • 2
  • 7
  • 23
  • Ahmed thank you so much for your help, and I'm sorry it took so long to respond. This fixed it, thanks for pointing out my mistake. – TheGitPuller Apr 12 '23 at 16:52