The very short answer is that you cannot.
malloc is fundamentally a C++ standard library function which the CUDA toolchain internally overloads with a device hook in device code. Attempting to define your own device version of malloc or free can and will break the toolchain's internals. Exactly how depends on platform and compiler.
In your previous question on this, you had code like this:
__device__ void* malloc(size_t t)
{ return theHeap.alloc(t); }
__device__ void free(void* p)
{ theHeap.dealloc(p); }
Because of existing standard library requirements, malloc and free must be defined as __device__ __host__ at global namespace scope. It is illegal in CUDA to have separate __device__ and __host__ definitions of the same function. You could probably get around this restriction by using a private namespace for the custom allocator, or using different function names. But don't try and redefine anything from the standard library in device or host code. It will break things.
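As a rough illustration of the private-namespace approach, here is a minimal sketch. The namespace myheap, the functions alloc and dealloc, the pool size, and the bump-allocation strategy are all assumptions made for this example. They stand in for whatever theHeap actually does and are not taken from your code.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical private namespace: the names here are illustrative only.
// Nothing in it collides with the global malloc/free that the toolchain owns.
namespace myheap {

constexpr size_t kPoolBytes = 1 << 20;                   // assumed 1 MiB pool
__device__ __align__(16) unsigned char pool[kPoolBytes]; // static device storage
__device__ unsigned long long offset = 0;                // next free byte

// Bump allocation: atomically reserve `bytes`, rounded up to a multiple of 16.
// Returns nullptr once the pool is exhausted.
__device__ void* alloc(size_t bytes)
{
    unsigned long long need  = (bytes + 15ull) & ~15ull;
    unsigned long long start = atomicAdd(&offset, need);
    return (start + need <= kPoolBytes) ? pool + start : nullptr;
}

// A bump allocator cannot release individual blocks, so this is a no-op.
__device__ void dealloc(void*) {}

} // namespace myheap

// Each thread allocates a small buffer through the custom allocator,
// writes to it, and counts itself if the allocation succeeded.
__global__ void kernel(int* ok)
{
    int* p = static_cast<int*>(myheap::alloc(4 * sizeof(int)));
    if (p) {
        p[0] = static_cast<int>(threadIdx.x);
        atomicAdd(ok, 1);
        myheap::dealloc(p);
    }
}

// Launches 2 blocks of 64 threads and returns how many threads
// obtained a non-null pointer from myheap::alloc.
int run_demo()
{
    int* d_ok = nullptr;
    int  h_ok = 0;
    cudaMalloc(&d_ok, sizeof(int));
    cudaMemset(d_ok, 0, sizeof(int));
    kernel<<<2, 64>>>(d_ok);
    cudaMemcpy(&h_ok, d_ok, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_ok);
    return h_ok;
}

int main()
{
    printf("%d of 128 threads allocated successfully\n", run_demo());
    return 0;
}
```

Call sites use myheap::alloc and myheap::dealloc explicitly, so the standard malloc and free stay untouched in both host and device code. A real allocator would need a proper free list and error checking on the CUDA runtime calls, both of which are omitted here for brevity.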