I'm writing a C++ CUDA program, in which (as far as I know):

cudaMallocManaged(&data, size);

and

cudaFree(data);

are analogous to malloc/free and new/delete.
Say I have a class
struct vec{
    int size;
    int* data;
    vec(int n) : size(n) {
        cudaMallocManaged(&data, n * sizeof(int));
    }
};
and I overload the + operator:
vec operator+(const vec& b){
    vec c(size);
    for (int i = 0; i < size; i++)
    {
        c.data[i] = data[i] + b.data[i];
    }
    return c;
}
vec operator+(int b){
    vec c(size);
    for (int i = 0; i < size; i++)
    {
        c.data[i] = data[i] + b;
    }
    return c;
}
now say I have some function:
vec f(vec x){
return x + x + 1;
}
Since this function creates two temporary vectors, all the GPU memory is used up if it is called repeatedly. The obvious solution is adding a destructor to the vec class which frees the memory when the object goes out of scope:
~vec(){
cudaFree(data);
}
The problem that arises is that the returned vector is a shallow copy storing its data behind the same pointer, so when the temporary's destructor runs, the returned vector's data is freed along with it, and the program crashes with a 0xc0000006 error when the data is next referenced.
One thing I considered is something like this:
void add(vec a, vec b, vec out){
    for (int i = 0; i < a.size; i++)
    {
        out.data[i] = a.data[i] + b.data[i];
    }
    //writing directly into out avoids the copy; plain memcpy would also
    //work on managed memory from host code, since it is host-accessible
}
However, that is not only inconvenient but also doesn't allow for operator overloading. Thanks in advance for your help.