CUDA: Copying device data to 2D host array

Question

I have a HostMatrix which was declared as:

float **HostMatrix

I have to copy the contents of the device matrix pointed to by devicePointer into the two-dimensional host matrix HostMatrix.

I tried this:

for (int i=0; i<numberOfRows; i++){
    cudaMemcpy(HostMatrix[i], devicePointer, numberOfColumns *sizeof(float),
                 cudaMemcpyDeviceToHost);
    devicePointer += numberOfColumns;// so as to reach next row
}

But I think this is wrong, since I am doing it inside a host function, and devicePointer cannot be manipulated directly in a host function the way I do in the last line of the loop.

So what is the correct way to achieve this?

Edit

Oh, actually this works correctly! The problem comes when de-allocating the memory, as discussed in my earlier question: CUDA: Invalid Device Pointer error when reallocating memory. So basically the following is incorrect:

for (int i=0; i<numberOfRows; i++){
    cudaMemcpy(HostMatrix[i], devicePointer, numberOfColumns *sizeof(float),
                 cudaMemcpyDeviceToHost);
    devicePointer += numberOfColumns;// so as to reach next row
}
cudaFree(devicePointer); //invalid device pointer

Have you tried it? It looks fine to me. devicePointer points to device memory, but it's still a variable on the host, and you **can** do devicePointer += numberOfColumns. – Shawn, Oct 03 '16 at 16:45
What do you mean by "`devicePointer` can not be manipulated directly in host function"? Pointers aren't magical unicorns with secret and mystical properties. They are unsigned integers with enough bits to hold the value of an address in memory. Nothing more than that. Of course you can "manipulate it" in host code. All that you cannot do is de-reference it, because its value isn't a valid address in the host memory space. – talonmies, Oct 03 '16 at 16:52
@talonmies Sorry, I got confused. Now it's clear. I just want to know how to free the allocated memory now. If I use cudaFree, it gives an error. – user3891236, Oct 03 '16 at 16:55

score 2 · Accepted Answer · answered Oct 03 '16 at 17:13

You first need to allocate devicePointer with all the required memory. Incrementing it on every iteration is not the best idea, though, because the free at the end will then be broken. Say you have nRows rows of nCols elements each. Then this should work properly (I haven't tried it, but the idea should be OK):

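float* dPtr;
// cudaMalloc takes a size in bytes, so multiply by sizeof(float)
cudaMalloc(&dPtr, nRows * nCols * sizeof(float));
for (int i = 0; i < nRows; i++) {
    // offset from the base pointer instead of modifying dPtr itself
    cudaMemcpy(HostMatrix[i], dPtr + i * nCols, nCols * sizeof(float), cudaMemcpyDeviceToHost);
}
// do whatever you want
cudaFree(dPtr);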

The issue is that if you keep incrementing dPtr, the cudaFree at the end receives an address further into the buffer (around the "last row") instead of the address that cudaMalloc returned, so it's wrong.
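For a fuller picture, here is a minimal sketch of the same idea with error checking and a round trip, so the copied rows can be compared against known values. The helper names copyDeviceToHostRows and roundTrip are hypothetical, and it assumes the device buffer is a plain row-major allocation (not pitched) while each host row is allocated separately:

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy an nRows x nCols row-major device buffer into
// separately allocated host rows. dBase itself is never modified.
cudaError_t copyDeviceToHostRows(float** hostRows, const float* dBase,
                                 int nRows, int nCols) {
    for (int i = 0; i < nRows; ++i) {
        // The row address is computed on the host; only cudaMemcpy reads device memory.
        cudaError_t err = cudaMemcpy(hostRows[i],
                                     dBase + static_cast<size_t>(i) * nCols,
                                     nCols * sizeof(float),
                                     cudaMemcpyDeviceToHost);
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}

// Hypothetical round trip: upload a flat matrix, copy it back row by row,
// compare, and free the pointer that cudaMalloc returned.
bool roundTrip(int nRows, int nCols) {
    std::vector<float> flat(static_cast<size_t>(nRows) * nCols);
    for (size_t k = 0; k < flat.size(); ++k) flat[k] = static_cast<float>(k);

    const size_t bytes = flat.size() * sizeof(float);  // size in bytes, not elements
    float* dPtr = nullptr;
    if (cudaMalloc(reinterpret_cast<void**>(&dPtr), bytes) != cudaSuccess) return false;
    bool ok = cudaMemcpy(dPtr, flat.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    // Build a float** style host matrix: one separately stored row per pointer.
    std::vector<std::vector<float>> rows(nRows, std::vector<float>(nCols));
    std::vector<float*> hostMatrix(nRows);
    for (int i = 0; i < nRows; ++i) hostMatrix[i] = rows[i].data();

    ok = ok && copyDeviceToHostRows(hostMatrix.data(), dPtr, nRows, nCols) == cudaSuccess;
    for (int i = 0; ok && i < nRows; ++i)
        for (int j = 0; j < nCols; ++j)
            if (hostMatrix[i][j] != flat[static_cast<size_t>(i) * nCols + j]) ok = false;

    // dPtr still holds the base address, so cudaFree accepts it.
    return (cudaFree(dPtr) == cudaSuccess) && ok;
}
```

Calling roundTrip(3, 4) from main should return true on a machine with a working CUDA device; compile with nvcc.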

Does it make sense?

Shawn

Yes, it worked without errors in memory allocation/de-allocation. But why did cudaFree work even though it did not get the base address as its argument (since we incremented the pointer)? It is on the last row, as you said. – user3891236 Oct 03 '16 at 17:45
In my case I was incrementing the pointer outside the CUDA API call cudaMemcpy (see my edits); in your case it is inside cudaMemcpy. Why did your cudaFree work when mine did not? Are you saying that the changes (increments) to the pointer are not visible to cudaFree in your case? If so, why? – user3891236 Oct 03 '16 at 17:56
@user3891236: I think you need to spend some time learning the basics of C++ programming before going any further with CUDA. Nowhere in this answer is the value of `dPtr` changed between the `cudaMalloc` and `cudaFree` calls. If you don't understand why that is, I doubt anyone here can help you. – talonmies Oct 03 '16 at 18:50
The issue is that you always need to call free (cudaFree, or just the usual free in host code) on the pointer you got from cudaMalloc (or just malloc). Free needs to know exactly what you want to free (you always free the whole array at once), and it can only do that using the original pointer from malloc. If you give it an incremented version of that pointer, it can't figure out that it also has to free what comes before. – Shawn Oct 03 '16 at 19:08
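If you prefer the incrementing loop from the question, one way to keep it is to advance a copy of the pointer and hand the untouched original to cudaFree. This is only a sketch; the helper name copyRowsAndFree is hypothetical, and it assumes dBase is exactly the address returned by cudaMalloc for a row-major nRows x nCols buffer:

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: walk a cursor through the device buffer row by row,
// then free the buffer through the original base pointer.
cudaError_t copyRowsAndFree(float** hostRows, float* dBase, int nRows, int nCols) {
    const float* cursor = dBase;  // a copy; incrementing it leaves dBase unchanged
    cudaError_t status = cudaSuccess;
    for (int i = 0; status == cudaSuccess && i < nRows; ++i) {
        status = cudaMemcpy(hostRows[i], cursor, nCols * sizeof(float),
                            cudaMemcpyDeviceToHost);
        cursor += nCols;  // plain host-side pointer arithmetic, no dereference
    }
    // Free with the address cudaMalloc returned, never with the advanced cursor.
    cudaError_t freeStatus = cudaFree(dBase);
    return status != cudaSuccess ? status : freeStatus;
}
```

The answer's version (dPtr + i * nCols) and this one compute the same row addresses; the only thing that matters to cudaFree is that it receives the base address.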
