
In the following code I am calling a function foo twice, serially, from main. The function simply allocates device memory and then increments that pointer. It then returns to main.

The first time foo is called, memory is correctly allocated. But, as you can see in the output, when I call foo again the CUDA memory allocation fails with the error invalid device pointer.

I tried using cudaThreadSynchronize() between the two foo calls, but it did not help. Why is the memory allocation failing?

Actually, the error is caused by

matrixd += 3;

If I don't do this increment, the error disappears. But why, even though I am using cudaFree()?

Kindly help me understand this.

My output is here:

Calling foo for the first time
Allocation of matrixd passed:
I came back to main safely :-)
I am going back to foo again :-)
Allocation of matrixd failed, the reason is:  invalid device pointer

My main() is here:

#include <stdio.h>
#include <cstdlib>   // malloc(), free()
#include <iostream>  // cout, stream
#include <math.h>
#include <ctime>     // time(), clock()
#include <bitset>
bool foo();

/***************************************
Main method.

****************************************/
int main()
{
    // Perform one warm-up pass and validate
    std::cout << "Calling foo for the first time" << std::endl;
    foo();
    std::cout << "I came back to main safely :-) " << std::endl;
    std::cout << "I am going back to foo again :-) " << std::endl;
    foo();
    getchar();
    return 0;
}

The definition of foo() is in this file:

#include <cuda.h>
#include <cuda_runtime_api.h>
#include <device_launch_parameters.h>
#include <iostream>

bool foo( )
{
    // Error return value
    cudaError_t status;
    // Number of bytes in the matrix.
    int bytes = 9 * sizeof(float);
    // Pointer to the device array
    float *matrixd = NULL;

    // Allocate memory on the device to store the matrix
    cudaMalloc((void**) &matrixd, bytes);
    status = cudaGetLastError();              // To check the error
    if (status != cudaSuccess) {
        std::cout << "Allocation of matrixd failed, the reason is:  " << cudaGetErrorString(status) << std::endl;
        cudaFree(matrixd);                    // Free call for memory
        return false;
    }

    std::cout << "Allocation of matrixd passed: " << std::endl;

    ////// Increment address
    for (int i = 0; i < 3; i++) {
        matrixd += 3;
    }

    // Free device memory
    cudaFree(matrixd);

    return true;
}

Update

With better error checking, and this time I am incrementing the device pointer only once. Now I get the following output:

Calling foo for the first time
Allocation of matrixd passed:
Increamented the pointer and going to free cuda memory:
GPUassert: invalid device pointer C:/Users/user/Desktop/Gauss/Gauss/GaussianEliminationGPU.cu 44

Line number 44 is the cudaFree() call. Why is it still failing?

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess) 
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
   }
}

// GPU function for the direct Gauss-Jordan method.

bool foo( )
{
    // Error return value
    cudaError_t status;
    // Number of bytes in the matrix.
    int bytes = 9 * sizeof(float);
    // Pointer to the device array
    float *matrixd = NULL;

    // Allocate memory on the device to store the matrix
    gpuErrchk( cudaMalloc((void**) &matrixd, bytes) );
    //cudaMemset(outputMatrixd, 0, bytes);

    std::cout << "Allocation of matrixd passed: " << std::endl;

    ////// Increment address
    matrixd += 1;

    std::cout << "Increamented the pointer and going to free cuda memory: " << std::endl;

    // Free device memory
    gpuErrchk( cudaFree(matrixd) );

    return true;
}
  • What happens if you check the return status of the `cudaFree` call? – talonmies Oct 03 '16 at 05:43
  • @talonmies you are right, just checked. I used cudaGetLastError() below cudaFree, and yes, it shows that it is failing. But again, why? – user3891236 Oct 03 '16 at 05:47
  • Right. So your question is basically being caused by incomplete error checking. You can see how to do it correctly [here](http://stackoverflow.com/q/14038589/681865). The memory allocation isn't failing. – talonmies Oct 03 '16 at 05:49
  • I will check your answer in the link, but are you sure there is no error in deallocating (cudaGetLastError reported an error)? – user3891236 Oct 03 '16 at 05:53

1 Answer


The real problem is in this code:

for (int i=0; i<3; i++){
     matrixd += 3;
}

// Free device memory
cudaFree(matrixd);   

You never allocated matrixd+9, so passing it to cudaFree is illegal and produces an invalid device pointer error. This error is being propagated to the next time you perform error checking, which is after the subsequent call to cudaMalloc. If you read the documentation for any of these API calls you will note that there is a warning that they can return errors from prior GPU operations. This is what is happening in this case.
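As an illustration, here is a minimal sketch of a corrected foo() (it reuses the gpuErrchk macro from your update; the extra pointer name cursor is just for illustration). Any pointer arithmetic is done on a copy, so cudaFree receives exactly the pointer value that cudaMalloc returned:

#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}

bool foo( )
{
    // Number of bytes in the matrix.
    int bytes = 9 * sizeof(float);
    float *matrixd = NULL;

    // Allocate device memory; keep this exact pointer value for cudaFree.
    gpuErrchk( cudaMalloc((void**) &matrixd, bytes) );

    // Pointer arithmetic itself is legal, but do it on a copy so the
    // original allocation pointer stays untouched.
    float *cursor = matrixd;
    cursor += 3;   // e.g. step to the second row of a 3x3 matrix

    // Free the pointer that cudaMalloc returned, not the incremented copy.
    gpuErrchk( cudaFree(matrixd) );

    return true;
}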

Error checking in the CUDA runtime API can be subtle to do correctly. There is a robust, ready recipe for how to do it here. I suggest you use it.

  • your way of error checking is very neat. Please see my update. I think my mistake is that I am trying to increment the device pointer inside a host function. I guess this is not allowed, and cudaFree is not happy with it. In fact matrixd++ in a host function will point to some garbage in host memory, not in device memory. – user3891236 Oct 03 '16 at 16:21
  • @user3891236: I told you exactly what the problem is. You can't free an address you didn't allocate. "Incrementing" the pointer is perfectly OK (although completely pointless in this case). But asking the API to free the incremented pointer is illegal because the API never allocated memory at that pointer value. – talonmies Oct 03 '16 at 16:36
  • Thank you very much for clearing my doubts. I learnt many things from you today, including how important checking errors in CUDA is! – user3891236 Oct 03 '16 at 16:44