Computation between two different kernels in CUDA

Question

Here are two kernel functions that I wrote. The complete code compiles without errors, but it produces warnings.

The program performs biological sequence alignment. Kernel one computes the matrix s0 (gpu_s0) from gpu_sample and gpu_data. Kernel two takes the same gpu_s0 that kernel one produced and computes gpu_s from gpu_s0 and from entries of gpu_s itself. Here are the code and the warnings:

__global__ void myKernel1( char** gpu_sample, char** gpu_data, float **gpu_s0)   
{
    dim3 dimGrid1;
    dim3 dimBlock1;
    dimBlock1.x = dimBlock1.y = BLOCK_SIZE;
    dimGrid1.x = dimGrid1.y = GRID_SIZE;

    int i1 = threadIdx.x + blockIdx.x * dimBlock1.x;
    int j1 = threadIdx.y + blockIdx.y * dimBlock1.y;

    if( i1 > N || j1 > M ) return;
    while ( j1 < (M+1) && i1 < (N+1) )
    {
        if(gpu_sample[0][j1] == gpu_data[i1][0])    // here is the warning part.
        {

            gpu_s0[i1+1][j1+1] = 5;
        }
        else
        {
            gpu_s0[i1+1][j1+1] = -3;
        }

        i1 += blockDim.x * gridDim.x;
        j1 += blockDim.y * gridDim.y;
    }
}

__global__ void myKernel2( float **gpu_s0, float **gpu_s )
{
    dim3 dimGrid2;
    dim3 dimBlock2;
    dimBlock2.x = dimBlock2.y = BLOCK_SIZE;
    dimGrid2.x = dimGrid2.y = GRID_SIZE;

    float w = -4;                                   
    float zero = 0;


    __shared__ float shared[ threadsPerBlock ][threadsPerBlock]; 
    int i2 = threadIdx.x + blockIdx.x * dimBlock2.x;
    int j2 = threadIdx.y + blockIdx.y * dimBlock2.y;

    while( j2 < (M+1) && i2 < (N+1) )
    {
      shared[threadIdx.x][threadIdx.y] = gpu_s0[i2][j2];      // here is the warning.

      i2 += blockDim.x * gridDim.x;
      j2 += blockDim.y * gridDim.y;
    }

    __syncthreads();


    if( j2 < (M+1) && i2 < (N+1) )
        gpu_s[i2][0] = gpu_s[0][j2] = 0;


    /*if ( j2 < (M+1) && i2 < (N+1) )
    sTemp0[threadIdx.x][threadIdx.y] = gpu_s0[i2][j2]; //????????
    __syncthreads();*/


    if( i2 > N || j2 > M ) return;
    while ( j2 < (M+1) && i2 < (N+1) )
    {

        gpu_s[i2][j2] = max(gpu_s[i2-1][(j2-1)] + shared[threadIdx.x][threadIdx.y], //?????????
                          gpu_s[i2][(j2-1)] + w, 
                          gpu_s[(i2-1)][j2] + w, 
                          zero);     // here is the warning.
        i2 += blockDim.x * gridDim.x;
        j2 += blockDim.y * gridDim.y;
    }

}

Warning:

./test_10_15_2012.cu(155): Warning: Cannot tell what pointer points to, assuming global memory space
./test_10_15_2012.cu(155): Warning: Cannot tell what pointer points to, assuming global memory space
./test_10_15_2012.cu(155): Warning: Cannot tell what pointer points to, assuming global memory space
./test_10_15_2012.cu(186): Warning: Cannot tell what pointer points to, assuming global memory space
./test_10_15_2012.cu(208): Warning: Cannot tell what pointer points to, assuming global memory space
./test_10_15_2012.cu(208): Warning: Cannot tell what pointer points to, assuming global memory space
./test_10_15_2012.cu(208): Warning: Cannot tell what pointer points to, assuming global memory space
./test_10_15_2012.cu(208): Warning: Cannot tell what pointer points to, assuming global memory space

Can anybody help me fix this problem, or give me some suggestions?

You're getting that warning because you're dereferencing a pointer to a pointer (all of your kernel's parameters are pointers-to-pointer). The compiler can't prove that the pointer doesn't point to `__shared__`, so it assumes (correctly) that it points to global. To eliminate the warnings, transform your code to avoid using pointers-to-pointer or compile with the command line option `-arch=sm_20`. SM 20 (Fermi) or later GPUs don't have this problem. — Jared Hoberock, Nov 19 '12 at 23:24
How can I do that without using pointers-to-pointer? I use two-dimensional arrays to store the sample matrix, the data matrix, s, and s0. Could you give me some suggestions for avoiding pointers-to-pointer? I would deeply appreciate an example or a modification of my code. Thank you so much. — user1837194, Nov 20 '12 at 01:50
In C it's not difficult to [handle a 2D array as 1D](http://stackoverflow.com/questions/2151084/map-a-2d-array-onto-a-1d-array-c). You can then pass a pointer instead of a pointer to a pointer. If you really want to pass multidimensional arrays with a single pointer in C, you can [do that too](http://stackoverflow.com/questions/12924155/sending-3d-array-to-cuda-kernel). Or if your GPU is CC 2.0 or better, just use the compiler switch Jared indicated. And by the way, it's just a warning. If you're satisfied with the code, it can be ignored. — Robert Crovella, Nov 20 '12 at 02:33
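
The flattening suggested in the comments could look roughly like the sketch below. It is not the asker's program: the sizes `N` and `M`, the helper names `idx`, `scoreKernel`, and `computeS0`, and the choice to store each sequence as a plain 1D `char` array are all assumptions made for illustration. The point is only that every kernel parameter is a single pointer, and that the element at row `i`, column `j` of an `(N+1) x (M+1)` matrix lives at offset `i * (M + 1) + j`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical sizes; the original post does not state N and M.
constexpr int N = 4;  // length of the data sequence (rows)
constexpr int M = 4;  // length of the sample sequence (columns)

// Row-major offset into a flat (N+1) x (M+1) matrix.
__host__ __device__ inline int idx(int i, int j) { return i * (M + 1) + j; }

// Match/mismatch scores (5 / -3, as in the question), written to s0[i+1][j+1].
// Every parameter is a single pointer, so no pointer-to-pointer is dereferenced.
__global__ void scoreKernel(const char *sample, const char *data, float *s0)
{
    // 2D grid-stride loops: correct for any grid size.
    for (int i = blockIdx.y * blockDim.y + threadIdx.y; i < N;
         i += blockDim.y * gridDim.y)
        for (int j = blockIdx.x * blockDim.x + threadIdx.x; j < M;
             j += blockDim.x * gridDim.x)
            s0[idx(i + 1, j + 1)] = (sample[j] == data[i]) ? 5.0f : -3.0f;
}

// Host helper (hypothetical): copies the inputs over, runs the kernel,
// and returns the flat (N+1) x (M+1) result. Row 0 and column 0 stay zero.
std::vector<float> computeS0(const char *sample, const char *data)
{
    const size_t count = (N + 1) * (M + 1);
    const size_t bytes = count * sizeof(float);

    char *d_sample = nullptr, *d_data = nullptr;
    float *d_s0 = nullptr;
    cudaMalloc(&d_sample, M);
    cudaMalloc(&d_data, N);
    cudaMalloc(&d_s0, bytes);

    cudaMemcpy(d_sample, sample, M, cudaMemcpyHostToDevice);
    cudaMemcpy(d_data, data, N, cudaMemcpyHostToDevice);
    cudaMemset(d_s0, 0, bytes);

    dim3 block(16, 16);
    dim3 grid((M + block.x - 1) / block.x, (N + block.y - 1) / block.y);
    scoreKernel<<<grid, block>>>(d_sample, d_data, d_s0);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));

    std::vector<float> s0(count);
    cudaMemcpy(s0.data(), d_s0, bytes, cudaMemcpyDeviceToHost);  // blocks until the kernel is done

    cudaFree(d_sample);
    cudaFree(d_data);
    cudaFree(d_s0);
    return s0;
}

int main()
{
    // Made-up example sequences of length M and N.
    std::vector<float> s0 = computeS0("ACGT", "AGGT");
    for (int i = 0; i <= N; ++i) {
        for (int j = 0; j <= M; ++j)
            printf("%5.1f ", s0[idx(i, j)]);
        printf("\n");
    }
    return 0;
}
```

The same `idx` helper would serve a second kernel that reads `s0` and writes `s`, since both matrices can share the flat layout. One allocation per matrix also replaces the per-row allocations that a `float **` layout usually needs.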
