I am following this example: multi-GPU basic usage

The code adds two arrays element by element.

The accepted answer there launches the add kernel like this:

for(int i=0;i<10000;++i) {
    for(int dev=0; dev<2; dev++) {
        cudaSetDevice(dev);
        add<<<NB,NT>>>( dev_a[dev], dev_b[dev], dev_c[dev], Ns[dev] );
    }
}

In the code above, the answerer also passes Ns[dev] to the add kernel, but the person who asked that question defined add like this:

__global__ void add( double *a, double *b, double *c){

    int tid = threadIdx.x + blockIdx.x * blockDim.x; 

    while(tid < N){
        c[tid] = a[tid] + b[tid];
        tid += blockDim.x * gridDim.x;
    }

}

What is the purpose of Ns[dev] in the launch above? When I remove Ns[dev] from the call, like this:

add<<<NB,NT>>>( dev_a[dev], dev_b[dev], dev_c[dev]);

the add kernel does not work; that is, it does not add the values.

How can I use Ns[dev] in the add function here?

ehah

1 Answer

In the linked answer, Ns is an array that holds the number of elements each device should process, and dev is the ID of the current device, so Ns[dev] is the element count for that device.

Add a parameter to the kernel that specifies the length of the data it processes, and pass Ns[dev] for it at launch:

__global__ void add( double *a, double *b, double *c, const int N)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x; 

    while(tid < N){
        c[tid] = a[tid] + b[tid];
        tid += blockDim.x * gridDim.x;
    }
}
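
To show where Ns[dev] comes from on the host side, here is a minimal sketch of the whole flow. It assumes the input is split into near-equal chunks, one per visible device. The helpers split_sizes and add_multi_gpu are hypothetical names made up for illustration, and the NB and NT values are arbitrary. Error handling is reduced to a single success flag.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Same kernel as above: N is the element count of THIS device's chunk.
__global__ void add(double *a, double *b, double *c, const int N)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    while (tid < N) {
        c[tid] = a[tid] + b[tid];
        tid += blockDim.x * gridDim.x;   // grid-stride loop
    }
}

// Hypothetical helper: split `total` elements into `parts` near-equal chunks.
std::vector<int> split_sizes(int total, int parts)
{
    std::vector<int> sizes(parts, total / parts);
    for (int i = 0; i < total % parts; ++i) sizes[i] += 1;  // spread the remainder
    return sizes;
}

// Hypothetical helper: computes c = a + b using every visible device.
// Assumes a and b have the same length. Returns false if any CUDA call fails.
bool add_multi_gpu(const std::vector<double> &a,
                   const std::vector<double> &b,
                   std::vector<double> &c)
{
    int ndev = 0;
    if (cudaGetDeviceCount(&ndev) != cudaSuccess || ndev == 0) return false;

    const int total = static_cast<int>(a.size());
    const int NB = 32, NT = 256;                 // assumed launch configuration
    std::vector<int> Ns = split_sizes(total, ndev);
    std::vector<int> offsets(ndev);
    std::vector<double *> dev_a(ndev), dev_b(ndev), dev_c(ndev);
    c.resize(total);

    bool ok = true;
    int offset = 0;
    for (int dev = 0; dev < ndev; ++dev) {
        const size_t bytes = Ns[dev] * sizeof(double);
        offsets[dev] = offset;
        ok &= (cudaSetDevice(dev) == cudaSuccess);
        ok &= (cudaMalloc(&dev_a[dev], bytes) == cudaSuccess);
        ok &= (cudaMalloc(&dev_b[dev], bytes) == cudaSuccess);
        ok &= (cudaMalloc(&dev_c[dev], bytes) == cudaSuccess);
        ok &= (cudaMemcpy(dev_a[dev], a.data() + offset, bytes,
                          cudaMemcpyHostToDevice) == cudaSuccess);
        ok &= (cudaMemcpy(dev_b[dev], b.data() + offset, bytes,
                          cudaMemcpyHostToDevice) == cudaSuccess);
        // Each device is told the length of its own chunk only.
        add<<<NB, NT>>>(dev_a[dev], dev_b[dev], dev_c[dev], Ns[dev]);
        ok &= (cudaGetLastError() == cudaSuccess);
        offset += Ns[dev];
    }

    // Copy each chunk back; cudaMemcpy waits for that device's kernel to finish.
    for (int dev = 0; dev < ndev; ++dev) {
        const size_t bytes = Ns[dev] * sizeof(double);
        ok &= (cudaSetDevice(dev) == cudaSuccess);
        ok &= (cudaMemcpy(c.data() + offsets[dev], dev_c[dev], bytes,
                          cudaMemcpyDeviceToHost) == cudaSuccess);
        cudaFree(dev_a[dev]);
        cudaFree(dev_b[dev]);
        cudaFree(dev_c[dev]);
    }
    return ok;
}
```

A caller would fill two equal-length std::vector<double> inputs and call add_multi_gpu(a, b, c). Passing the length as a kernel argument, rather than relying on a global N, matters here because each device may own a different number of elements.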
sgarizvi