I wrote a `swap` function to conveniently swap device array pointers, but it does not work. I assume I am swapping the local copies of the pointers inside `swap`, not the ones I pass to it.

__global__ void device_add_one(float *A, float *B)
{
    for (int index = blockIdx.x * blockDim.x + threadIdx.x;
         index < N;
         index += blockDim.x * gridDim.x)
    {
        // just for the example
        B[index] = A[index] + 1;
    }
}

void swap(float *a, float *b)
{
    float *temp = a;
    a = b;
    b = temp;
}

void loop(float *host_array, int size, int loops)
{
    cudaMalloc(&A, size * sizeof(float));
    cudaMalloc(&B, size * sizeof(float));

    cudaMemcpy(A, host_array, size * sizeof(float), cudaMemcpyHostToDevice);

    for (int i = 0; i < loops; i++) {
        device_add_one<<< 1, 254 >>>(A, B);

        // swapping the pointers through the function call does not work
        swap(A, B);

        /* This works:
        float *temp = A;
        A = B;
        B = temp;
        */
    }

    cudaMemcpy(host_array, A, size * sizeof(float), cudaMemcpyDeviceToHost);
}
VSB
  • Your first method will work. [Here](https://stackoverflow.com/questions/43482463/cuda-program-not-working-as-fast-as-expected/43485665#43485665) is one example, and there are others. You haven't shown a complete code nor explained why you think it doesn't work, so not sure what to say here. When you are saying something doesn't work, you are [supposed to provide](https://stackoverflow.com/help/on-topic) a [mcve]. – Robert Crovella Aug 11 '17 at 20:15
  • From a glance, the current solution (swapping the pointers in the loop) *should* work - in which way does it *not* work? – Marco13 Aug 11 '17 at 20:15
  • Your code as posted has a variety of syntax errors. You could not possibly compile that code. If the various syntax errors are fixed, and appropriate `main` function and other definitions supplied as needed, the code you have shown works fine according to my testing. This question, in this state, is pretty much unanswerable, and SO provides a vote-to-close reason specifically for this case. – Robert Crovella Aug 11 '17 at 20:55
  • @RobertCrovella I found the error and am updating the code to include it, but I need help with the solution. Please give me a minute. – VSB Aug 11 '17 at 20:59
  • http://www.cplusplus.com/reference/algorithm/swap/ – talonmies Aug 12 '17 at 04:30

1 Answer

Your function call method of swapping pointers does not work because you are using pass-by-value. This is an ordinary C/C++ programming concept, not unique to CUDA.

When you pass variables (including pointers, in this case) to a function by value:

void swap(float *a, float *b)

the pass-by-value mechanism gives the function its own local copies of the arguments. Assignments to `a` and `b` inside `swap` change only those copies, so `A` and `B` in the calling context are left untouched. A simple way to work around this is pass-by-reference (available in C++, not in C):

void swap(float* &a, float* &b)
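
To see the difference in isolation, here is a minimal host-only sketch with no CUDA involved. The names `swap_by_value`, `swap_by_reference`, and `check_swaps` are made up for this illustration and do not appear in the question:

```cpp
#include <cassert>

// Pass-by-value: a and b are local copies of the caller's pointers.
void swap_by_value(float *a, float *b)
{
    float *temp = a;
    a = b;
    b = temp;   // only the local copies are exchanged
}

// Pass-by-reference: a and b are aliases for the caller's pointer variables.
void swap_by_reference(float *&a, float *&b)
{
    float *temp = a;
    a = b;
    b = temp;   // the caller's pointers are exchanged
}

// Hypothetical self-check, only for this illustration.
void check_swaps()
{
    float x = 1.0f, y = 2.0f;
    float *p = &x, *q = &y;

    swap_by_value(p, q);
    assert(p == &x && q == &y);   // caller's pointers unchanged

    swap_by_reference(p, q);
    assert(p == &y && q == &x);   // caller's pointers exchanged
}
```

The two function bodies are identical; only the parameter types differ. Device pointers returned by `cudaMalloc` are ordinary pointer values on the host side, so the same reasoning applies to `A` and `B`.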

Here is a worked example:

$ cat t393.cu
#include <stdio.h>
#include <stdlib.h>

const int N = 1000;
float *A, *B;

__global__ void device_add_one(float *A, float *B)
{
    for (int index = blockIdx.x * blockDim.x + threadIdx.x;
         index < N;
         index += blockDim.x * gridDim.x)
    {
        // just for the example
        B[index] = A[index] + 1;
    }
}
void swap(float* &a, float* &b){
  float *temp = a;
  a = b;
  b = temp;
}

void loop(float *host_array, int size, int loops)
{
    cudaMalloc(&A, size * sizeof(float));
    cudaMalloc(&B, size * sizeof(float));

    cudaMemcpy(A, host_array, (size * sizeof(float)), cudaMemcpyHostToDevice);

    for (int i = 0; i < loops; i++) {
        device_add_one<<< 1, 254 >>>(A, B);

        // swap pointers
        swap(A, B);
        //float *temp = A;
        //A = B;
        //B = temp;
    }

    cudaMemcpy(host_array, A, (size * sizeof(float)), cudaMemcpyDeviceToHost);

    // release the device buffers once the result is back on the host
    cudaFree(A);
    cudaFree(B);
}

int main(){

  float *data = (float *)malloc(N*sizeof(float));
  for (int i = 0; i<N; i++) data[i] = i & 3;  // fill with 0 1 2 3 0 1 2 3...
  loop(data, N, 100);
  for (int i = 0; i<20; i++) printf("%f ", data[i]);
  printf("\n");
  free(data);
  return 0;
}
$ nvcc -arch=sm_61 -o t393 t393.cu
$ cuda-memcheck ./t393
========= CUDA-MEMCHECK
100.000000 101.000000 102.000000 103.000000 100.000000 101.000000 102.000000 103.000000 100.000000 101.000000 102.000000 103.000000 100.000000 101.000000 102.000000 103.000000 100.000000 101.000000 102.000000 103.000000
========= ERROR SUMMARY: 0 errors
$
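
Two other options may be worth knowing about; both are sketched below in host-only code. The first is `std::swap` from `<utility>`, which is what the reference linked in the comments describes and which takes its arguments by reference. The second is a pointer-to-pointer version that also compiles as plain C. The names `swap_via_pointers` and `ping_pong` are made up for this sketch, and the one-element host buffers only stand in for the device arrays and the kernel launch:

```cpp
#include <cassert>
#include <utility>   // std::swap

// C-compatible alternative: pass the addresses of the two pointers.
void swap_via_pointers(float **a, float **b)
{
    float *temp = *a;
    *a = *b;
    *b = temp;
}

// Host-only stand-in for the kernel loop: read from `in`, write to `out`,
// then exchange the two pointers so the result becomes the next input.
float ping_pong(int loops)
{
    assert(loops >= 0);
    float buf0[1] = {0.0f};
    float buf1[1] = {0.0f};
    float *in = buf0;
    float *out = buf1;

    for (int i = 0; i < loops; i++) {
        out[0] = in[0] + 1.0f;   // plays the role of device_add_one
        std::swap(in, out);      // exchanges the pointer variables themselves
    }
    return in[0];                // after the last swap, `in` holds the result
}
```

With the pointer-to-pointer version, the call in the loop would be written as `swap_via_pointers(&A, &B);`.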
Robert Crovella