
In the constructor I fill the array on the device side.

But now I want to run a reverse function on the array.

#include <stdio.h>
#include <stdlib.h>
#include <iostream>
using namespace std;


__global__ void generateVector(int *data,int count){
    int tid = blockIdx.x;
    data[tid] = -tid;
}

__global__ void reverseArray(int *data,int count){
    int tid = blockIdx.x;
    data[tid] = tid;
}

class FData{
private:
    int *data;
    int size;
public:
    FData(int sizeP){
        size = sizeP;
        data = new int[size];
        int *devA;

        cudaMalloc((void**) &devA, size * sizeof(int));
        generateVector<<<size,1>>>(devA,size);
        cudaMemcpy(data,devA, size * sizeof(int),cudaMemcpyDeviceToHost);

        cudaFree(devA);
    }

    ~FData(){
        delete [] data;
    }

    int getSize(){
        return size;
    }



    int elementAt(int i){
        return data[i];
    }

    void reverse(){
        int *devA;
        cudaMalloc((void**) &devA, sizeof(int));
        reverseArray<<<size,1>>>(devA,size);
        cudaMemcpy(data,devA,size * sizeof(int),cudaMemcpyDeviceToHost);
        cudaFree(devA);

    }


};


int main(void) {

    FData arr(30);

    cout << arr.elementAt(1);


    arr.reverse();
    cout << arr.elementAt(1);


    return 0;

}

It still prints the values that I filled in the constructor. What is going wrong here, and how can I fix it?

asked by asdasd

1 Answer


Your kernels don't reverse anything; they just negate the values, so I would be quite surprised if you saw anything get reversed. That said, if you add error checking to your code (see this other SO post on how best to do the error checking), you'll see that your code fails in your reverse function, where the cudaMalloc call reserves space for only one int. You can fix this by making devA a plain device pointer and allocating size ints for it (it doesn't make sense to allocate it as a host array anyway, since you never use it on the host).

void reverse(){
    int *devA;
    // Allocate size ints on the device, not just one
    cudaMalloc((void**) &devA, size * sizeof(int));
    reverseArray<<<size,1>>>(devA,size);
    cudaMemcpy(data,devA,size * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(devA);
}
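For reference, a kernel that actually reverses the array might look like the minimal sketch here. It assumes one thread per pair of elements and wraps each runtime call in a `CUDA_CHECK` macro. `CUDA_CHECK`, `reverseArrayInPlace`, and `reverseOnDevice` are illustrative names that are not part of the original code, and the macro is just one way to do the kind of error checking that the other SO post describes. `cudaGetLastError` and `cudaDeviceSynchronize` are the standard runtime calls for catching launch errors and errors raised while the kernel runs.

```cuda
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: print the error and exit if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error \"%s\" at %s:%d\n",        \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Thread i swaps element i with element count-1-i. Only the first count/2
// threads do any work, so each pair is swapped exactly once and the middle
// element of an odd-length array stays where it is.
__global__ void reverseArrayInPlace(int *data, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count / 2) {
        int j = count - 1 - i;
        int tmp = data[i];
        data[i] = data[j];
        data[j] = tmp;
    }
}

// Copy `host` to the device, reverse it there, and copy the result back.
std::vector<int> reverseOnDevice(const std::vector<int> &host) {
    std::vector<int> result(host);
    int count = static_cast<int>(result.size());
    if (count < 2) return result;  // nothing to swap (and avoids a 0-block launch)

    std::size_t bytes = result.size() * sizeof(int);
    int *devA = nullptr;
    CUDA_CHECK(cudaMalloc((void **)&devA, bytes));  // count ints, not just one
    CUDA_CHECK(cudaMemcpy(devA, result.data(), bytes, cudaMemcpyHostToDevice));

    int threads = 256;
    int blocks = (count / 2 + threads - 1) / threads;
    reverseArrayInPlace<<<blocks, threads>>>(devA, count);
    CUDA_CHECK(cudaGetLastError());       // reports launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(result.data(), devA, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(devA));           // every cudaMalloc gets a cudaFree
    return result;
}

int main() {
    std::vector<int> v(30);
    for (int i = 0; i < 30; ++i) v[i] = -i;  // same values the question's generateVector writes
    std::vector<int> r = reverseOnDevice(v);
    std::printf("%d %d\n", r[0], r[1]);      // prints "-29 -28"
    return 0;
}
```

Because only count/2 threads touch memory and each one swaps two slots no other thread touches, no synchronization between threads is needed.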

You should also free your memory: you have both host-side and device-side memory leaks. Every cudaMalloc call should have a corresponding cudaFree. In addition, consider adding a destructor to free your host-side data member, since it leaks too.

~FData()
{
    delete [] data;
}
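One way to guarantee that every cudaMalloc is matched by a cudaFree, and that host memory is released too, is to tie each allocation to an object's lifetime (RAII). This is a minimal sketch assuming a hypothetical `DeviceBuffer` class, a hypothetical `fillNegated` kernel, and a hypothetical `generateNegated` helper that mirrors the question's constructor; none of them come from the original code. Host storage lives in a `std::vector`, so no manual `delete []` is needed, and copying `DeviceBuffer` is disabled so the same device pointer can never be freed twice. Error checking on the launch and copy is omitted for brevity.

```cuda
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical RAII wrapper: owns one device allocation of `count` ints and
// releases it in the destructor, so each cudaMalloc has a matching cudaFree.
class DeviceBuffer {
public:
    explicit DeviceBuffer(std::size_t count) : count_(count) {
        if (cudaMalloc((void **)&ptr_, count * sizeof(int)) != cudaSuccess)
            throw std::runtime_error("cudaMalloc failed");
    }
    ~DeviceBuffer() { cudaFree(ptr_); }

    // Non-copyable: two copies would both call cudaFree on the same pointer.
    DeviceBuffer(const DeviceBuffer &) = delete;
    DeviceBuffer &operator=(const DeviceBuffer &) = delete;

    int *get() const { return ptr_; }
    std::size_t size() const { return count_; }

private:
    int *ptr_ = nullptr;
    std::size_t count_;
};

// Writes -i to element i, like the question's generateVector, but guarded by count.
__global__ void fillNegated(int *data, int count) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < count) data[tid] = -tid;
}

// Fills `size` ints on the device and returns them in a std::vector,
// which frees its own host memory.
std::vector<int> generateNegated(int size) {
    if (size <= 0) return {};
    std::vector<int> host(size);

    DeviceBuffer dev(host.size());
    int threads = 128;
    int blocks = (size + threads - 1) / threads;
    fillNegated<<<blocks, threads>>>(dev.get(), size);
    cudaMemcpy(host.data(), dev.get(), host.size() * sizeof(int),
               cudaMemcpyDeviceToHost);
    return host;  // dev's destructor (cudaFree) runs as the function returns
}

int main() {
    std::vector<int> v = generateNegated(30);
    std::printf("%d %d\n", v[1], v[29]);  // prints "-1 -29"
    return 0;
}
```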
answered by alrikai
  • It works for me; maybe try posting your updated code and I can take a look – alrikai May 16 '13 at 19:14
  • @asdasd Look at your call to cudaMalloc in your reverse function: you're allocating only 1 int on the device, but you need to allocate `size` ints. – alrikai May 16 '13 at 19:24
  • I should also point out that if you had error checking in your code, you would likely have caught this earlier, since your kernel makes 29 out-of-bounds memory accesses – alrikai May 16 '13 at 19:26