
I have written a serial version of code that calculates a histogram, and I know the algorithm works. The problem is that when I port it to CUDA, the only results I get back are all 0. I can copy the input array dev_x into the output variable h, and I am able to see the input values of x.

The input data is a list of x and y positions, each with a corresponding color (an int from 1 to 5).

The arguments are the input file name, the output file name, cellWidth and cellHeight, where cellWidth and cellHeight are the number of regions the input is divided into. A 1000000 x 1000000 array is divided into 1000 x 1000 regions. I need to calculate the number of occurrences of each color in each region.

dead_jake
  • add [proper cuda error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) to your code, and you'll likely get an idea of where things are going wrong. – Robert Crovella Apr 21 '14 at 00:21
  • In addition to the error checking point, it would be good if your code was self contained. Right now it requires a file and an understanding of the required command line arguments and values that nobody but you has access to. I can't/don't know how to run your program, so I can't see what it does or use any of the standard diagnostic tools to find what might be going wrong. Without that, I fail to see how anyone can actually answer this question. Vote to close. – talonmies Apr 21 '14 at 09:33
  • @RobertCrovella - I will add error checking as soon as I get home tonight. Thanks for the suggestion. – dead_jake Apr 21 '14 at 14:50
  • @talonmies - I added a description of the input arguments and input file. – dead_jake Apr 21 '14 at 14:51
  • @dead_jake: Are you telling me that to run your program I should just make my own input text file with a few billion lines in it and run your code? Does that seem reasonable to you? – talonmies Apr 21 '14 at 15:00
  • @talonmies - The size of the array does not really matter. I can provide a test input file later tonight. – dead_jake Apr 21 '14 at 16:24
  • Are you seriously telling me you want me to download a **150Mb** text file to run your code? – talonmies Apr 22 '14 at 04:59



There are at least two gigantic, basic problems in this code, neither of which has anything to do with CUDA:

histSize = sizeof(unsigned int) * xMax/cellWidth * yMax/cellHeight * numColors;

//....

 h = (unsigned int*) malloc(histSize);

//.....

for(i=0; i<histSize; i++)
    h[i]=0; // <-- buffer overflow: histSize is a byte count, not an element count

which is probably killing the program before it ever even gets to launch the kernel, and:

cudaMalloc( (void**) &dev_h, histSize );

// .......

cudaMemcpy(dev_h, h, size, cudaMemcpyHostToDevice); // buffer overflow: copies size bytes, but h and dev_h only hold histSize bytes

which would kill the CUDA context if the program ever got that far.
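A minimal sketch of the same two allocations done consistently, assuming the variable names from the snippets above (xMax, yMax, cellWidth, cellHeight, numColors); the helper names hist_num_bins and alloc_histograms are hypothetical and not from the original code. The idea is to keep the element count (used for loop bounds and indexing) separate from the byte count (used only for malloc, cudaMalloc and cudaMemcpy).

```cuda
#include <stdlib.h>
#include <cuda_runtime.h>

/* Number of histogram bins: an element count, not a byte count. */
size_t hist_num_bins(int xMax, int yMax, int cellWidth, int cellHeight, int numColors)
{
    return (size_t)(xMax / cellWidth) * (size_t)(yMax / cellHeight) * (size_t)numColors;
}

/* Allocates a zeroed histogram of numBins counters on the host and the device.
   Returns 0 on success and -1 on failure (nothing is left allocated on failure). */
int alloc_histograms(size_t numBins, unsigned int **h, unsigned int **dev_h)
{
    size_t histBytes = numBins * sizeof(unsigned int);   /* bytes: only for allocation and copies */

    *h = (unsigned int *)malloc(histBytes);
    if (*h == NULL)
        return -1;

    for (size_t i = 0; i < numBins; i++)   /* loop over elements, not bytes */
        (*h)[i] = 0;

    if (cudaMalloc((void **)dev_h, histBytes) != cudaSuccess) {
        free(*h);
        return -1;
    }

    /* Copy exactly as many bytes as both buffers hold. */
    if (cudaMemcpy(*dev_h, *h, histBytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(*dev_h);
        free(*h);
        return -1;
    }
    return 0;
}
```

Zeroing the device buffer directly with cudaMemset(dev_h, 0, histBytes) would avoid the host-side loop and the copy altogether.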

These are elementary mistakes and you haven't detected them because your only usage case is apparently a program which attempts to process a 150Mb input file and emit a large histogram from it, and your only method of detecting errors is looking at a file containing that histogram. That is a completely insane way to develop and debug code. If you had done any of the following:

  1. Hardcoded a trivially small test case you already knew the answers for
  2. Added CUDA API error checking
  3. Run valgrind
  4. Used cuda-memcheck
  5. Used a host debugger
  6. Ran nvprof

you probably would have detected the problems instantly (there might well be more, but I don't care enough to look for them; that is your job), and this Stack Overflow question wouldn't exist.
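Items 1 and 2 of the list above can be combined into one small harness. This is a minimal sketch under stated assumptions: the CUDA_CHECK macro follows the pattern of the error-checking answer linked in the comments, while hist_kernel, run_small_test and the bin layout ((cellRow * cellsPerRow + cellCol) * numColors + color - 1) are hypothetical, since the asker's kernel is not shown. The four hardcoded points are chosen so the expected histogram can be worked out by hand.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Abort with file/line information if a CUDA runtime call fails. */
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error \"%s\" at %s:%d\n",           \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

/* Hypothetical kernel: one thread per point, bins laid out as
   (cellRow * cellsPerRow + cellCol) * numColors + (color - 1). */
__global__ void hist_kernel(const int *x, const int *y, const int *color, int n,
                            int cellWidth, int cellHeight, int cellsPerRow,
                            int numColors, unsigned int *hist)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int cell = (y[i] / cellHeight) * cellsPerRow + x[i] / cellWidth;
        /* Many threads may hit the same bin, so the increment must be atomic. */
        atomicAdd(&hist[cell * numColors + (color[i] - 1)], 1u);
    }
}

/* Runs a 4-point case on a 4x4 domain split into 2x2 cells and returns
   the number of bins that differ from the hand-computed answer. */
int run_small_test(void)
{
    const int n = 4, cellWidth = 2, cellHeight = 2, numColors = 5;
    const int cellsPerRow = 2, cellsPerCol = 2;
    const int numBins = cellsPerRow * cellsPerCol * numColors;   /* 20 bins */
    const int hx[] = {0, 1, 3, 3};
    const int hy[] = {0, 1, 0, 3};
    const int hc[] = {1, 1, 2, 5};

    unsigned int expected[numBins] = {0};
    expected[0] = 2;    /* (0,0) and (1,1): cell 0, color 1 */
    expected[6] = 1;    /* (3,0): cell 1, color 2 -> 1*5 + 1 */
    expected[19] = 1;   /* (3,3): cell 3, color 5 -> 3*5 + 4 */

    int *d_x, *d_y, *d_c;
    unsigned int *d_hist;
    CUDA_CHECK(cudaMalloc((void **)&d_x, n * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void **)&d_y, n * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void **)&d_c, n * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void **)&d_hist, numBins * sizeof(unsigned int)));
    CUDA_CHECK(cudaMemcpy(d_x, hx, n * sizeof(int), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_y, hy, n * sizeof(int), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_c, hc, n * sizeof(int), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemset(d_hist, 0, numBins * sizeof(unsigned int)));

    hist_kernel<<<1, 32>>>(d_x, d_y, d_c, n, cellWidth, cellHeight,
                           cellsPerRow, numColors, d_hist);
    CUDA_CHECK(cudaGetLastError());       /* bad launch configuration */
    CUDA_CHECK(cudaDeviceSynchronize());  /* errors raised while the kernel ran */

    unsigned int result[numBins];
    CUDA_CHECK(cudaMemcpy(result, d_hist, numBins * sizeof(unsigned int),
                          cudaMemcpyDeviceToHost));

    int mismatches = 0;
    for (int b = 0; b < numBins; b++) {
        if (result[b] != expected[b]) {
            printf("bin %d: got %u, expected %u\n", b, result[b], expected[b]);
            mismatches++;
        }
    }

    cudaFree(d_x);
    cudaFree(d_y);
    cudaFree(d_c);
    cudaFree(d_hist);
    return mismatches;
}
```

Because the whole test case is hardcoded, the same binary can also be run under cuda-memcheck or a host debugger without needing the 150Mb input file.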

talonmies