In this code I'm generating a 1D array of floats on the GPU using CUDA. The numbers are between 0 and 1. For my purposes I need them to be between -1 and 1, so I wrote a simple kernel that multiplies each element by 2 and then subtracts 1. However, something is going wrong. When I write my original array to a .bmp I get this: https://i.stack.imgur.com/tpdhn.png (a typical noise pattern). But when I try to modify that array with my kernel I get a blank black picture: https://i.stack.imgur.com/tisIp.jpg. The program runs, but in the debug output I get this:

First-chance exception at 0x75f0c41f in Midpoint_CUDA_Alpha.exe: Microsoft C++ exception: cudaError_enum at memory location 0x003cfacc..

First-chance exception at 0x75f0c41f in Midpoint_CUDA_Alpha.exe: Microsoft C++ exception: cudaError_enum at memory location 0x003cfb08..

First-chance exception at 0x75f0c41f in Midpoint_CUDA_Alpha.exe: Microsoft C++ exception: [rethrow] at memory location 0x00000000..

I would be thankful for any help, or even a small hint, in this matter. Thanks!

#include <device_functions.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include "stdafx.h"
#include "EasyBMP.h"
#include <curand.h> //curand.lib must be added in project properties > linker > input
#include "device_launch_parameters.h"

float *heightMap_cpu;
float *randomArray_gpu;
int randCount = 0;
int rozmer = 513;

void createRandoms(int size){
    curandGenerator_t generator;
    cudaMalloc((void**)&randomArray_gpu, size*size*sizeof(float));
    curandCreateGenerator(&generator,CURAND_RNG_PSEUDO_XORWOW);
    curandSetPseudoRandomGeneratorSeed(generator,(int)time(NULL));
    curandGenerateUniform(generator,randomArray_gpu,size*size);
}

__global__ void polarizeRandoms(int size, float *randomArray_gpu){
    int index = threadIdx.x + blockDim.x * blockIdx.x;
    if(index<size*size){
        randomArray_gpu[index] = randomArray_gpu[index]*2.0f - 1.0f;
    }
}

//helper function for getting address in 1D using 2D coords
int ad(int x,int y){
    return x*rozmer+y;
}

void printBmp(){
    BMP AnImage;
    AnImage.SetSize(rozmer,rozmer);
    AnImage.SetBitDepth(24);
    int i,j;
    for(i=0;i<=rozmer-1;i++){
        for(j=0;j<=rozmer-1;j++){
            AnImage(i,j)->Red = (int)((heightMap_cpu[ad(i,j)]*127)+128);
            AnImage(i,j)->Green = (int)((heightMap_cpu[ad(i,j)]*127)+128);
            AnImage(i,j)->Blue = (int)((heightMap_cpu[ad(i,j)]*127)+128);
            AnImage(i,j)->Alpha = 0;
        }
    }
    AnImage.WriteToFile("HeightMap.bmp");
}

int main(){
    createRandoms(rozmer);
    polarizeRandoms<<<((rozmer*rozmer)/1024)+1,1024>>>(rozmer,randomArray_gpu);
    heightMap_cpu = (float*)malloc((rozmer*rozmer)*sizeof(float));
    cudaMemcpy(heightMap_cpu,randomArray_gpu,rozmer*rozmer*sizeof(float),cudaMemcpyDeviceToHost);
    printBmp();

    //cleanup
    cudaFree(randomArray_gpu);
    free(heightMap_cpu);
    return 0;
}
talonmies
Tomus
  • You might have some issues with float/integer conversion, try replacing the computation in your kernel with: randomArray_gpu[index] = randomArray_gpu[index]*2.0f - 1.0f; – Reguj Sep 10 '13 at 20:28
  • Thanks, but that didn't solve it :/ – Tomus Sep 10 '13 at 20:38
  • Because `int` is promoted to `float` before `*` and `+`, see [Implicit type conversion rules in C++ operators](http://stackoverflow.com/questions/5563000/implicit-type-conversion-rules-in-c-operators). – Vitality Sep 10 '13 at 20:38
  • Thanks Jack, but that's not it. Even when I leave the line written like this: `randomArray_gpu[index] = randomArray_gpu[index];`, which should just leave the array as it was, the result is the same blank black picture (all zeroes). – Tomus Sep 10 '13 at 21:26
  • The problem was with my understanding of CUDA... kernels apparently cannot access `__device__ float *randomArray_gpu` without the array being passed as an argument to the kernel function. Now that I've changed the kernel to this: `__global__ void polarizeRandoms(int size, float *randomArray_gpu)` it works. Thanks again for the help; I have changed the int literals in the equation following your suggestions, so you have probably saved me another headache. – Tomus Sep 10 '13 at 21:33
  • @TomášTomusJavorský I was not suggesting Reguj's solution, but explaining why his solution was not working. You do not need to cast the numbers to `float`. Anyway, I come to the same conclusion. For some reason, `polarizeRandoms` has no visibility of `randomArray_gpu`, and one "workaround" is to pass the pointer to the `__global__` function. There is something I'm perhaps overlooking, since this is the same problem as in [CUDA global (as in C) dynamic arrays allocated to device memory](http://stackoverflow.com/questions/79121/cuda-global-as-in-c-dynamic-arrays-allocated-to-device-memory). – Vitality Sep 10 '13 at 21:43
  • Ah, thanks again :) Anyway, I'll try to keep those rules in mind for the future. – Tomus Sep 10 '13 at 22:49
  • @TomášTomusJavorský: Please **do not** "fix" code in your questions. The whole point of Stack Overflow is to leave a question and its answer for the next person who comes along. By "fixing" code, you are effectively destroying the question and making the answers you have received invalid. I have rolled back your edits and corrected the title as you wanted. Please don't edit the code again. – talonmies Sep 11 '13 at 11:43

1 Answer

This is wrong:

cudaMalloc((void**)&randomArray_gpu, size*size*sizeof(float));

We don't use `cudaMalloc` with `__device__` variables. If you do proper CUDA error checking, I'm pretty sure that line will throw an error.

If you really want to use a `__device__` pointer this way, you need to create a separate normal pointer, `cudaMalloc` that, then copy the pointer value to the device pointer using `cudaMemcpyToSymbol`:

float *my_dev_pointer;
cudaMalloc((void**)&my_dev_pointer, size*size*sizeof(float));
cudaMemcpyToSymbol(randomArray_gpu, &my_dev_pointer, sizeof(float *));

Whenever you are having trouble with your CUDA programs, you should do proper CUDA error checking. It will likely focus your attention on what is wrong.
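As a rough illustration of what such error checking can look like, here is a minimal sketch. The `CUDA_CHECK` macro and the `scaleToSignedRange` kernel are hypothetical names introduced only for this example; they are not part of the question's code or of the CUDA toolkit. The sketch assumes a single `.cu` file compiled with `nvcc`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not from the original post): wraps a CUDA runtime
// call and aborts with file/line information if it does not return cudaSuccess.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error \"%s\" at %s:%d\n",            \
                    cudaGetErrorString(err_), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Illustrative kernel: maps each value from [0,1] to [-1,1].
__global__ void scaleToSignedRange(int count, float *data) {
    int index = threadIdx.x + blockDim.x * blockIdx.x;
    if (index < count) {
        data[index] = data[index] * 2.0f - 1.0f;
    }
}

int main() {
    const int count = 513 * 513;
    float *data_gpu = NULL;

    CUDA_CHECK(cudaMalloc((void**)&data_gpu, count * sizeof(float)));
    CUDA_CHECK(cudaMemset(data_gpu, 0, count * sizeof(float)));

    scaleToSignedRange<<<(count + 1023) / 1024, 1024>>>(count, data_gpu);
    CUDA_CHECK(cudaGetLastError());       // reports errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    float first = 0.0f;
    CUDA_CHECK(cudaMemcpy(&first, data_gpu, sizeof(float), cudaMemcpyDeviceToHost));
    printf("first element: %f\n", first);

    CUDA_CHECK(cudaFree(data_gpu));
    return 0;
}
```

Kernel launches do not return an error code directly, which is why the sketch checks `cudaGetLastError()` right after the launch and `cudaDeviceSynchronize()` afterwards. Without those two checks, a failure inside the kernel would usually only show up at a later, unrelated API call such as `cudaMemcpy`.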

And, yes, kernels can access `__device__` variables without the variable being passed explicitly as a parameter to the kernel.

The programming guide covers the proper usage of `__device__` variables and the API functions that should be used to access them from the host.
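Putting the pieces above together, a small self-contained sketch of the `__device__` pointer approach might look like the following. It assumes a file-scope `__device__ float *randomArray_gpu` declaration as described in this answer, uses a tiny hand-filled array instead of cuRAND output to keep the example short, and introduces a hypothetical `check` helper for error reporting.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// File-scope device pointer; it lives in device memory, so host code must not
// pass its address to cudaMalloc.
__device__ float *randomArray_gpu;

// The kernel reads the __device__ pointer directly; it is not a parameter.
__global__ void polarizeRandoms(int count) {
    int index = threadIdx.x + blockDim.x * blockIdx.x;
    if (index < count) {
        randomArray_gpu[index] = randomArray_gpu[index] * 2.0f - 1.0f;
    }
}

// Hypothetical helper for this sketch: stop on the first failing CUDA call.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main() {
    const int count = 8;
    float host[count];
    for (int i = 0; i < count; ++i) {
        host[i] = i / (float)(count - 1);  // stand-in values in [0,1]
    }

    // Allocate through an ordinary host-side pointer variable...
    float *my_dev_pointer = NULL;
    check(cudaMalloc((void**)&my_dev_pointer, count * sizeof(float)), "cudaMalloc");
    check(cudaMemcpy(my_dev_pointer, host, count * sizeof(float),
                     cudaMemcpyHostToDevice), "cudaMemcpy H2D");

    // ...then copy the pointer value itself into the __device__ variable.
    check(cudaMemcpyToSymbol(randomArray_gpu, &my_dev_pointer, sizeof(float *)),
          "cudaMemcpyToSymbol");

    polarizeRandoms<<<1, 32>>>(count);
    check(cudaGetLastError(), "kernel launch");
    check(cudaDeviceSynchronize(), "kernel execution");

    check(cudaMemcpy(host, my_dev_pointer, count * sizeof(float),
                     cudaMemcpyDeviceToHost), "cudaMemcpy D2H");
    for (int i = 0; i < count; ++i) {
        printf("%f\n", host[i]);  // values should now lie in [-1,1]
    }

    check(cudaFree(my_dev_pointer), "cudaFree");
    return 0;
}
```

The host keeps `my_dev_pointer` for `cudaMemcpy` and `cudaFree`, while the kernel only ever sees the copy stored in `randomArray_gpu`. Passing the pointer as a kernel argument, as the asker ended up doing, is the simpler alternative when a global is not actually needed.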

Robert Crovella
  • Thanks again, fixed now. And yes, this was also in the debug output. – Tomus Sep 11 '13 at 09:04
  • Definitely right. It was too late yesterday to have enough neurons working to realize that there was a `__device__` keyword in front of the pointer declaration :-) – Vitality Sep 11 '13 at 10:18