I have a 3D image with dimensions 512*512*512 and need to process every voxel individually. However, I can't find launch dimensions that give me the x, y and z indices of every voxel.
In my kernel I have:
int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;
int z = blockIdx.z * blockDim.z + threadIdx.z;
I launch the kernel with:
Kernel<<<dim3(8,8), dim3(8,8,16)>>>();
I chose these because I assumed that 64 blocks of 1024 threads each would cover every voxel. However, how do I get the coordinate values with those dimensions?
When launching the kernel, I need grid and block dimensions such that x, y and z each run from 0 to 511, which would give me the position of every voxel. But with every combination I try, the kernel either does not run, or it runs but the values don't get high enough.
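For reference, the range that blockIdx * blockDim + threadIdx can reach along each axis follows directly from the launch configuration, so it can be checked on the host with plain arithmetic. The following is only a minimal sketch: the helper names axis_extent, total_threads and blocks_needed are hypothetical and not part of the CUDA API, and an omitted grid z-dimension is taken as 1, which is what dim3(8,8) yields.

```c
/* Hypothetical helper: number of distinct values that
   blockIdx * blockDim + threadIdx takes along one axis (0 .. result - 1). */
static unsigned axis_extent(unsigned grid_dim, unsigned block_dim)
{
    return grid_dim * block_dim;
}

/* Hypothetical helper: total number of threads in a launch,
   given the grid and block dimensions as {x, y, z}. */
static unsigned long long total_threads(const unsigned grid[3],
                                        const unsigned block[3])
{
    unsigned long long total = 1ULL;
    for (int i = 0; i < 3; ++i)
        total *= (unsigned long long)grid[i] * block[i];
    return total;
}

/* Hypothetical helper: blocks needed along one axis to cover `size`
   elements with `block_dim` threads per block, rounded up. */
static unsigned blocks_needed(unsigned size, unsigned block_dim)
{
    return (size + block_dim - 1) / block_dim;
}
```

For a grid of 8 x 8 x 1 and a block of 8 x 8 x 16, axis_extent gives 64, 64 and 16, and total_threads gives 65,536, whereas a 512*512*512 image has 134,217,728 voxels. Under the same assumptions, blocks_needed(512, 8) evaluates to 64, meaning an axis with 8 threads per block would need 64 blocks to reach index 511.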
The goal is for every thread to get the (x, y, z) coordinates of exactly one voxel in the image. As a first test, I am simply printing the coordinates to see whether all of them appear.
Any help?
EDIT:
Properties of my GPU:
Compute capability: 2.0
Name: GeForce GTX 480
My program code just to test it out:
#include <stdio.h>
#include <cuda.h>
#include <stdlib.h>

// Device code
__global__ void Kernel()
{
    // Here I should somehow get the x, y and z values for every pixel possible in the 512*512*512 image
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    printf("Coords: (%i, %i, %i)\n", x, y, z);
}

// Host code
int main(int argc, char** argv) {
    Kernel<<<dim3(8, 8), dim3(8,8,16)>>>(); // This invokes the kernel
    cudaDeviceSynchronize();                // Wait for the kernel so its printf output is flushed
    return 0;
}