I have been working on a Game of Life implementation in CUDA. I want to find the array index of each element so that I can calculate the neighbours of that element and write the new value back to the array. Everything I have found about this works with pointers to rows, and I can't figure out how exactly that translates to indexes. To give a better idea of what I mean, I have the following code (some snippets):
#define THREADSPERBLOCK 1024
lifeState *d_gameStateInitial;
size_t d_gameStateInitial_pitch;
int sizeX = 100;
int sizeY = 100;
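// Round up so that every cell gets a thread (integer ceiling division, not modulo).
int numBlocks = ((sizeX * sizeY) + THREADSPERBLOCK - 1) / THREADSPERBLOCK;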
int numThreadsPerBlock;
if (numBlocks == 1)
{
    numThreadsPerBlock = sizeX * sizeY;
}
else
{
    numThreadsPerBlock = THREADSPERBLOCK;
}
cudaMallocPitch((void **)&d_gameStateInitial, &d_gameStateInitial_pitch, sizeX * sizeof(lifeState), sizeY);
doTheGame<<<numBlocks, numThreadsPerBlock>>>(d_gameStateInitial, d_gameStateInitial_pitch, d_gameStateNew, d_gameStateNew_pitch, sizeX, sizeY);
The lifeState type is simply a struct containing a dead/alive enum. Both arrays, the initial one and the new one, are allocated in exactly the same way. In the doTheGame kernel I now want to know how to calculate the index. I was thinking about something like this, but I think it is wrong:
__global__ void doTheGame(lifeState *initialArray, size_t initialArrayPitch,
                          lifeState *newArray, size_t newArrayPitch,
                          int sizeX, int sizeY)
{
    int initialArrayThreadIndex = (blockIdx.x * initialArrayPitch) + threadIdx.x;
    int newArrayThreadIndex = (blockIdx.x * initialArrayPitch) + threadIdx.x;
}
Everything I have found so far is essentially the same as the cudaMallocPitch example:
T* pElement = (T*)((char*)BaseAddress + Row * pitch) + Column;
But I just can't see how exactly that translates to blocks, threads, and x and y.
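To make the question concrete, here is a minimal sketch of the mapping as I currently understand it. It is not a verified solution. It assumes a 1D launch with one thread per cell and a stand-in definition of lifeState, and it relies on the pitch returned by cudaMallocPitch being measured in bytes. The names LIFE_HD, cellRow, cellColumn, pitchedElement and copyCells are hypothetical and exist only for this illustration. The kernel just copies each cell, to show where the row and column would come from.

```cpp
#include <cstddef>

// Lets the helpers compile both with nvcc and with a plain C++ compiler.
#ifdef __CUDACC__
#define LIFE_HD __host__ __device__
#else
#define LIFE_HD
#endif

// Assumed stand-in for the real type: a struct holding a dead/alive enum.
struct lifeState { enum State { DEAD, ALIVE } state; };

// A flat cell id in [0, sizeX * sizeY) maps to a row and a column.
LIFE_HD inline int cellRow(int cell, int sizeX)    { return cell / sizeX; }
LIFE_HD inline int cellColumn(int cell, int sizeX) { return cell % sizeX; }

// Same arithmetic as the cudaMallocPitch example: step `row` pitches
// in bytes, then `col` elements of lifeState.
LIFE_HD inline lifeState *pitchedElement(lifeState *base, size_t pitchBytes,
                                         int row, int col)
{
    return (lifeState *)((char *)base + row * pitchBytes) + col;
}

#ifdef __CUDACC__
// Hypothetical kernel: each thread handles one cell and copies it across.
__global__ void copyCells(lifeState *src, size_t srcPitch,
                          lifeState *dst, size_t dstPitch,
                          int sizeX, int sizeY)
{
    // Global thread id over the whole 1D grid.
    int cell = blockIdx.x * blockDim.x + threadIdx.x;
    if (cell >= sizeX * sizeY) return;  // the last block may have spare threads

    int row = cellRow(cell, sizeX);
    int col = cellColumn(cell, sizeX);
    // Each array is addressed with its own pitch.
    *pitchedElement(dst, dstPitch, row, col) =
        *pitchedElement(src, srcPitch, row, col);
}
#endif
```

With this layout, the neighbours of a cell would be the elements at row ± 1 and col ± 1, each looked up through pitchedElement after a bounds check (or wrap-around) against sizeX and sizeY.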
Thanks in advance.