I have an OpenCL kernel:
__kernel
void add(__global float* A, const int inputSize)
{
    int threadId = get_local_id(0);
    int blockSize = get_local_size(0);
    int groupId = get_group_id(0);
    // Each work-item handles elements i and i + blockSize.
    int i = 2 * groupId * blockSize + threadId;
    if (i < inputSize && i + blockSize < inputSize)
        printf("%f %f\n", A[i], A[i + blockSize]);  // %f, since A holds floats
    .....Doing some more things.....
}
Host-side code:
int main()
{
    ........
    // Main kernel call
    int global_item_size = 4;
    int local_item_size = 2;
    clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &global_item_size, &local_item_size, 0, NULL, NULL);
    .......
}
So the number of work-groups launched is 2.
Each work-group has 2 threads (work-items).
Each thread processes two elements of array A.
So the kernel uses i and i + blockSize as the indices of the elements it processes.
inputSize is 8.
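To make the index mapping described above concrete, here is a small host-side sketch in plain C that mirrors the kernel's arithmetic. The helper names first_index, second_index and print_mapping are made up for illustration and are not part of my actual code.

```c
#include <stdio.h>

/* Hypothetical helper: mirrors the kernel's computation of i
 * for one work-item (2 * groupId * blockSize + threadId). */
int first_index(int group_id, int block_size, int thread_id)
{
    return 2 * group_id * block_size + thread_id;
}

/* Hypothetical helper: the second element the same work-item reads,
 * i + blockSize. */
int second_index(int group_id, int block_size, int thread_id)
{
    return first_index(group_id, block_size, thread_id) + block_size;
}

/* Prints the pair of indices each work-item would read for the
 * given global and local sizes (global_size is assumed to be a
 * multiple of local_size). */
void print_mapping(int global_size, int local_size)
{
    int num_groups = global_size / local_size;
    for (int g = 0; g < num_groups; ++g) {
        for (int t = 0; t < local_size; ++t) {
            printf("group %d, thread %d -> A[%d], A[%d]\n",
                   g, t,
                   first_index(g, local_size, t),
                   second_index(g, local_size, t));
        }
    }
}
```

Calling print_mapping(4, 2) with the sizes from my launch gives the pairs (0, 2), (1, 3), (4, 6) and (5, 7), so each of the 8 elements is read exactly once and every index stays below inputSize.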
The problem is that this kernel runs without errors and gives correct results when I build in debug mode. The printf statement in the kernel prints the right values, and if I copy the values back to the CPU I can print them correctly there as well.
As soon as I switch to release mode, all I get is 0's in my arrays. If I print A in the kernel, it prints all 0's.
I am not sure what is wrong in release mode. It is definitely not a synchronization or indexing issue, since I am only printing the input array at the very start of the kernel. Has anyone faced a similar issue?
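One thing that may be worth ruling out here, offered as an assumption and not a confirmed diagnosis: clEnqueueNDRangeKernel declares its global and local work sizes as const size_t*, while the host code above passes pointers to int. On common 64-bit platforms size_t is 8 bytes and int is 4, so the runtime could read past each int, and what lies there can differ between debug and release builds. The sketch below is plain C with no OpenCL calls; both helper names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if int and size_t have the same
 * width on this platform, 0 otherwise. When it returns 0, passing
 * the address of an int where a const size_t* is expected makes
 * the callee read bytes beyond the int. */
int int_has_size_t_width(void)
{
    return sizeof(int) == sizeof(size_t);
}

/* Hypothetical helper: copies int work sizes into size_t variables,
 * the type the work-size parameters are declared with.
 * Returns 0 on success, -1 if either size is not positive. */
int make_work_sizes(int global_items, int local_items,
                    size_t *global_out, size_t *local_out)
{
    if (global_items <= 0 || local_items <= 0)
        return -1;
    *global_out = (size_t)global_items;
    *local_out = (size_t)local_items;
    return 0;
}
```

The size_t variables filled in by make_work_sizes(4, 2, ...) would then be the ones whose addresses are passed to clEnqueueNDRangeKernel. Checking the cl_int value that call returns would also show whether the launch itself is failing.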
Thanks in advance.