I am just getting started with OpenCL and still learning.
Kernel code:
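__kernel void gpu_kernel(__global float* data)
{
    // get_global_id() returns size_t, so cast it to match %d
    printf("workitem %d invoked\n", (int)get_global_id(0));
    int x = 0;
    if (get_global_id(0) == 1) {
        // Deliberate infinite loop: work-item 1 never finishes
        while (x < 1) {
            x = 0;
        }
    }
    printf("workitem %d completed\n", (int)get_global_id(0));
}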
Host C code that enqueues the kernel (4 work-groups of 1 work-item each):
size_t global_item_size = 4; // number of workitems total
size_t local_item_size = 1; // number of workitems per group
ret = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &global_item_size, &local_item_size, 0, NULL, NULL);
Output:
workitem 3 invoked
workitem 3 completed
workitem 0 invoked
workitem 0 completed
workitem 1 invoked
workitem 2 invoked
workitem 2 completed
## The program hangs here, waiting for work-item 1 to finish, which never happens
This shows that all the work-items run in parallel (each in a different work-group): work-items 0, 2 and 3 complete even though work-item 1 never does.
Second version of the host code (1 work-group with 4 work-items):
size_t global_item_size = 4; // number of workitems total
size_t local_item_size = 4; // number of workitems per group
ret = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &global_item_size, &local_item_size, 0, NULL, NULL);
Output:
workitem 0 invoked
workitem 0 completed
workitem 1 invoked
## The program hangs here, waiting for work-item 1 to finish, which never happens
This shows that the work-items within the work-group are running in sequence: work-item 0 completes, work-item 1 gets stuck, and work-items 2 and 3 are never executed.
My question:
How can I launch 1 work-group with 4 work-items that run in parallel, so that I can use barrier() in my code (which, as far as I understand, only synchronizes work-items within a single work-group)?
Any help, suggestions or pointers will be appreciated.