Parallelize functions, Kernel inside Kernel is possible? gcc OpenCL

Question

I'm new to OpenCL and would like to learn more about it.

I have tried to find information about how to build 'complex functions' using OpenCL. By 'complex functions' I mean functions that can be parallelized and that call another function which can also be parallelized. I have seen links like:

Here is my question, illustrated with an example:

// A and B are int vectors
// M and N have different values (M != N)
for (int i = 0; i <= M - 2; i++) {
  for (int j = i + 1; j <= M - 1; j++) {
    int distance = calculate_distance(A[i], B[j]);
    // more sequential instructions
  }
}

calculate_distance concatenates both vectors and contains a loop:

for (int i = 0; i <= N - 1; i++) {
  // some sequential instructions
}

Can this whole fragment be parallelized? If so, how? (That is the reason for the "kernel inside kernel" in the title.)

Note: I'm using the Intel(R) SDK for OpenCL - Offline Compiler 2012 (Windows) to check my kernels.

Thanks in advance

What value ranges do you expect for M and N? Do you know anything else about the data? Can you provide more information about calculate_distance()? — mfa, Jan 03 '13 at 14:50
Both are lower than 10, but they are different. As for calculate_distance, it returns an integer equal to the number of 1s in the concatenation of A with B — Fran, Jan 03 '13 at 16:02
By "number of 1s", do you mean the total number of set bits in the two ints you pass to calculate_distance? What operation do you execute N-1 times? Is this an algorithm I can look up somewhere for more information? — mfa, Jan 03 '13 at 16:47
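One possible reading of the comments above, offered only as an assumption: each A[i] and B[j] is a plain int, N is the number of low-order bits of each int that take part in the concatenation, and the result is the number of 1 bits in that concatenation. The sketch below also passes N explicitly as a parameter n (another assumption; the question's version takes only two arguments).

```c
#include <assert.h>

/* Hypothetical calculate_distance: number of 1 bits in the concatenation of
 * the low n bits of a with the low n bits of b. Concatenating does not
 * change how many 1s there are, so the loop simply counts both halves.
 * n is passed explicitly here (assumption). */
int calculate_distance(int a, int b, int n)
{
    assert(n >= 0 && n <= 31);             /* keep the shifts in range */
    int ones = 0;
    for (int k = 0; k <= n - 1; k++) {    /* the N-iteration loop from the question */
        ones += (int)(((unsigned)a >> k) & 1u);
        ones += (int)(((unsigned)b >> k) & 1u);
    }
    return ones;
}
```

For example, calculate_distance(5, 3, 3) counts the bits of 101 and 011 and gives 4. With N below 10, this inner loop is probably too short to be worth parallelizing on its own; the outer loop over pairs is the more natural candidate. OpenCL C 1.2 and later also offer a popcount() built-in that could replace the bit loop.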

score 2 · Accepted Answer · edited May 23 '17 at 12:04

To write parallel code, you need to pay much more attention to data flow. What does your input data look like? What does your output data look like? How do you transform a piece of input data into output data?
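As a sketch of what answering those three questions could look like for the question's loop, assume (hypothetically) that the output is one distance per pair (i, j) with i < j, rather than a single `distance` variable that is overwritten. Writing each result to a slot computed from (i, j) alone makes it visible that no iteration depends on another. The distance function here is a placeholder that counts 1 bits, not the asker's real code.

```c
#include <assert.h>

/* Placeholder distance (hypothetical): 1 bits in a plus 1 bits in b. */
static int count_ones(unsigned x)
{
    int c = 0;
    for (; x != 0u; x >>= 1)
        c += (int)(x & 1u);
    return c;
}

static int pair_distance(int a, int b)
{
    return count_ones((unsigned)a) + count_ones((unsigned)b);
}

/* Sequential version with explicit data flow.
 * Input:  A[0..m-1], B[0..m-1].
 * Output: out[k] for every pair i < j, where k depends only on (i, j),
 *         so out must have room for m*(m-1)/2 ints.
 * Each iteration reads only A[i] and B[j] and writes only out[k]. */
int all_pair_distances(const int *A, const int *B, int m, int *out)
{
    assert(m >= 2);
    for (int i = 0; i <= m - 2; i++) {
        for (int j = i + 1; j <= m - 1; j++) {
            int k = i * (2 * m - i - 1) / 2 + (j - i - 1); /* pairs before (i, j) */
            out[k] = pair_distance(A[i], B[j]);
        }
    }
    return m * (m - 1) / 2;   /* number of results written */
}
```

Because the loop body has no state carried between iterations, each (i, j) pair could in principle become one work-item.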

As for your question(s):

It's not possible to decide whether the example you provided is parallelizable, because the data flow is not apparent.
You can call functions from your kernel code; they are typically inlined into the kernel.
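To illustrate the second point, here is a host-side C emulation (not real OpenCL C) of a kernel body that makes an ordinary function call. It assumes a hypothetical 2D index space of M x M work-items, where work-items outside the i < j triangle simply return; in a real kernel, gi and gj would come from get_global_id(0) and get_global_id(1). The function names and the placeholder distance are illustrative only.

```c
#include <assert.h>

/* Hypothetical helper, a plain function as it would be in OpenCL C
 * (no __kernel qualifier): 1 bits in a plus 1 bits in b. */
static int calculate_distance(int a, int b)
{
    int ones = 0;
    for (unsigned x = (unsigned)a; x != 0u; x >>= 1) ones += (int)(x & 1u);
    for (unsigned x = (unsigned)b; x != 0u; x >>= 1) ones += (int)(x & 1u);
    return ones;
}

/* Kernel body for work-item (gi, gj) of an m x m 2D range. Items with
 * gj <= gi lie outside the i < j triangle and return without writing. */
static void distance_kernel(int gi, int gj, const int *A, const int *B,
                            int m, int *out)
{
    if (gj <= gi)
        return;
    out[gi * m + gj] = calculate_distance(A[gi], B[gj]); /* ordinary call */
}

/* Host-side stand-in for enqueuing the kernel: visits every work-item of
 * the m x m range one after another; a device would run them in parallel. */
void run_distance_kernel(const int *A, const int *B, int m, int *out)
{
    assert(m >= 1);
    for (int gi = 0; gi < m; gi++)
        for (int gj = 0; gj < m; gj++)
            distance_kernel(gi, gj, A, B, m, out);
}
```

The guard leaves roughly half of the work-items idle, which is unlikely to matter for M below 10, and keeps the index arithmetic trivial.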

Hint:

Also see Converting C/C++ for loops into CUDA; it's CUDA, not OpenCL, but the principles are the same.
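In the same spirit of giving each loop iteration its own work-item, the triangular i < j loop from the question can be flattened into a 1D range of M*(M-1)/2 items with no idle work-items. The sketch below is one way to map a flat index back to (i, j), assuming the pairs are numbered in the same order the nested loop visits them; pair_from_index is a hypothetical name.

```c
#include <assert.h>

/* Map a flat work-item index k in [0, m*(m-1)/2) back to the pair (i, j),
 * 0 <= i < j < m, in the order the question's nested loop visits them.
 * A linear walk over rows is used for clarity; it costs O(m) per item,
 * which is negligible for small m. */
void pair_from_index(int k, int m, int *i, int *j)
{
    assert(m >= 2 && k >= 0 && k < m * (m - 1) / 2);
    int row = 0;
    int row_len = m - 1;          /* number of pairs whose first index is row */
    while (k >= row_len) {
        k -= row_len;
        row++;
        row_len--;
    }
    *i = row;
    *j = row + 1 + k;
}
```

In a kernel, k would be get_global_id(0) and the global work size would be M*(M-1)/2.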

If your output data is just a single value (e.g. the maximum distance), you might want to look at reduction kernels and understand how they work.
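As a rough idea of how a reduction kernel works, here is a host-side C emulation of a work-group tree reduction computing the maximum. It assumes a power-of-two number of values, with buf standing in for __local memory holding one value per work-item. At each step, half of the remaining work-items combine pairs; in a real kernel the steps would be separated by barrier(CLK_LOCAL_MEM_FENCE).

```c
#include <assert.h>

/* Emulated work-group tree reduction for the maximum. buf plays the role
 * of __local memory with one slot per work-item; n must be a power of two.
 * The inner loop over lid stands in for work-items running in parallel:
 * each lid < stride reads buf[lid + stride], which no item writes in the
 * same step, so the steps are race-free. */
int reduce_max(int *buf, int n)
{
    assert(n > 0 && (n & (n - 1)) == 0);   /* power of two */
    for (int stride = n / 2; stride > 0; stride /= 2) {
        for (int lid = 0; lid < stride; lid++) {
            if (buf[lid + stride] > buf[lid])
                buf[lid] = buf[lid + stride];
        }
        /* in OpenCL C: barrier(CLK_LOCAL_MEM_FENCE); */
    }
    return buf[0];
}
```

The reduction takes log2(n) steps instead of n - 1 sequential comparisons, and buf is overwritten in the process.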

First of all, sorry for the delay. Thank you for your answer; you were right. I had made a mistake with the input/output data. — Fran, Jan 15 '13 at 16:14

manav m-n · Answer 2 · 2013-01-03T14:17:03.173

score -1

Make your function re-entrant.
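A sketch of what "re-entrant" means here, under the same hypothetical bit-counting reading of calculate_distance: the first version builds the concatenation in a buffer shared by all callers, so concurrent calls would overwrite each other's data; the second keeps all of its state in parameters and locals. MAX_BITS and both function names are assumptions for illustration. (OpenCL C 1.x does not accept a writable program-scope variable like shared_concat at all; program-scope variables there must be __constant.)

```c
#include <assert.h>

#define MAX_BITS 16   /* hypothetical upper bound on n */

/* Not re-entrant: the concatenation buffer is shared by every caller, so
 * two calls running at the same time would overwrite each other's bits. */
static int shared_concat[2 * MAX_BITS];

int distance_shared(int a, int b, int n)
{
    assert(n >= 0 && n <= MAX_BITS);
    for (int k = 0; k < n; k++) {
        shared_concat[k]     = (int)(((unsigned)a >> k) & 1u);
        shared_concat[n + k] = (int)(((unsigned)b >> k) & 1u);
    }
    int ones = 0;
    for (int k = 0; k < 2 * n; k++)
        ones += shared_concat[k];
    return ones;
}

/* Re-entrant: the buffer is a local array, so every call (every work-item)
 * gets its own copy and nothing is shared between calls. */
int distance_reentrant(int a, int b, int n)
{
    assert(n >= 0 && n <= MAX_BITS);
    int concat[2 * MAX_BITS];
    for (int k = 0; k < n; k++) {
        concat[k]     = (int)(((unsigned)a >> k) & 1u);
        concat[n + k] = (int)(((unsigned)b >> k) & 1u);
    }
    int ones = 0;
    for (int k = 0; k < 2 * n; k++)
        ones += concat[k];
    return ones;
}
```

Both return the same value when called one at a time; only the re-entrant one stays correct when many work-items call it simultaneously.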

edited Jan 03 '13 at 14:17

answered Jan 03 '13 at 14:08

