The Principle
I know that such a simple calculation isn't worth parallelizing elaborately. It's just an example; the mathematical operation is a placeholder for some more interesting computation.
[Pseudo code]
var id = 0;
do {
    id = getGlobalId();                      // index of the next element to process
    output[id] = input[id] * input[id];
} while (inRange(id) && output[id] !== 25);  // stop once a result equals 25
The most notable expression is probably output[id] !== 25. It means: if input has four elements (in this order), [8, 5, 2, 9], then output should be [64, 25], and the squares of 2 and 9 are not stored in output (because output[id] !== 25 is false for id = 1, where input[id] = 5).
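To make the expected behaviour concrete, here is a minimal sequential sketch of the loop in plain JavaScript. The function name squareUntil25 is hypothetical, and unlike the pseudo code it checks the range before reading input[id], so an empty input yields an empty output.

```javascript
// Sequential reference for the loop above (a sketch; squareUntil25 is a hypothetical name).
function squareUntil25(input) {
  const output = [];
  for (let id = 0; id < input.length; id++) {
    output[id] = input[id] * input[id];
    // Stop condition: once a result equals 25, later elements are never computed.
    if (output[id] === 25) break;
  }
  return output;
}

console.log(squareUntil25([8, 5, 2, 9])); // [ 64, 25 ]
```

Any parallel version should produce exactly this output for the same input, which makes the function usable as a correctness check.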
If you are optimizing this piece of code, you might want to compute the square of every input[id] ahead of time (without checking the second while condition), but there's no guarantee that the result will be relevant later on (if the result of a previous computation was 25, the result of the current computation is of no interest).
More generally, I'm talking about cases where the computation result output[id] (output[id] = calculateFrom(input[id]);) may not be relevant for every id: whether the result output[id] is needed depends on the result of another computation.
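One way to picture this general case is a two-phase "speculate, then truncate" sketch. It is only an illustration in plain JavaScript, not OpenCL: the helper speculateThenTruncate and the predicate isStop are hypothetical names, and the map call merely stands in for the per-element work that a kernel could run in parallel.

```javascript
// Hypothetical helper: compute every result speculatively, then keep only
// the prefix up to and including the first result that satisfies isStop.
function speculateThenTruncate(input, calculateFrom, isStop) {
  // Phase 1: independent per-element work (the part that could run in parallel).
  const speculative = input.map(calculateFrom);
  // Phase 2: find the first stop position; every result after it is rejected.
  const stopAt = speculative.findIndex(isStop);
  const kept = stopAt === -1 ? speculative : speculative.slice(0, stopAt + 1);
  return { kept, wasted: speculative.length - kept.length };
}

const result = speculateThenTruncate([8, 5, 2, 9], x => x * x, y => y === 25);
console.log(result.kept, result.wasted); // [ 64, 25 ] 2
```

The wasted count is the price of speculation: here the squares of 2 and 9 were computed and then thrown away.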
My Goal
I want to execute this loop with as much parallelism and as high performance as possible, using OpenCL kernels and queues.
My Ideas
I thought:
- In order to parallelize such do...while loops, we should do some computations (output[id] = calculateFrom(input[id]);) simultaneously ahead of time (without knowing whether the result output[id] will be useful). If the result of a previous computation was 25, the result output[id] simply gets rejected.
- Maybe we should think about the probability that the loop stops, i.e. that output[id] !== 25 is false. If that probability is very high, we shouldn't do many computations ahead of time, because their results will probably be rejected. If it is very low, we should do more computations ahead of time.
- We should monitor the current load of the processing unit. If it's already overloaded, we shouldn't do unimportant ahead-of-time computations. But if there are enough resources to process them, why not? One caveat: if the ahead-of-time computations and the previous computations (on which they depend) are processed at the same time, the additional ahead-of-time work might slow the previous computations down (see my second question).
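The probability idea can be sketched as batched speculation: compute a limited batch ahead of time, check for the stop condition, and only then start the next batch. The code below is plain JavaScript, not OpenCL, and both function names are hypothetical. The batch-size rule rests on an assumption of mine: if each element stops the loop independently with probability p, the loop runs about 1/p elements on average, so speculating further than that per batch is mostly wasted work.

```javascript
// Hypothetical heuristic: with per-element stop probability p (assumed independent),
// the loop runs about 1/p elements on average, so speculate at most that far per batch.
function chooseBatchSize(stopProbability, maxBatch) {
  if (stopProbability <= 0) return maxBatch;
  return Math.max(1, Math.min(maxBatch, Math.floor(1 / stopProbability)));
}

// Process the input batch by batch; each batch stands in for one parallel launch.
function runInBatches(input, calculateFrom, isStop, batchSize) {
  const output = [];
  let wasted = 0;
  for (let start = 0; start < input.length; start += batchSize) {
    const batch = input.slice(start, start + batchSize).map(calculateFrom);
    const stopAt = batch.findIndex(isStop);
    if (stopAt !== -1) {
      output.push(...batch.slice(0, stopAt + 1));
      wasted += batch.length - (stopAt + 1); // results computed but rejected
      break; // later batches are never started
    }
    output.push(...batch);
  }
  return { output, wasted };
}

const size = chooseBatchSize(0.5, 64); // 2
// Yields output [64, 25] with nothing wasted, because the stop falls on a batch boundary.
console.log(runInBatches([8, 5, 2, 9, 7, 3], x => x * x, y => y === 25, size));
```

With a batch size of 4 the same input would waste two computations (the squares of 2 and 9), which is the trade-off described above: larger batches mean more parallel work per launch but more rejected results.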
My Questions
- Is it sensible, and does it pay off in performance, to parallelize such programs?
- By which criteria should I decide whether the processing unit has enough resources for my ahead-of-time computations? Or: how can I tell whether my processing unit is overloaded?
- Do you know of any other approach to parallelizing such do...while loops? Do you have any ideas about that?
I hope it's clear what I'm trying to say. If it isn't, please comment on my question. Thanks for your answers and your help.