My question relates to the tutorial that explains how to use boost::odeint together with VexCL in order to achieve concurrency (the complete code can be found here).
The following figure shows how I think of the iterations of ODEINT:
My question is: which part of this is actually parallelised by VexCL?
My impression is that the ODE part is one single task, since all equations of the ODE system sit within one block in the given example. Maybe the integration part then runs as three parallel tasks, one per equation. That gives four tasks in total, of which (I think) the ODE task is the bottleneck, because the equations can become very large.
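To make this picture concrete, here is a minimal sketch in plain C++ of how I imagine one step. It is not VexCL or odeint code. It assumes a three-equation Lorenz system and a simple explicit Euler update, and the names `ode_block`, `int_block` and `euler_step_blockwise` are placeholders I made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical names throughout; plain C++, no VexCL or odeint calls.
using State = std::array<double, 3>;

// "ODE" block: all three right-hand sides are evaluated together in one place
// (Lorenz system assumed for illustration).
State ode_block(const State& s) {
    const double sigma = 10.0, R = 28.0, b = 8.0 / 3.0;
    return State{{ sigma * (s[1] - s[0]),
                   R * s[0] - s[1] - s[0] * s[2],
                   s[0] * s[1] - b * s[2] }};
}

// "INT" block: one explicit Euler update per component.
// The three loop iterations do not depend on each other.
void int_block(State& s, const State& dsdt, double dt) {
    for (std::size_t i = 0; i < s.size(); ++i)
        s[i] += dt * dsdt[i];
}

// One step as I picture it: first the single ODE block, then the three INT updates.
State euler_step_blockwise(State s, double dt) {
    assert(dt > 0.0);
    const State dsdt = ode_block(s);
    int_block(s, dsdt, dt);
    return s;
}
```

In this sketch the single `ode_block` call is what I mean by the "ODE task", and the three loop iterations in `int_block` are the three "INT tasks".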
If this is right, I would like to know how to improve the concurrency. I think it would make sense to combine ODE and INT horizontally, so that each equation is evaluated and integrated in the same task. This results in three tasks, none of which can be split further at this level.
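A minimal sketch of what I mean by combining ODE and INT horizontally, again in plain C++ rather than VexCL or odeint code. It assumes a three-equation Lorenz system and an explicit Euler update, and the names `rhs`, `ode_int_task` and `euler_step_componentwise` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical names throughout; plain C++, no VexCL or odeint calls.
using State = std::array<double, 3>;

// Right-hand side of equation i only (Lorenz system assumed for illustration).
double rhs(std::size_t i, const State& s) {
    const double sigma = 10.0, R = 28.0, b = 8.0 / 3.0;
    switch (i) {
        case 0:  return sigma * (s[1] - s[0]);
        case 1:  return R * s[0] - s[1] - s[0] * s[2];
        default: return s[0] * s[1] - b * s[2];
    }
}

// One combined ODE+INT task: evaluate equation i and apply its Euler update.
// It only reads the old state, so the three tasks are independent of each other.
double ode_int_task(std::size_t i, const State& old_state, double dt) {
    assert(i < old_state.size());
    return old_state[i] + dt * rhs(i, old_state);
}

// One step made of three independent tasks writing into a separate new state.
State euler_step_componentwise(const State& old_state, double dt) {
    State next{};
    for (std::size_t i = 0; i < next.size(); ++i)
        next[i] = ode_int_task(i, old_state, dt);
    return next;
}
```

Each task reads the whole old state but writes only its own component, which is why the three tasks could in principle run side by side. The sketch covers a single-stage Euler step only; a multi-stage scheme such as Runge-Kutta would need all tasks to finish one stage before any of them starts the next.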