I am quite new to OpenCL and find it hard to reason about all the consequences of GPU execution. I am trying to write a simulation: I have 2D points and need to calculate the "gravity" forces acting between all pairs of them. My best attempt at an OpenCL kernel looks like this:
kernel void ker_fun(global const double* pts, uint pts_size, global double* fxs, global double* fys, double vertexRepulsion)
{
    size_t gid=get_global_id(0);
    double x=pts[2*gid];
    double y=pts[2*gid+1];
    double fx=0;
    double fy=0;
    // visit each pair (gid, i) once, with i > gid
    for (size_t i=gid+1; i<pts_size; ++i) {
        double dx=x-pts[2*i]; // vector from point[i] to this point
        double dy=y-pts[2*i+1];
        double r2=dx*dx+dy*dy;
        r2=max(r2, 0.0001); // to prevent (r2==0) issue
        double f=vertexRepulsion/r2; // force magnitude
        double ratio=f/sqrt(r2); // scales (dx, dy) to length f
        dx*=ratio;
        dy*=ratio;
        fx+=dx;
        fy+=dy;
        // equal and opposite force on point[i]
        atomic_add_double(&fxs[i], -dx);
        atomic_add_double(&fys[i], -dy);
    }
    atomic_add_double(&fxs[gid], fx);
    atomic_add_double(&fys[gid], fy);
}
where fxs and fys are the force components in the X and Y directions (i.e. my result), and the atomic_add_double function is copied from this site (OpenCL - using atomic reduction for double).
This kernel works and calculates the desired result, but it is slow. Could you please advise me how to do this in a different, better way?
Thank you for your time and help.