In the application I'm currently developing, I need to sum fairly large arrays of vectors efficiently. Here's my code:
    // Needs: using System; using System.Collections.Generic; using System.Threading.Tasks;
    public List<double[, ,]> normalMaps;

    public double[, ,] Mix(double[] weights, double gain)
    {
        int w, h;
        w = normalMaps[0].GetLength(0);
        h = normalMaps[0].GetLength(1);
        double[, ,] ret = new double[w, h, 3];
        int normcount = normalMaps.Count;

        // Serial version: for (int y = 0; y < h; y++)
        Parallel.For(0, h, y =>
        {
            for (int x = 0; x < w; x++)
            {
                // Weighted sum of all maps at this pixel.
                for (int z = 0; z < normcount; z++)
                {
                    ret[x, y, 0] += normalMaps[z][x, y, 0] * weights[z];
                    ret[x, y, 1] += normalMaps[z][x, y, 1] * weights[z];
                    ret[x, y, 2] += normalMaps[z][x, y, 2] * weights[z];
                }

                // Apply the gain.
                ret[x, y, 0] *= gain;
                ret[x, y, 1] *= gain;
                ret[x, y, 2] *= gain;

                // Clamp each component to [-1, 1].
                ret[x, y, 0] = Math.Max(-1, Math.Min(1, ret[x, y, 0]));
                ret[x, y, 1] = Math.Max(-1, Math.Min(1, ret[x, y, 1]));
                ret[x, y, 2] = Math.Max(-1, Math.Min(1, ret[x, y, 2]));

                // Normalize to unit length (yields NaN if the summed vector is zero).
                double retnorm = Math.Sqrt(ret[x, y, 0] * ret[x, y, 0] + ret[x, y, 1] * ret[x, y, 1] + ret[x, y, 2] * ret[x, y, 2]);
                ret[x, y, 0] /= retnorm;
                ret[x, y, 1] /= retnorm;
                ret[x, y, 2] /= retnorm;
            }
        });

        return ret;
    }
When I sum seven 1024x1024 arrays of 3-component vectors, the operation takes 320 ms on my laptop. Making the code multithreaded already gave me a huge performance boost, but I need it to be even faster. How can I optimize it further? I can already see that I could use a plain array instead of a List<>, which would make the code faster, but not by much. Is there really nothing else left to optimize? I was thinking about moving this to the GPU, but that's just an idea. Can somebody help me out? Thanks in advance.
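To make the plain-array idea concrete, here is a minimal sketch of what that variant might look like. It is only an illustration under assumptions: the class name NormalMixer and the method name MixFromArray are hypothetical, the maps are assumed to be passed as a double[][,,] (for example normalMaps.ToArray()), and the per-pixel sum is kept in local variables instead of being written to ret on every step. The arithmetic is meant to be the same as in Mix, including the NaN result for a zero vector.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper; the names NormalMixer and MixFromArray are made up for this sketch.
public static class NormalMixer
{
    // Same steps as Mix (weighted sum, gain, clamp, normalize), but the maps
    // come in as a plain array and each pixel is accumulated in locals.
    public static double[, ,] MixFromArray(double[][, ,] maps, double[] weights, double gain)
    {
        int w = maps[0].GetLength(0);
        int h = maps[0].GetLength(1);
        double[, ,] ret = new double[w, h, 3];

        Parallel.For(0, h, y =>
        {
            for (int x = 0; x < w; x++)
            {
                double sx = 0, sy = 0, sz = 0;

                // Weighted sum over all maps at this pixel.
                for (int z = 0; z < maps.Length; z++)
                {
                    double[, ,] map = maps[z];
                    double wt = weights[z];
                    sx += map[x, y, 0] * wt;
                    sy += map[x, y, 1] * wt;
                    sz += map[x, y, 2] * wt;
                }

                // Apply the gain, then clamp each component to [-1, 1].
                sx = Math.Max(-1, Math.Min(1, sx * gain));
                sy = Math.Max(-1, Math.Min(1, sy * gain));
                sz = Math.Max(-1, Math.Min(1, sz * gain));

                // Normalize to unit length (NaN for a zero vector, as in Mix).
                double norm = Math.Sqrt(sx * sx + sy * sy + sz * sz);
                ret[x, y, 0] = sx / norm;
                ret[x, y, 1] = sy / norm;
                ret[x, y, 2] = sz / norm;
            }
        });

        return ret;
    }
}
```

It would be called as NormalMixer.MixFromArray(normalMaps.ToArray(), weights, gain). Whether this is measurably faster than the List<> version would still have to be profiled.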