I would like to speed up the following calculation using SIMD:
// needs: using System.Runtime.CompilerServices;
[MethodImpl(MethodImplOptions.AggressiveInlining)]
static double Dot(double x1, double x2, double y1, double y2)
{
    return x1 * y1 + x2 * y2;
}
I saw that there is Vector2.Dot, but it works only with floats, not doubles.
I cannot switch to .NET Core, and therefore I cannot use Vector128.Create(double e0, double e1).
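For reference, this is roughly what that .NET Core intrinsics version would look like (an untested sketch assuming SSE3 support, shown only to illustrate what is being ruled out):
// needs: using System.Runtime.Intrinsics; using System.Runtime.Intrinsics.X86;
static double DotSse(double x1, double x2, double y1, double y2)
{
    Vector128<double> x = Vector128.Create(x1, x2);
    Vector128<double> y = Vector128.Create(y1, y2);
    Vector128<double> p = Sse2.Multiply(x, y);       // (x1*y1, x2*y2)
    return Sse3.HorizontalAdd(p, p).ToScalar();      // x1*y1 + x2*y2
}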
The Dot method is used in the following code, which computes the union of two sorted id arrays:
static void ChainRule(int x_n, int[] x_id, double[] x_jacobi, double x_diff,
                      int y_n, int[] y_id, double[] y_jacobi, double y_diff,
                      out int z_n, out int[] z_id, out double[] z_jacobi)
{
    int n = x_n + y_n;
    z_id = new int[n];
    z_jacobi = new double[n];
    int i = 0, ix = 0, iy = 0;
    // merge the two sorted id arrays
    while (ix < x_n && iy < y_n)
    {
        if (x_id[ix] < y_id[iy])
        {
            // id only present in x
            z_id[i] = x_id[ix];
            z_jacobi[i++] = x_diff * x_jacobi[ix++];
        }
        else if (y_id[iy] < x_id[ix])
        {
            // id only present in y
            z_id[i] = y_id[iy];
            z_jacobi[i++] = y_diff * y_jacobi[iy++];
        }
        else
        {
            // id present in both: combine the two contributions
            z_id[i] = x_id[ix];
            z_jacobi[i++] = Dot(x_diff, y_diff, x_jacobi[ix++], y_jacobi[iy++]);
        }
    }
    // copy whatever is left of x
    while (ix < x_n)
    {
        z_id[i] = x_id[ix];
        z_jacobi[i++] = x_diff * x_jacobi[ix++];
    }
    // copy whatever is left of y
    while (iy < y_n)
    {
        z_id[i] = y_id[iy];
        z_jacobi[i++] = y_diff * y_jacobi[iy++];
    }
    z_n = i;
}
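To make the intended semantics concrete, here is a small made-up example (the numbers are arbitrary, not taken from my real data):
// hypothetical inputs: x has ids {1, 3}, y has ids {3, 5}
int[] x_id = { 1, 3 };  double[] x_jacobi = { 10.0, 20.0 };  double x_diff = 2.0;
int[] y_id = { 3, 5 };  double[] y_jacobi = { 30.0, 40.0 };  double y_diff = 0.5;

ChainRule(2, x_id, x_jacobi, x_diff, 2, y_id, y_jacobi, y_diff,
          out int z_n, out int[] z_id, out double[] z_jacobi);

// z_n      == 3
// z_id     == { 1, 3, 5 }
// z_jacobi == { 2.0*10.0, 2.0*20.0 + 0.5*30.0, 0.5*40.0 } == { 20, 55, 20 }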
I also tried to precompute the products of x_diff with x_jacobi and of y_diff with y_jacobi using the following code:
double[] x_diff_jacobi = new double[x_n];
for (int i0 = 0; i0 < x_n; i0++)
    x_diff_jacobi[i0] = x_diff * x_jacobi[i0];

double[] y_diff_jacobi = new double[y_n];
for (int i0 = 0; i0 < y_n; i0++)
    y_diff_jacobi[i0] = y_diff * y_jacobi[i0];
This simplifies the calculation of z_jacobi, e.g. z_jacobi[i++] = x_diff_jacobi[ix++] + y_diff_jacobi[iy++]. However, this version runs slower than the one above. I suspect the problem is the initialization of the additional arrays x_diff_jacobi and y_diff_jacobi.
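For what it's worth, the precompute loops themselves contain no data-dependent branching, so they could presumably be written with System.Numerics.Vector<double> (as far as I know, available on .NET Framework via the System.Numerics.Vectors NuGet package). A rough, untested sketch for the x side:
// needs: using System.Numerics; (System.Numerics.Vectors package)
int width = Vector<double>.Count;                    // e.g. 4 with AVX, 2 with SSE2
int i0 = 0;
for (; i0 + width <= x_n; i0 += width)
{
    Vector<double> v = new Vector<double>(x_jacobi, i0) * x_diff;  // vector * scalar
    v.CopyTo(x_diff_jacobi, i0);
}
for (; i0 < x_n; i0++)                               // scalar remainder
    x_diff_jacobi[i0] = x_diff * x_jacobi[i0];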
Any other ideas to speed up this code?