This code as written is poorly suited to SPEs because it is so branch-heavy. Information on the types of the variables involved would also help: the test as written seems fairly obscure (even with the >0 fix), but the code looks like it might be C++ using some sort of vector class that overloads operator - to mean vector subtraction and operator * of two vectors to compute a dot product.
The first thing to do with such simple loops on SPEs is to make them branch-free (at least the inner loop; i.e. unroll a couple of times and only check for early exit every N iterations) and use SIMD instructions. SPEs only have SIMD instructions, so not using SIMD processing in your loops instantly wastes 75% of your available register space and computational power (with 32-bit elements, only one of the four lanes in each 128-bit register does useful work). Similarly, SPEs can only load aligned qwords (16 bytes) at a time; using smaller data types requires extra work to shuffle the contents of registers around so that the value you're trying to load ends up in the "preferred slot".
You can get rid of the if (k == i || k == j) by rewriting the first part of the loop in the following branch-free form (this is pseudocode; it is immediately applicable to ints, but you'll need to use intrinsics to get bitwise ops on floats):
dd = q2_vect[k] - q1_vect;
d2 = dd * dd;
d2 &= ~(cmp_equal(k, i) | cmp_equal(k, j));
Here, cmp_equal corresponds to the respective SPE intrinsics (semantics: cmp_equal(a, b) yields ~0u if a == b and 0 otherwise). This forces d2 to zero when k == i or k == j.
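The masking step can be tried out on an ordinary CPU before porting it to SPE intrinsics. The following is a minimal scalar sketch in portable C, not SPE code: cmp_equal here is a hypothetical stand-in that only emulates the semantics described above, and mask_d2_u32 and mask_d2_f32 are illustrative names that do not come from the original code. The float variant assumes a 32-bit IEEE 754 float and applies the mask to the bit pattern.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for the SPE compare intrinsic:
   all bits set if a == b, all bits clear otherwise. */
static uint32_t cmp_equal(uint32_t a, uint32_t b)
{
    return (uint32_t)0 - (uint32_t)(a == b);
}

/* Integer case: force d2 to zero when k == i or k == j,
   with no if statement in the source. */
static uint32_t mask_d2_u32(uint32_t d2, uint32_t k, uint32_t i, uint32_t j)
{
    return d2 & ~(cmp_equal(k, i) | cmp_equal(k, j));
}

/* Float case: the same mask applied to the float's bit pattern.
   An all-zero bit pattern is +0.0f. */
static float mask_d2_f32(float d2, uint32_t k, uint32_t i, uint32_t j)
{
    uint32_t bits;

    assert(sizeof(float) == sizeof(uint32_t)); /* assumption: 32-bit float */
    memcpy(&bits, &d2, sizeof bits);
    bits &= ~(cmp_equal(k, i) | cmp_equal(k, j));
    memcpy(&d2, &bits, sizeof d2);
    return d2;
}
```

On an SPE the same pattern would operate on all lanes of a vector register at once; the scalar version only illustrates the logic.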
To avoid the if (d2 > 0) branch in the inner loop, do the following:
a |= cmp_greater(d2, 0);
and only check whether a is nonzero (to early-out) every few loop iterations. If all values computed for d2 are nonnegative (which will be the case if your type is ints, floats or a real-valued vector class), you can simplify this further. Just do:
a |= d2;
In the end, a will be nonzero if and only if at least one of the individual terms was nonzero. But be careful with integer overflows (if you're using ints) and NaNs (if you're using floats). If you have to handle these cases, the above simplification will break the code.
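To make the "check only every few iterations" idea concrete, here is a minimal scalar sketch in portable C. It assumes the d2 values have already been computed and masked into an array of unsigned integers (so they are nonnegative by construction); the function name any_nonzero and the interval of 4 are illustrative choices, not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CHECK_INTERVAL = 4 }; /* hypothetical: how often to test for early exit */

/* Returns 1 if any d2[k] is nonzero, 0 otherwise.
   The unrolled body only ORs values together; the single branch
   on 'a' runs once per CHECK_INTERVAL elements. */
static int any_nonzero(const uint32_t *d2, size_t n)
{
    uint32_t a = 0;
    size_t k = 0;

    assert(d2 != NULL || n == 0);

    while (k + CHECK_INTERVAL <= n) {
        a |= d2[k + 0];
        a |= d2[k + 1];
        a |= d2[k + 2];
        a |= d2[k + 3];
        k += CHECK_INTERVAL;
        if (a != 0)
            return 1; /* early-out, checked once per unrolled block */
    }

    /* Leftover elements when n is not a multiple of CHECK_INTERVAL. */
    for (; k < n; ++k)
        a |= d2[k];

    return a != 0;
}
```

A larger interval means fewer branches but more wasted work after the first nonzero term, so the right value depends on how early the exit typically happens.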