
I am porting some applications to use ARM SVE through the intrinsic functions defined in the ARM C Language Extensions (ACLE) for SVE.

While reading the documentation, I came across two functions that sum the elements of a floating-point vector by reduction: one uses a left-to-right reduction and the other a tree-based reduction.

float64_t svadda[_f64](svbool_t pg, float64_t initial, svfloat64_t op);

float64_t svaddv[_f64](svbool_t pg, svfloat64_t op);
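As an illustration of how these two intrinsics are typically used, here is a rough sketch that sums an array both ways. It assumes an SVE-enabled compiler (e.g. `-march=armv8-a+sve`) and `arm_sve.h`. The function names `sum_ordered` and `sum_unordered` are hypothetical, and the `#else` branch is only a portable stand-in that imitates the two orders with an assumed 4 "lanes".

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>

/* Strictly ordered: svadda folds each chunk into the running scalar
   left to right, so the result matches a plain scalar loop. */
double sum_ordered(const double *x, int64_t n)
{
    double acc = 0.0;
    for (int64_t i = 0; i < n; i += (int64_t)svcntd()) {
        svbool_t pg = svwhilelt_b64(i, n);        /* masks off the tail */
        acc = svadda(pg, acc, svld1(pg, x + i));
    }
    return acc;
}

/* Unordered: keep one partial sum per lane, then reduce the lanes once
   at the end with svaddv (tree order). */
double sum_unordered(const double *x, int64_t n)
{
    svfloat64_t acc = svdup_f64(0.0);
    for (int64_t i = 0; i < n; i += (int64_t)svcntd()) {
        svbool_t pg = svwhilelt_b64(i, n);
        acc = svadd_m(pg, acc, svld1(pg, x + i)); /* inactive lanes keep acc */
    }
    return svaddv(svptrue_b64(), acc);
}

#else
/* Portable fallback (assumption: 4 lanes) that only imitates the orders. */
double sum_ordered(const double *x, int64_t n)
{
    double acc = 0.0;
    for (int64_t i = 0; i < n; i++)
        acc += x[i];                       /* strict left-to-right */
    return acc;
}

double sum_unordered(const double *x, int64_t n)
{
    double lane[4] = {0.0, 0.0, 0.0, 0.0};
    for (int64_t i = 0; i < n; i++)
        lane[i % 4] += x[i];               /* per-lane partial sums */
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
#endif
```

The ordered version gives the same result on every SVE vector length; the unordered one is usually faster, but its result can change with the vector length, because the number of lanes decides how the partial sums are grouped.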

The documentation says:

"These functions (ADDV) sum all active elements of a floating-point vector. They use a tree-based rather than left-to-right reduction, so the result might not be the same as that produced by ADDA."

Why would a tree-based reduction give a different result from a left-to-right reduction? Is it because of rounding errors, or am I missing something?

Bine Brank

1 Answer


Yes. Floating-point addition is not associative, because every temporary result is rounded, so the order in which you do the operations matters.
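A minimal scalar illustration in plain C (not SVE code): the hypothetical helpers `add_left_to_right` and `add_tree` only mimic the two evaluation orders for four lanes.

```c
/* Left-to-right, like an ordered reduction starting from 0.0:
   (((0 + v0) + v1) + v2) + v3 */
double add_left_to_right(const double v[4])
{
    double acc = 0.0;
    for (int i = 0; i < 4; i++)
        acc += v[i];
    return acc;
}

/* Pairwise tree: (v0 + v1) + (v2 + v3) */
double add_tree(const double v[4])
{
    return (v[0] + v[1]) + (v[2] + v[3]);
}
```

With `{1e16, 1.0, -1e16, 1.0}` the exact sum is 2.0, but neither order gets it. Doubles near 1e16 are spaced 2 apart, so `1e16 + 1.0` rounds back to `1e16`. Left-to-right then cancels the large values and keeps the last `1.0`, giving 1.0. The tree absorbs both `1.0`s before the cancellation, giving 0.0.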

You might need a strictly left-to-right order to exactly reproduce a required order of operations. Otherwise, you'd normally do a horizontal sum (hsum) by extracting the high half into another vector and adding it vertically to the low half, then repeating this narrowing until you're down to a single element.
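The halving hsum described above can be emulated on a plain array to show which elements get paired. This is a sketch only. The name `hsum_halving` is hypothetical, `n` is assumed to be a power of two, and real SIMD code would use a shuffle plus a vector add per step instead of an inner loop.

```c
#include <stddef.h>

/* Horizontal sum by repeated halving. Each pass adds the high half onto
   the low half element-wise (a "vertical" add), halving the live length
   until one element remains. Overwrites v; n must be a power of two. */
double hsum_halving(double *v, size_t n)
{
    while (n > 1) {
        size_t half = n / 2;
        for (size_t i = 0; i < half; i++)
            v[i] += v[i + half];
        n = half;
    }
    return v[0];
}
```

This pairs element `i` with element `i + n/2`. For example, with `{1e16, 1.0, -1e16, 1.0}` it cancels the two large values first and returns exactly 2.0, whereas a strict left-to-right sum of the same values returns 1.0.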

Peter Cordes
  • As interesting background reading on this, I wrote a blog post about hitting exactly this problem in the ASTC texture compressor: https://solidpixel.github.io/2021/02/25/invariant-tail.html – solidpixel May 13 '21 at 11:30
  • Also note that the left-to-right order will be slower than the halving reduction, as it's a longer serial computation chain. – solidpixel Jun 04 '21 at 11:12
  • @solidpixel: Indeed, strict serial order is usually only good for reproducibility; it's not "more accurate", and with numbers of uniform size it's usually *less* accurate. A tree is full pairwise summation for this vector->scalar reduction; SIMD-vectorizing a sum / dot product over an array with multiple accumulators is a step in that direction for the array part ([Simd matmul program gives different numerical results](https://stackoverflow.com/a/55478102)). More, smaller accumulators mean less magnitude difference between the things you're adding, assuming your numbers aren't, for example, alternating positive/negative. – Peter Cordes Jun 04 '21 at 11:17
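To see the accuracy point from the last comment in isolation, here is a small sketch comparing a strict serial float sum with a recursive pairwise sum. The names `sum_serial` and `sum_pairwise` are hypothetical, and the comparison assumes inputs of uniform sign and similar magnitude, which is the case the comment describes.

```c
#include <stddef.h>

/* Strict serial sum: one long dependency chain. As acc grows, each small
   addend loses more of its low bits to rounding. */
float sum_serial(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Recursive pairwise (tree) sum: each addition combines partial sums of
   similar magnitude, so the error bound grows roughly like log2(n)
   instead of n. Small blocks fall back to the serial loop. */
float sum_pairwise(const float *x, size_t n)
{
    if (n <= 8)
        return sum_serial(x, n);
    size_t half = n / 2;
    return sum_pairwise(x, half) + sum_pairwise(x + half, n - half);
}
```

Summing one million copies of `0.1f` both ways, the pairwise result stays much closer to the exact total than the serial one. As with the SVE reductions, neither result is bit-identical to the other.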