1

As a part of particle system simulating gravitation and collisions between many objects I came across a strange bug.

This is a school project aimed at optimization through vectorization (OpenMP SIMD) and as a part of the optimizations I wanted to get rid of following expression:

if (r > COLLISION_DISTANCE) {
    resultVelX += gravityVelX;
    // ... same for remaining dimensions
}
if (r > 0.f && r < COLLISION_DISTANCE) {
    resultVelX += collisionVelX;
    // ... same for remaining dimensions
}

My idea was to capture both conditions in variables and then add values to result without ifs, using simple arithmetics:

const int condGrav = (r > COLLISION_DISTANCE);
const int condCol = (r > 0.f && r < COLLISION_DISTANCE);
resultVelX += condGrav * gravityVelX + condCol * collisionVelX;
// ... same for remaining dimensions

We have been given a set of tests for this project and to my surprise, while simple tests passed in both versions of code, in the most complex cases, it failed for the second version, reporting infinite precision error (e+616 as I found out from the most informative logs).

All computations are done on floats. Compilation is done with intel 2016a compiler icpc.

Question: What is wrong with the second piece of code? Is it just wrong or there is something to floats themselves I am missing?

doomista
  • 501
  • 2
  • 10
  • 2
    Can you show an actual falling case? Your solution looks like it should work. You should also be benchmarking this. Micro-opimizations like this may or may not actually improve performance. Branch predictors are good so what you are doing is comparing the performance of a branch to multiplication which may not win. – NathanOliver Nov 16 '18 at 17:31
  • 1
    Yes - https://stackoverflow.com/questions/5369770/bool-to-int-conversion – Adam Kotwasinski Nov 16 '18 at 17:43
  • @NathanOliver It is not quite easy to pinpoint which operation does fail, it is almost 1000 points interacting with each other. – doomista Nov 16 '18 at 18:10
  • multiplication much harder then little `if` statement – kerrytazi Nov 16 '18 at 18:27
  • 1
    @kerrytazi the program is vectorized, ifs are actually a concern – doomista Nov 16 '18 at 18:42
  • @doomista vectorized by what? SIMD? – kerrytazi Nov 16 '18 at 18:49
  • @kerrytazi Yes. I've updated the question to make that clear. – doomista Nov 16 '18 at 19:03
  • 1
    @doomista in the same program write both versions and assert they are equal. This way you can see where the fail is. I suspect it is somewhere else, maybe in the way you vectorize, because the transformation looks ok to me also. – bolov Nov 16 '18 at 19:10
  • 1
    I take that back. They are definitely very different: https://godbolt.org/z/yl-UGY – bolov Nov 16 '18 at 19:16
  • 1
    a starting point: https://stackoverflow.com/questions/30242196/is-floating-point-multiplication-by-zero-guaranteed-to-produce-zero – bolov Nov 16 '18 at 19:21
  • The logic is flawless, at least the part you shared. I couldn't find references to "infinite precision error" and even if I could, you did not tell us the types of your variables. I just see some const int. The code you posted does not allow us to reproduce the error, so it's not easy to help you... – L.C. Nov 16 '18 at 19:28
  • The question says that everything unspecified is float. By "infinite precision error" I've basically wanted to tell that the mentioned change threw calculations off so much that the difference in position of two points in 3D was suddenly non-representable in floats. I agree it is hard to reproduce the error but I opted for not sharing tons of source files and big set of testing data. – doomista Nov 16 '18 at 19:41
  • Did you try using a tool like valgrind or similar? This looks like the kind of problems that valgrind can detect. – Damien Nov 16 '18 at 20:34

2 Answers2

3

Your change looks equivalent to me, assuming that gravityVelX is never NaN. A boolean will convert to 0.0 or 1.0.

If your change enables an optimization that wasn't previously possible or done, then maybe ICC's default of -ffast-math is causing the problem. (The default is actually -fp-model fast=1: https://software.intel.com/en-us/node/522979. That's similar to gcc's -ffast-math that allows optimizations which change the result.)


BTW, you might get better results with this, because SSE2 can do that directly

resultVelX += r > COLLISION_DISTANCE ? gravityVelX : 0.0;

That most directly expresses in C what you want the compiler to emit
(cmpps r,collision_distance / andps gravityVelX, cmp_result / addps resultVelX, and_result). You don't actually want or need to multiply, and creating an actual 1.0 is more cumbersome than just adding 0 or what you want.

x86 SIMD compare instructions produce a vector of all-zero or all-one elements which you can use as an AND mask directly. This works great for conditional addition, because the all-zero bit-pattern represents IEEE 754 0.0, and zero is the additive identity.

(Without -ffast-math, compilers can't always assume that adding 0.0 is a no-op. I think because of signed zero. Usually you need extra options to tell compilers that FP operations can raise exceptions, i.e. that exceptions are unmasked and thus a visible side-effect. Anyway, with ICC's default options, it should be able to convert the if into branchless code on its own. But if it's having problems, hand-holding it with a ternary that always adds something is the way to go.)

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
0

The pseudo code for SIMD (SSE) comparisong looks like:

__m128 _mm_cmpgt_ps (__m128 a, __m128 b)

FOR j := 0 to 3
    i := j*32
    dst[i+31:i] := ( a[i+31:i] > b[i+31:i] ) ? 0xffffffff : 0
ENDFOR

The result of comparacing is not 1.0f it is NaN.
I don't realy know, because I don't see full code, but that may be reason for wrong calculations if you use SSE in your program.

kerrytazi
  • 583
  • 1
  • 4
  • 15
  • I may have been a bit unclear with the way I've written the question. Vectorization was the reason I came up with the expression that caused me trouble, but the results were equal for both vectorized and unvectorized code. – doomista Nov 16 '18 at 19:27