
I recently tried adding the -ftree-vectorize compiler option to my project's build. After the change, one of my end-to-end tests fails when compiled with Clang because of significant floating-point differences. This is quite surprising to me, especially since GCC still works fine.

Digging into the issue a bit, I found from the GCC documentation that -ftree-vectorize turns on two separate flags, -ftree-loop-vectorize and -ftree-slp-vectorize. Clang, in its effort to match GCC, also has the -ftree-vectorize and -ftree-slp-vectorize flags. I tested Clang with just the -ftree-slp-vectorize option, and the test passes.

However, Clang doesn't have the -ftree-loop-vectorize option, so I can't try it with just that, and judging from the Clang documentation I can't tell whether -ftree-vectorize turns on two flags under the hood as it does in GCC.

I am rather stumped on how vectorization can affect floating-point results. I know that floating-point operations aren't associative, so I don't believe Clang would break the as-if rule by reordering them during vectorization. I'm pretty sure that I didn't switch on unsafe math optimizations either, since my compiler options are just -g -O2 -ftree-vectorize.

Worst case, I've hit some implementation-defined behavior or UB, but I wanted to ask more experienced people here first whether there could be something else I'm missing.

k huang
  • How are you checking the results? Given that there is no guarantee that `a + b == a + b` https://stackoverflow.com/a/24446382/3370124 you might be running into simple reordering issues. – Richard Critten Feb 28 '23 at 23:38
  • What ISA are you compiling for? x86-64? If so, then FLT_EVAL_METHOD==0 and things should be fairly sane, with FP results not depending on optimization level or optimization choices. One thing that compilers can still do (without `-ffast-math`) is contract `a*b+c` into an FMA, at least within one expression. (`-ffp-contract=on`, vs. across statements with GCC's default of `-ffp-contract=fast`. But clang doesn't do that unless you tell it; its default is `off` or `on` IIRC, but not `fast`.) – Peter Cordes Feb 28 '23 at 23:40
  • Anyway, your expectation is correct, GCC/clang can't vectorize in ways that would change FP rounding except for FMA, since you didn't use `-ffast-math` or `-fopenmp` with `#pragma omp simd reduction(+:foo)` or something in the source. – Peter Cordes Feb 28 '23 at 23:42
  • @RichardCritten we have tolerances for floating-point numbers, comparing absolute and relative differences – k huang Feb 28 '23 at 23:42
  • Perhaps you have a bug somewhere? Different results with different optimization levels can be due to undefined behaviour. Compile with `-Wall` and check / fix all the warnings. If your code is multi-threaded, check with thread sanitizer. Maybe use `clang -O2 -Wall -fsanitize=undefined` – Peter Cordes Feb 28 '23 at 23:44
  • @PeterCordes that's what I'm fearing sadly, I was just asking this question in hopes it was a bug of lower hanging fruit – k huang Feb 28 '23 at 23:47
  • @RichardCritten: The reason that `a+b == a+b` can be false is that either could be NaN, and `NaN==NaN` is false. For finite inputs, FP addition is commutative, IIRC even for the sign of `+-0.0`. Also for infinities I think. It's also commutative for NaN unless you care about the NaN "payload", but the semantics of `==` mean that's not a valid way to express commutativity. The question you linked has answers that are technically correct because of the behaviour of `==`, but not very helpful in understanding `+`. – Peter Cordes Feb 28 '23 at 23:56
  • Is there a minimal example that shows your test and compilation commands? – Fantastic Mr Fox Mar 01 '23 at 00:03
  • @FantasticMrFox the test is really big unfortunately, and no unit tests fail. my compilation optimization options are just `-g -O2 -ftree-vectorize` as noted above, the rest are just warning flags – k huang Mar 01 '23 at 00:05
  • `-g -O2 -ftree-vectorize` I can't reproduce on a 68000. A repro case would be most helpful. – Eljay Mar 01 '23 at 00:12

0 Answers