`-ftree-vectorize` on Clang fails floating-point comparison tests

Question

I recently added the `-ftree-vectorize` compiler option to my project's build. After the change, one of my end-to-end tests fails when compiled with Clang because of significant floating-point differences. This is quite surprising to me, especially since the GCC build still passes.

Digging into the issue a bit, I found in the GCC documentation that `-ftree-vectorize` turns on two separate flags, `-ftree-loop-vectorize` and `-ftree-slp-vectorize`. Clang, in its effort to match GCC, accepts `-ftree-vectorize` and also `-ftree-slp-vectorize`. I tested Clang with just the `-ftree-slp-vectorize` option, and the test passes.

However, Clang doesn't have the `-ftree-loop-vectorize` option, so I can't test with just that. Judging from the Clang documentation, I can't even tell whether `-ftree-vectorize` turns on two flags under the hood as it does in GCC.

I am rather stumped as to how vectorization can affect floating-point results. I know that floating-point operations aren't associative, so reordering them can change the result, and that is exactly why I don't believe Clang would break the as-if rule by reordering floating-point operations while vectorizing. I'm pretty sure I didn't switch on unsafe math optimizations either, since my compiler options are just `-g -O2 -ftree-vectorize`.
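To make the associativity concern concrete, here is a minimal sketch, not taken from my project. The function names `sum_sequential` and `sum_four_lanes` are made up for illustration. The second function mimics by hand the summation order that a 4-lane vectorized reduction would use; a conforming compiler should not apply this reordering on its own without something like `-ffast-math`.

```cpp
// Sequential left-to-right sum: the order the source code specifies.
float sum_sequential(const float* x, int n) {
    float s = 0.0f;
    for (int i = 0; i < n; ++i) {
        s += x[i];
    }
    return s;
}

// Hand-written stand-in for a 4-lane vector reduction (illustrative only):
// element i is accumulated into lane i % 4, and the four partial sums
// are combined pairwise at the end.
float sum_four_lanes(const float* x, int n) {
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < n; ++i) {
        lane[i % 4] += x[i];
    }
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

For the input `{1.0e8f, 1.0f, -1.0e8f, 1.0f}`, assuming IEEE 754 single precision with no excess precision (as on x86-64), the sequential version returns `1.0f` and the four-lane version returns `0.0f`, because `1.0f` is lost whenever it is added to a value of magnitude `1.0e8f`.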

In the worst case I've hit implementation-defined or undefined behavior, but I wanted to ask more experienced people here first whether there could be something else I'm missing.

How are you checking the results? Given that there is no guarantee that `a + b == a + b` https://stackoverflow.com/a/24446382/3370124 you might be running into simple reordering issues. — Richard Critten, Feb 28 '23 at 23:38
What ISA are you compiling for? x86-64? If so, then FLT_EVAL_METHOD==0 and things should be fairly sane, with FP results not depending on optimization level or optimization choices. One thing that compilers can still do (without `-ffast-math`) is contract `a*b+c` into an FMA, at least within one expression. (`-ffp-contract=on`, vs. across statements with GCC's default of `-ffp-contract=fast`. But clang doesn't do that unless you tell it; its default is `off` or `on` IIRC, but not `fast`.) — Peter Cordes, Feb 28 '23 at 23:40
Anyway, your expectation is correct, GCC/clang can't vectorize in ways that would change FP rounding except for FMA, since you didn't use `-ffast-math` or `-fopenmp` with `#pragma omp simd reduction(+:foo)` or something in the source. — Peter Cordes, Feb 28 '23 at 23:42
@RichardCritten we have tolerances for floating num, comparing absolute and relative differences — k huang, Feb 28 '23 at 23:42
Perhaps you have a bug somewhere? Different results with different optimization levels can be due to undefined behaviour. Compile with `-Wall` and check / fix all the warnings. If your code is multi-threaded, check with thread sanitizer. Maybe use `clang -O2 -Wall -fsanitize=undefined` — Peter Cordes, Feb 28 '23 at 23:44
@PeterCordes that's what I'm fearing sadly, I was just asking this question in hopes it was a bug of lower hanging fruit — k huang, Feb 28 '23 at 23:47
@RichardCritten: The reason that `a+b == a+b` can be false is that either could be NaN, and `NaN==NaN` is false. For finite inputs, FP addition is commutative, IIRC even for the sign of `+-0.0`. Also for infinities I think. It's also commutative for NaN unless you care about the NaN "payload", but the semantics of `==` mean that's not a valid way to express commutativity. The question you linked has answers that are technically correct because of the behaviour of `==`, but not very helpful in understanding `+`. — Peter Cordes, Feb 28 '23 at 23:56
Is there a minimal example that shows your test and compilation commands? — Fantastic Mr Fox, Mar 01 '23 at 00:03
@FantasticMrFox the test is really big unfortunately, and no unit tests fail. my compilation optimization options are just `-g -O2 -ftree-vectorize` as noted above, the rest are just warning flags — k huang, Mar 01 '23 at 00:05
`-g -O2 -ftree-vectorize` I can't reproduce on a 68000. A repro case would be most helpful. — Eljay, Mar 01 '23 at 00:12
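The FMA contraction mentioned in the comments can be illustrated with a minimal sketch. The function names `mul_then_add` and `fused` are hypothetical, and the input values in the note below are chosen only to expose the rounding difference. Whether a given compiler contracts `a*b+c` by default depends on its version and its `-ffp-contract` setting, so rebuilding with `-ffp-contract=off` may be one way to check whether contraction is involved.

```cpp
#include <cmath>

// Multiply, round the product to double, then add: two roundings.
// The volatile temporary forces the product to be stored as a double,
// so the compiler cannot contract the two operations into one FMA.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Fused multiply-add: a*b + c is computed with a single rounding.
double fused(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With `a = b = 1.0 + std::ldexp(1.0, -27)` and `c = -(1.0 + std::ldexp(1.0, -26))`, the exact product is 1 + 2^-26 + 2^-54. Rounding it to double drops the 2^-54 term, so `mul_then_add` returns `0.0`, while `fused` keeps it and returns 2^-54. A tolerance check based on relative difference could fail on a discrepancy like this, since one result is exactly zero.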

