According to most benchmarks, Intel's Clear Linux is considerably faster than other distributions, largely thanks to a GCC feature called Function Multi-Versioning (FMV). Their current method is to compile the code, analyze which functions contain vectorized loops, patch those functions with FMV attributes, and then compile everything again.
How feasible would it be for GCC to do this automatically? For example, via a (hypothetical) option such as -mmultiarch=sandybridge,skylake, or a similar -m option that lists CPU extensions like AVX and AVX2.
Right now I'm interested in two usage scenarios:
- Building the releases of our large, math-heavy program that we ship to customers. I don't want to pollute the code with non-standard attributes, and I don't want to modify the third-party libraries we use.
- Letting other Linux distributions do the same easily, without patching the code as Intel does. This could give all Linux users substantial performance gains.