To see which flags different -march settings enable, I am comparing the output of the following commands, as detailed in this SO answer:

$ gcc -Q -march=native         --help=target
$ gcc -Q -march=skylake-avx512 --help=target

Note that the arch detected by -march=native is skylake-avx512:

$ gcc -Q -march=native --help=target | grep march
  -march=                           skylake-avx512

Most of the flags reported for the two -march variants match exactly.

However, there are a few differences:

$ diff <(gcc -Q -march=native --help=target) <(gcc -Q -march=skylake-avx512 --help=target)
12c12
<   -mabm                               [enabled]
---
>   -mabm                               [disabled]
119c119
<   -mpku                               [disabled]
---
>   -mpku                               [enabled]
136c136
<   -mrtm                               [enabled]
---
>   -mrtm                               [disabled]
138c138
<   -msgx                               [disabled]
---
>   -msgx                               [enabled]
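
Since `--help=target` prints value-carrying options and headings as well as the on/off switches, it may help to reduce each listing to flag/state pairs before diffing. The following is a minimal sketch, assuming a POSIX shell with `awk`, `sort`, `diff` and `mktemp` available; `flag_states` and `march_diff` are hypothetical helper names, not part of GCC.

```shell
#!/bin/sh
# Reduce `gcc -Q --help=target` output (read from stdin) to "flag state"
# pairs, one per line. Only lines shaped like "-mfoo   [enabled]" or
# "-mfoo   [disabled]" are kept; options that carry a value, such as
# "-march=   skylake-avx512", are dropped.
flag_states() {
    awk '$1 ~ /^-m/ && ($2 == "[enabled]" || $2 == "[disabled]") {
             # substr() strips the surrounding square brackets
             print $1, substr($2, 2, length($2) - 2)
         }' | LC_ALL=C sort
}

# Show only the switches whose state differs between two -march values.
# Usage: march_diff native skylake-avx512
march_diff() {
    md_a=$(mktemp) || return 1
    md_b=$(mktemp) || { rm -f "$md_a"; return 1; }
    gcc -Q -march="$1" --help=target | flag_states > "$md_a"
    gcc -Q -march="$2" --help=target | flag_states > "$md_b"
    diff "$md_a" "$md_b"
    md_status=$?
    rm -f "$md_a" "$md_b"
    return "$md_status"
}
```

Calling `march_diff native skylake-avx512` should then report the same differing switches as the raw diff above, without depending on the two listings having identical line numbering.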

It is these differences which have prompted me to ask this question.

How does -march=native choose which instruction sets to enable and which to disable?

I have the following conjecture:

  • -march=native uses CPUID instructions to query the supported instruction sets and to detect the processor variant.
  • -march=foobar uses a hardcoded list of the instruction sets which processor foobar supports.
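
One way to probe this conjecture is to ask the gcc driver what it actually hands to the compiler proper. As far as I understand, on x86 the driver resolves `-march=native` itself and passes an explicit `-march=` value plus individual `-m...`/`-mno-...` switches to `cc1`. The following is a minimal sketch of extracting those switches; `cc1_m_flags` and `native_expansion` are hypothetical helper names, and the exact shape of the `-v` output is an assumption that may vary between GCC versions.

```shell
#!/bin/sh
# Print every -m... option found on the cc1 command line in `gcc -v` output
# read from stdin, one option per line. The cc1 line is recognised by its
# first word ending in "cc1".
cc1_m_flags() {
    awk '$1 ~ /cc1$/ { for (i = 1; i <= NF; i++) if ($i ~ /^-m/) print $i }'
}

# Ask the driver what it passes on for -march=native.
# -E stops after preprocessing; -v makes the driver echo its sub-commands
# on stderr, hence the 2>&1.
native_expansion() {
    gcc -march=native -E -v - </dev/null 2>&1 | cc1_m_flags
}
```

Running `native_expansion` on the machine in question lists the per-feature switches chosen for that specific CPU, which can then be compared with what a named `-march` value enables.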

If that is correct then I can see two possible ways this shakes out:

Option 1:

It is possible that -march=native does not get it 100% correct, whereas the hardcoded table of supported instruction sets is updated whenever a new processor is released, and is therefore more likely to be correct.

Therefore we would expect -march=foobar to be the "more correct" flag.

Option 2:

-march=native uses CPUID instructions to determine the supported instruction sets and is therefore guaranteed to be correct, whereas -march=foobar uses a hardcoded list of instruction sets which may not be correct.

Therefore we would expect -march=native to be the "more correct" flag.

If Option 2 is correct, one could surmise that using -march=foobar could end up with an unsupported instruction set enabled, and if the compiler were to emit instructions from that set, the program would crash when it executed them.
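
A rough way to check whether a given machine advertises the features in question is to look at what the kernel reports. The sketch below assumes Linux, where `/proc/cpuinfo` has a `flags` line; `cpu_has` and `report_features` are hypothetical helper names. The kernel's spelling of a feature does not always match the GCC switch name, and a hypervisor or firmware setting can hide a feature that the physical CPU model nominally has, so treat the result as a hint rather than proof.

```shell
#!/bin/sh
# cpu_has FEATURE [CPUINFO_FILE]
# Succeeds if FEATURE appears as a whole word on the first "flags" line.
# CPUINFO_FILE defaults to /proc/cpuinfo (Linux only).
cpu_has() {
    awk -v want="$1" '
        /^flags/ {
            # fields: $1 = "flags", $2 = ":", $3.. = feature names
            for (i = 3; i <= NF; i++) if ($i == want) found = 1
            exit
        }
        END { exit (found ? 0 : 1) }
    ' "${2:-/proc/cpuinfo}"
}

# Report the four features that differ in the diff above.
report_features() {
    for rf_name in abm pku rtm sgx; do
        if cpu_has "$rf_name" "$1"; then
            echo "$rf_name: advertised"
        else
            echo "$rf_name: not advertised"
        fi
    done
}
```

Calling `report_features` with no argument reads the live `/proc/cpuinfo`; passing a saved copy of that file from the target machine allows the check to be done elsewhere.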

I have thus far been unable to find out whether any of the above is correct.

If I want to target a specific arch, be sure all (and only) supported instruction sets are enabled, and am unable to use -march=native, what is the best way to do this?

Steve Lorimer
  • As I understand it, `-march=native` will detect the ISA and extensions to use from `cpuid` (which includes model, family and stepping information). `-march=xxx` will use a *baseline* ISA and a baseline set of extensions. There are a lot of possible combinations of extensions, so only the most relevant were chosen (e.g. `skylake-avx512` was added to reflect an important extension of some Skylakes). `-march=native` is not good for distribution; the only way to use all of a CPU's extensions safely is either a runtime dispatcher or having *a lot* of different binaries. – Margaret Bloom Sep 17 '20 at 07:22
  • Consider that software is usually compiled with a very low baseline (e.g. generic x86-64) and that distributions like Gentoo are all about compiling optimally for your CPU. – Margaret Bloom Sep 17 '20 at 07:22
  • @MargaretBloom thanks for your response. The binaries we produce are for our own proprietary use, not for distribution. We wish to move our production binary builds off bare-metal and onto a VM-based solution, hence not being able to use `-march=native`. We do, however, want to compile optimally for the target machine. – Steve Lorimer Sep 17 '20 at 08:10
  • Probably the easiest way to do it is by installing gcc in the VM and using it only to gather its `-march=native` configuration (then uninstalling it). As an alternative, you can fine-tune each compilation option manually. – Margaret Bloom Sep 17 '20 at 09:09
  • 1
    @MargaretBloom yes this is the same conclusion I came to. I think I'll just target the baseline ISA for now, measure the performance against `march=native`, and if I see a regression go for a more finegrained approach. Thanks for the input! – Steve Lorimer Sep 17 '20 at 09:16
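
The approach suggested in the comments (capture the `-march=native` configuration on the target, then reproduce it elsewhere) could be sketched as follows. This is a minimal illustration, not a tested recipe: `override_flags`, `target.txt` and `base.txt` are hypothetical names, and it assumes both listings come from the same GCC version. It only reconciles on/off switches; tuning settings and `--param` values are not handled.

```shell
#!/bin/sh
# override_flags TARGET_LISTING BASE_LISTING
# Both arguments are files holding raw `gcc -Q -march=... --help=target`
# output. Prints, on one line, the -mfoo / -mno-foo switches that turn the
# BASE settings into the TARGET settings.
override_flags() {
    awk '
        # Keep only boolean switches ("-mfoo  [enabled]" / "[disabled]").
        # Options already spelled -mno-... are skipped to avoid "-mno-no-".
        $1 !~ /^-m/ || $1 ~ /^-mno-/ ||
            ($2 != "[enabled]" && $2 != "[disabled]") { next }
        NR == FNR { target[$1] = $2; next }      # first file: wanted state
        ($1 in target) && target[$1] != $2 {     # second file: base state
            flag = $1
            if (target[flag] == "[disabled]") sub(/^-m/, "-mno-", flag)
            sep = (out == "" ? "" : " ")
            out = out sep flag
        }
        END { print out }
    ' "$1" "$2"
}
```

For example, with `gcc -Q -march=native --help=target > target.txt` captured on the VM and `gcc -Q -march=skylake-avx512 --help=target > base.txt` captured on the build host, the build could use `gcc -march=skylake-avx512 $(override_flags target.txt base.txt) ...`. If the only differences are the four shown in the question, the helper would print `-mabm -mno-pku -mrtm -mno-sgx`.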

0 Answers