104

I tried to scour the GCC man page for this, but I still don't really get it.

What's the difference between -march and -mtune?

When does one use just -march, vs. both? Is it ever possible to use just -mtune?

Daniel Widdis
Jameson

2 Answers

120

If you use -march then GCC will be free to generate instructions that work on the specified CPU, but (typically) not on earlier CPUs in the architecture family.

If you just use -mtune, then the compiler will generate code that works on any of them, but will favour instruction sequences that run fastest on the specific CPU you indicated, e.g. by setting loop-unrolling heuristics appropriately for that CPU.


-march=foo implies -mtune=foo unless you also specify a different -mtune. This is one reason why using -march is better than just enabling options like -mavx without doing anything about tuning.
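One way to see the difference for yourself is to look at the feature macros GCC predefines. A minimal sketch, assuming an x86 target; the `BUILD_HAS_*` names and `print_build_isa` are made up for illustration, while `__SSE2__`, `__AVX__` and `__AVX2__` are macros GCC defines when the corresponding instruction set is enabled:

```c
#include <stdio.h>

/* GCC predefines each of these macros only when the matching instruction
   set is enabled, for example through -march=... or -mavx2.
   Collapsing them to 0/1 makes them easy to print and compare. */
#ifdef __SSE2__
#define BUILD_HAS_SSE2 1
#else
#define BUILD_HAS_SSE2 0
#endif

#ifdef __AVX__
#define BUILD_HAS_AVX 1
#else
#define BUILD_HAS_AVX 0
#endif

#ifdef __AVX2__
#define BUILD_HAS_AVX2 1
#else
#define BUILD_HAS_AVX2 0
#endif

/* Hypothetical helper: report which instruction sets the compiler was
   allowed to use when this file was compiled. */
void print_build_isa(void)
{
    printf("SSE2=%d AVX=%d AVX2=%d\n",
           BUILD_HAS_SSE2, BUILD_HAS_AVX, BUILD_HAS_AVX2);
}
```

Called from a build made with -march=haswell, this should report AVX2=1; with -march=core2 -mtune=haswell it should report AVX2=0, because tuning changes instruction choice and scheduling, not which instructions are allowed.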

Caveat: -march=native on a CPU that GCC doesn't specifically recognize will still enable new instruction sets that GCC can detect, but will leave -mtune=generic. Use a new enough GCC that knows about your CPU if you want it to make good code.

Peter Cordes
James Youngman
  • 11
    Doesn't answer whether it makes sense to use both or whether mtune is redundant when set to the same value. – Pavel Šimerda Feb 10 '15 at 12:35
  • 18
    @PavelŠimerda Intuitively the answer is implicit in the definition of the 2 features. Besides, the documentation explicitly states that `march` implies `mtune`. So, the answers to your objections are no and yes respectively. – underscore_d Feb 26 '16 at 00:46
  • Thank you for explaining this so elegantly! You make it easy to understand. – Rahim Khoja May 04 '16 at 22:23
  • 7
    People need a tl;dr: Use -march if you ONLY run it on your processor, use -mtune if you want it safe for other processors. – j riv Feb 18 '17 at 05:50
  • 6
    Users must also understand that older compilers (released before some CPU existed) may result in a different optimal `mtune` and `march` combination. This blog post illuminates that point along with the others: https://lemire.me/blog/2018/07/25/it-is-more-complicated-than-i-thought-mtune-march-in-gcc/ – qneill Oct 16 '18 at 23:59
57

This is what I've googled up:

The -march=X option takes a CPU name X and allows GCC to generate code that uses all features of X. GCC manual explains exactly which CPU names mean which CPU families and features.

Because features are usually added, but not removed, a binary built with -march=X will run on CPU X, has a good chance of running on CPUs newer than X, but will almost assuredly not run on anything older than X. Certain instruction sets (3DNow!, I guess?) may be specific to a particular CPU vendor; using these will probably get you binaries that don't run on competing CPUs, newer or otherwise.
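If a binary built for a newer -march might end up on an older machine, it can at least check before it dies with an illegal instruction. A rough sketch using GCC's x86 builtins `__builtin_cpu_init` and `__builtin_cpu_supports`; the function name is hypothetical, and AVX2 is only an example feature:

```c
#include <stdio.h>

/* Returns 1 if the running CPU supports AVX2, 0 if it does not, and -1 if
   this compiler/target combination offers no way to ask (the builtins
   used here are assumed to be available only on x86 with GCC or Clang). */
int running_cpu_has_avx2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                        /* populate CPU feature data */
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return -1;
#endif
}

/* Hypothetical helper: print a warning when the CPU lacks the feature. */
void warn_if_cpu_too_old(void)
{
    if (running_cpu_has_avx2() == 0)
        fprintf(stderr, "warning: this CPU does not support AVX2\n");
}
```

For this to help, the check itself has to live in code compiled for the baseline CPU, otherwise the compiler may already have used the newer instructions before the check runs.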

The -mtune=Y option tunes the generated code to run faster on Y than on other CPUs it might run on. -march=X implies -mtune=X. -mtune=Y will not override -march=X, so, for example, it probably makes no sense to -march=core2 and -mtune=i686 - your code will not run on anything older than core2 anyway, because of -march=core2, so why on Earth would you want to optimize for something older (less featureful) than core2? -march=core2 -mtune=haswell makes more sense: don't use any features beyond what core2 provides (which is still a lot more than what -march=i686 gives you!), but do optimize code for much newer haswell CPUs, not for core2.

There's also -mtune=generic. generic makes GCC produce code that runs best on current CPUs (the meaning of generic changes from one version of GCC to another). There are rumors on Gentoo forums that -march=X -mtune=generic produces code that runs faster on X than code produced by -march=X -mtune=X does (or just -march=X, as -mtune=X is implied). No idea if this is true or not.

Generally, unless you know exactly what you need, it seems that the best course is to specify -march=<oldest CPU you want to run on> and -mtune=generic (-mtune=generic is here to counter the implicit -mtune=<oldest CPU you want to run on>, because you probably don't want to optimize for the oldest CPU). Or just -march=native, if you are only ever going to run on the same machine you build on.

LRN
  • 4
    But if you use `-march=native`, you may want to specify `-mtune=X`, because the default is still `-mtune=generic`, as discussed here: https://lemire.me/blog/2018/07/25/it-is-more-complicated-than-i-thought-mtune-march-in-gcc/ – Roland Weber May 27 '19 at 07:54
  • @RolandWeber: That only happens if you use a GCC too old to know about your CPU. `-march=native` implies `tune=native` just fine if you use a GCC that knows about your CPU. That article only presents the bad case. Newer GCC versions make better code in general, especially when using new instructions like AVX2 and AVX-512. And having tuning settings (like loop unroll heuristics) designed for your CPU is a definite plus. So if you care enough about performance to be using these options, use a new GCC, at *least* one that knows about your CPU, preferably the current stable release. – Peter Cordes Jul 24 '20 at 14:27
  • It does suck that GCC can't do any better than `tune=generic` for a newer member of the same microarchitecture family, especially something like Kaby Lake which is literally identical to Skylake microarchitecturally. But I think it still has a different family/stepping so a GCC that only knew about Skylake and older could fail to recognize it for tuning. – Peter Cordes Jul 24 '20 at 14:31
  • Shouldn't `-march=native` be fine if you use it for all cpus coming after yours from the same vendor? – ZeroPhase Apr 07 '21 at 06:31
  • @ZeroPhase: Usually yes, CPU vendors *normally* make their CPUs backwards compatible with previous models, not removing previously supported instructions. That isn't always the case, though: AMD supported the XOP SIMD extension in Bulldozer-family only, not Zen. Intel supported AVX-512 in Ice Lake / Tiger Lake chips, but removed it again in Alder Lake. Longer ago, AMD supported 3dNow (FP SIMD in 64-bit MMX registers), but dropped it once SSE became widespread (new 128-bit registers). And that's only in x86, where backwards compat drove commercial success (before CPUID feature detection) – Peter Cordes Jul 27 '22 at 18:16
  • @ZeroPhase: Also, `-march=native` implies `-mtune=native`. *Usually* vendors make new CPUs able to run efficiently with machine code that was tuned for earlier generations, at least without major slowdowns. Sometimes there are gains to be had, e.g. from different amounts of unrolling being good or not, or different instruction choices. Pentium 4 famously tried to *not* do this, hoping that everyone would recompile for P4 to avoid instructions like `inc` which were sub-optimal on P4. It didn't go great (mostly for other reasons), and later CPUs mostly had less penalty. Silvermont still some – Peter Cordes Jul 27 '22 at 18:20