We'd like to deploy our product on two different HW platforms: an i5 (typically i5-7500, but older CPUs back to 4100 must be supported) and an Atom (E3845).

Supporting the Atom is new. Running the current binaries on the E3845 doesn't work: they fail with "Illegal instruction". Disassembling in gdb doesn't show me exactly which instruction is at fault; it only says "(bad)".

Since both are x86, I'd like to deploy a single set of binaries, but other than exhaustive trial and error I don't know how to find which combination of gcc flags will generate code compatible with both CPUs.

P Brady's gcccpuopt.sh script looked promising, but it doesn't support my CPUs.

Looking at /proc/cpuinfo, here's the difference:

CPU      Atom E3845    i3-4160      
Family   6             6
Model    55            60      
         3dnowprefetch          
         epb                    IA32_ENERGY_PERF_BIAS support
                       abm      Advanced Bit Manipulation
                       avx      Advanced Vector Extensions
                       avx2     Advanced Vector Extensions 2
                       bmi1     Bit Manipulation Instructions
                       bmi2     Bit Manipulation Instructions
                       eagerfpu ???
                       f16c     16-bit floating point conversions
                       fma      Fused multiply-add (FMA3, three-operand) instructions
                       fsgsbase ????
                       invpcid  Invalidate Processor Context ID
                       pcid     Process Context Identifiers
                       pdpe1gb  One GB pages (allows hugepagesz=1G)
                       pln      Intel Power Limit Notification
                       pts      Intel Package Thermal Status
                       xsave    Save Processor Extended States: also provides XGETBV, XRSTOR, XSETBV
                       xsaveopt Optimized XSAVE

I don't really know what all of those mean... Would I just disable (if possible) generation of everything in the i5 column? Or is there a better procedure for finding the settings?

Target environment is 32-bit CentOS 6 with a 3.10 kernel and GCC 4.9. The code is mostly C++ with some C.

Danny
  • Since gcc-4.5, `-march=atom` works (it enables MMX, SSE, SSE2, SSE3 and SSSE3 instruction set support), but note that some early "atom" CPUs were 32-bit only. See also https://stackoverflow.com/questions/110674/gcc-optimization-flags-for-intel-atom... or, if you compile it on the target Atom itself, you can just use `-march=native` – technosaurus Feb 23 '18 at 03:45
  • Thanks for that. If I compile with `-march=atom` would anything break when running on the i5/i7? Seems only `epb` is unique to atom (?) – Danny Feb 23 '18 at 04:08
  • `-march=atom` isn't documented as enabling any epb code generation. Epb is used to allow software to hint performance expectations to the CPU. I've never seen any C code + compiler intrinsic combination that would invoke epb; if you want to use it, you'll need some inline assembly. Keep in mind that gcc can compile multiple versions of a function optimized for different sub-architectures (I think it is called ifunc? Look for the function attributes ifunc and target). – technosaurus Feb 23 '18 at 04:47

1 Answer

In order to make this answer applicable to more use cases, I'll try to make this generic and use atom and i5 as the examples.

  1. On each platform, run `gcc -march=native -Q --help=target`, as noted here

  2. Gather the options that are common to all platforms and either add them to your CFLAGS or make a wrapper that always adds them to your compiler command line (it could just be a shell script containing `/path/to/real-gcc $myflags "$@"`, where `$myflags` is your list of common flags). I have often had to resort to the wrapper method for stubborn build systems that ignore `$CFLAGS`.

  3. Compile as normal, ensuring that your CFLAGS get used.

  4. If performance is acceptable, stop here; otherwise do a profile-guided optimization (PGO) build.

  5. If performance is acceptable, stop here; otherwise you can use your profile info to identify functions that may benefit from gcc's target_clones function attribute, or from a combination of the ifunc and target function attributes (supported by clang), to generate sub-architecture-specific versions of each function that get resolved at run time. (Note that in this specific case there may be no functions where this is useful, since the i5 outperforms the Atom in most cases.)

  6. If performance is acceptable, stop here; otherwise fix the code.
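
Steps 1 and 2 could be sketched roughly as below. This is only an illustration: it assumes the output of `gcc -march=native -Q --help=target` has already been saved on each machine (the file names `atom.txt` and `i5.txt` in the usage note are made up), and the helper names `enabled_flags` and `common_enabled` are hypothetical, not part of gcc.

```shell
#!/bin/sh
# Hypothetical helper: list the options marked [enabled] in a saved
# `gcc -march=native -Q --help=target` dump (one option per line, sorted).
enabled_flags() {
    awk '$2 == "[enabled]" { print $1 }' "$1" | LC_ALL=C sort -u
}

# Hypothetical helper: print the options that are [enabled] in BOTH dumps.
# $1 and $2 are the dump files saved on the two platforms.
common_enabled() {
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
    enabled_flags "$1" > "$tmp_a"
    enabled_flags "$2" > "$tmp_b"
    # comm -12 keeps only the lines present in both sorted inputs.
    LC_ALL=C comm -12 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}
```

With the functions loaded, `common_enabled atom.txt i5.txt` prints the candidate list for `$myflags`. It only looks at on/off options, so value options such as `-march=` and `-mtune=` still need to be chosen by hand.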

technosaurus
  • Thanks. Looks very promising. The output of the command lists some options as "enabled" and others as "disabled". When you say "gather the options", do I include both? For example, if the output says `-mpush-args [enabled]` and `-mrdrnd [disabled]`, must I then have `CFLAGS = -mpush-args -mno-rdrnd`? And further, what is "common"? Only the enabled ones? – Danny Feb 23 '18 at 11:06
  • @Danny You can ignore anything that is also produced when `-march=native` is omitted... those would be default settings or those enabled by your optimization level (such as -O{1,2,3,s,z,g}) – technosaurus Feb 23 '18 at 14:34
  • Thanks. What about my original question about having to include both the positive and negative flags? I.e., if the output says `-mpush-args [enabled]` and `-mrdrnd [disabled]`, must I then have `CFLAGS = -mpush-args -mno-rdrnd`? – Danny Feb 25 '18 at 07:44
  • You should only need to add the flags that are enabled for all supported systems and that are not enabled when `-march=native` is omitted. If in doubt, leave them out. – technosaurus Feb 25 '18 at 08:37
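
The filtering described in these comments could be sketched as follows. It assumes two saved dumps from the same machine: one from `gcc -march=native -Q --help=target` and one from the same command without `-march=native`. The helper names `enabled_flags` and `extra_enabled` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the options marked [enabled] in a saved
# `gcc -Q --help=target` dump (one option per line, sorted).
enabled_flags() {
    awk '$2 == "[enabled]" { print $1 }' "$1" | LC_ALL=C sort -u
}

# Hypothetical helper: print the options that are [enabled] in the
# -march=native dump ($1) but not in the default dump ($2).
extra_enabled() {
    tmp_n=$(mktemp) || return 1
    tmp_d=$(mktemp) || { rm -f "$tmp_n"; return 1; }
    enabled_flags "$1" > "$tmp_n"
    enabled_flags "$2" > "$tmp_d"
    # comm -23 keeps only the lines unique to the first sorted input.
    LC_ALL=C comm -23 "$tmp_n" "$tmp_d"
    rm -f "$tmp_n" "$tmp_d"
}
```

Running this on each platform and then intersecting the results gives the flags worth adding. Options reported as `[disabled]` are simply left out rather than negated.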