
As per the title,

Will programs compiled with the Intel compiler using

icc -O3 -xCORE-AVX2 program.cpp

generate AVX-512 instructions on a Xeon Gold 61XX?

Our analysis of the generated assembly doesn't seem to find any, but that is no guarantee.

Thanks!

Gaston

1 Answer


No. In ICC classic, you can use intrinsics for any instruction without telling the compiler to enable it. (This is unlike GCC or clang, and the LLVM-based Intel oneAPI compiler, where you have to enable an instruction set before you can use its intrinsics.)

But the compiler won't emit AVX-512 instructions on its own, other than from intrinsics (or inline asm), unless you enable an option such as -march=skylake-avx512 or -march=native (aka -xHOST) that implies -mavx512f, or use a pragma or __attribute__((target("string"))) to enable AVX-512 for a single function.
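
The per-function route can be sketched as follows. This is a minimal illustration that assumes GCC or clang on x86; the function names (sum_baseline, sum_avx512, sum_dispatch) are made up for the example, and it has not been checked against ICC classic. Only the function carrying the target attribute may contain AVX-512 instructions, and a runtime check keeps it from running on a CPU without AVX-512F.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_TARGET_ATTR 1  // hypothetical helper macro for this sketch
#endif

// Baseline: compiled with whatever ISA the command line enables
// (for example AVX2 with -march=haswell or -xCORE-AVX2).
float sum_baseline(const float* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

#ifdef HAVE_X86_TARGET_ATTR
// AVX-512F is enabled for this one function only, regardless of -march.
__attribute__((target("avx512f")))
float sum_avx512(const float* data, std::size_t n) {
    __m512 acc = _mm512_setzero_ps();
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16)           // 16 floats per 512-bit vector
        acc = _mm512_add_ps(acc, _mm512_loadu_ps(data + i));
    float total = _mm512_reduce_add_ps(acc);
    for (; i < n; ++i) total += data[i];   // scalar tail
    return total;
}
#endif

// Picks the AVX-512 version only when the running CPU reports AVX-512F.
float sum_dispatch(const float* data, std::size_t n) {
    assert(data != nullptr || n == 0);
#ifdef HAVE_X86_TARGET_ATTR
    if (__builtin_cpu_supports("avx512f")) return sum_avx512(data, n);
#endif
    return sum_baseline(data, n);
}
```

With this layout, the rest of the translation unit stays free of AVX-512 unless the command line enables it, so disassembling the binary should show zmm registers only inside sum_avx512.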

This is true for all the major x86 compilers: AVX-512 is not on by default.
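
One way to check what a given set of options actually enabled is to look at the predefined feature macros. The sketch below assumes the GCC-style macros (__AVX2__, __AVX512F__, and so on), which GCC, clang and the Intel compilers define; other compilers may not define them the same way. The names enabled_simd_extensions and kCompiledWithAvx512F are made up for illustration.

```cpp
#include <string>

// True only if the compiler was allowed to emit AVX-512F instructions
// on its own for this translation unit.
#ifdef __AVX512F__
constexpr bool kCompiledWithAvx512F = true;
#else
constexpr bool kCompiledWithAvx512F = false;
#endif

// Returns a space-separated list of the SIMD extensions enabled at
// compile time, based on the predefined macros.
std::string enabled_simd_extensions() {
    std::string out;
#ifdef __SSE2__
    out += "SSE2 ";
#endif
#ifdef __AVX__
    out += "AVX ";
#endif
#ifdef __AVX2__
    out += "AVX2 ";
#endif
#ifdef __AVX512F__
    out += "AVX512F ";
#endif
    if (out.empty()) out = "(none detected)";
    return out;
}
```

Printing the result from a build that uses an AVX2-only option and from one that uses -march=native on an AVX-512 machine is a quick complement to reading the disassembly. Note that it reflects only what the compiler may generate by itself, not intrinsics or inline asm.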

Use -O3 -march=native if you want to make code optimized for the machine you're running on, just like with GCC or clang.


In ICC classic, you can also let the compiler use certain instruction sets on a per-function basis, with _allow_cpu_features(_FEATURE_AVX512F|_FEATURE_BMI); which works more like a pragma, affecting compile-time code generation. See the docs.
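
A rough sketch of how that might look in practice, assuming ICC classic (detected here through the __INTEL_COMPILER macro) and falling back to plain code on every other compiler. The function name scale_in_place is hypothetical, and the exact scope of the intrinsic's effect should be confirmed in Intel's documentation. The caller is responsible for only calling the function on CPUs that really support AVX-512F.

```cpp
#include <cstddef>

#if defined(__INTEL_COMPILER)
#include <immintrin.h>  // declares _allow_cpu_features and _FEATURE_* in ICC classic
#endif

// Multiplies every element by factor. Under ICC classic the compiler is
// told it may use AVX-512F when generating code for this function.
void scale_in_place(float* data, std::size_t n, float factor) {
#if defined(__INTEL_COMPILER)
    // Assumption: affects code generation for the code that follows,
    // like a pragma. It does not check the CPU at run time.
    _allow_cpu_features(_FEATURE_AVX512F);
#endif
    for (std::size_t i = 0; i < n; ++i) data[i] *= factor;
}
```

On GCC, clang or the LLVM-based Intel compiler the guard is skipped and the loop is compiled with whatever the command line enables.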

Also related: The Effect of Architecture When Using SSE / AVX Intrinisics re: gcc/clang vs. MSVC vs. ICC.

Peter Cordes
  • O.K., but is there some way to tell the Intel compiler that the application will run on all the cores at the same time? That appears to impose constraints on resources that "could" be taken into account during compilation. – Gaston Jul 06 '22 at 08:59
  • @Gaston: There are no heterogeneous x86 systems yet, as far as ISA extensions are concerned. Alder Lake systems with any E cores enabled forcibly disable AVX-512 on the P-cores, even if you're using a CPU / BIOS / microcode version that would allow AVX-512 on the P-cores if the E-cores were disabled. If you mean in terms of *tuning* options (`-march=native` but then something like a hypothetical `-mtune=alder-lake-e`?) I don't know if any compilers support any thread-pinning-aware tuning settings. – Peter Cordes Jul 06 '22 at 09:04
  • Great answer. For Skylake 61XX, specifically the 6126 in our case, there are 12 'identical real' cores (P-cores in 12th gen notation?). Our software runs on 24 threads, each associated with one of 24 MPI processes. `-O3 -march=native` emits AVX-512 instructions, but could a pair of threads end up competing for the vector units? This information could hypothetically be used during compilation. Any recommended source for tuning this particular case? – Gaston Jul 06 '22 at 09:19
  • @Gaston: There's not really anything you'd do differently when tuning for threads sharing a physical core via hyper-threading. Compilers aren't good enough at unrolling to hide FP latency in the first place to have options to *not* do it as much on the assumption that it'll be sharing ALU throughput with another logical core, and that's about the only thing you might do differently. – Peter Cordes Jul 06 '22 at 09:24