
I am developing a program that performs many low-latency, hard real-time matrix operations, and I am using the Eigen 3 library for them.
I want to use AVX-512F SIMD vectorization in production to improve performance.
I am currently experimenting on Ubuntu and installed Eigen 3 with the vcpkg package manager. My computer currently supports AVX2; I will enable AVX-512F from the BIOS later.
I am using the following commands to check which registers are being used:

objdump -d main.o | grep zmm
objdump -d main.o | grep ymm
objdump -d main.o | grep xmm

objdump -d main.o | grep zmm returns empty output.
objdump -d main.o | grep ymm returns empty output.
objdump -d main.o | grep xmm returns instructions that access and operate on xmm registers.

I would like to know whether my g++ compiler is generating AVX1 (AVX-128) code for the Eigen 3 library.
How do I verify that proper AVX2 or AVX-512F SIMD code is being generated?

Small snippet of the objdump output

3d84:   0f 28 08                movaps (%rax),%xmm1
3d87:   0f 29 4d d0             movaps %xmm1,-0x30(%rbp)
3d8b:   0f 29 45 e0             movaps %xmm0,-0x20(%rbp)
3d8f:   0f 28 45 d0             movaps -0x30(%rbp),%xmm0
3d93:   0f 12 45 e8             movlps -0x18(%rbp),%xmm0
3d9f:   0f 28 08                movaps (%rax),%xmm1
3da2:   0f 29 4d b0             movaps %xmm1,-0x50(%rbp)
3da6:   0f 29 45 c0             movaps %xmm0,-0x40(%rbp)
3daa:   0f 28 45 b0             movaps -0x50(%rbp),%xmm0
3dae:   0f 58 45 c0             addps  -0x40(%rbp),%xmm0
3db2:   0f 29 45 80             movaps %xmm0,-0x80(%rbp)
3db6:   0f 28 45 80             movaps -0x80(%rbp),%xmm0
3dba:   0f 28 4d 80             movaps -0x80(%rbp),%xmm1
3dbe:   0f c6 c1 01             shufps $0x1,%xmm1,%xmm0
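
Grepping for a register name alone does not separate legacy SSE from 128-bit AVX, because both use xmm registers; the mnemonic also has to be checked for a leading `v`. The following is a minimal sketch of that classification for one line of `objdump -d` output in AT&T syntax. The names `VecWidth`, `widest_register`, `mnemonic_of` and `is_legacy_sse` are hypothetical helpers made up for illustration, and the mnemonic extraction is a heuristic that assumes the "address: bytes mnemonic operands" layout shown above (instruction prefixes such as `lock` are not handled).

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical classification of the widest vector register on a line.
enum class VecWidth { None = 0, Xmm = 128, Ymm = 256, Zmm = 512 };

// Returns the widest vector register named anywhere on the line.
VecWidth widest_register(const std::string& line) {
    if (line.find("%zmm") != std::string::npos) return VecWidth::Zmm;
    if (line.find("%ymm") != std::string::npos) return VecWidth::Ymm;
    if (line.find("%xmm") != std::string::npos) return VecWidth::Xmm;
    return VecWidth::None;
}

// Heuristic: skip the "3d84:" address token and the two-digit hex byte
// tokens, then treat the next token as the mnemonic.
std::string mnemonic_of(const std::string& line) {
    std::istringstream in(line);
    std::string tok;
    while (in >> tok) {
        if (tok.back() == ':') continue;  // address such as "3d84:"
        const bool is_byte =
            tok.size() == 2 &&
            std::isxdigit(static_cast<unsigned char>(tok[0])) &&
            std::isxdigit(static_cast<unsigned char>(tok[1]));
        if (is_byte) continue;            // raw encoding byte such as "0f"
        return tok;
    }
    return "";
}

// True when the line uses xmm registers with a non-VEX mnemonic (no leading
// 'v'), i.e. legacy SSE encoding rather than 128-bit AVX.
bool is_legacy_sse(const std::string& line) {
    const std::string m = mnemonic_of(line);
    return widest_register(line) == VecWidth::Xmm && !m.empty() && m[0] != 'v';
}
```

Feeding each line of the disassembly through `is_legacy_sse` and `widest_register` and counting the results would give a rough picture of how much of the code is SSE, AVX/AVX2 or AVX-512; it says nothing about whether those instructions sit in the hot loops.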

Update

Code

#include <eigen3/Eigen/Core>
#include <eigen3/Eigen/Dense>
using namespace Eigen;
int main()
{
//  Matrix4f a, b, cadd, cmul, ci, ct, d;
//  a = Matrix4f::Random();
//  b = Matrix4f::Random();
    MatrixXf a(100, 100),b(100, 100),cadd(100, 100), cmul(100, 100), ci(100, 100), ct(100, 100);
    a = MatrixXf::Random(100, 100);
    b = MatrixXf::Random(100, 100);
    cadd = a + b;
    cmul = a * b;
    ci = cadd.inverse();
    ct = cadd.transpose();
}

Build Command

g++ -Wall -fexceptions -I/home/user/vcpkg/installed/x64-linux/include -c /home/user/Desktop/VectorClass/main.cpp -o obj/Debug/main.o
g++ -L/home/user/vcpkg/installed/x64-linux/debug/lib -o bin/Debug/VectorClass obj/Debug/main.o  -mavx2 -mtune=native -host=native -march=native

Final Object Dump

objdump -d main.o | grep ymm

14f:    c5 fe 7f 45 98           vmovdqu %ymm0,-0x68(%rbp)
231:    c5 fd 7f 85 50 fe ff     vmovdqa %ymm0,-0x1b0(%rbp)

objdump -d main.o | grep zmm

 a1:    62 f1 7c 48 28 74 24     vmovaps 0x80(%rsp),%zmm6
 ac:    62 f1 4c 48 58 44 24     vaddps 0x40(%rsp),%zmm6,%zmm0
 b4:    62 f1 7c 48 29 44 24     vmovaps %zmm0,0xc0(%rsp)
228:    62 f1 7c 48 28 7c 24     vmovaps 0x180(%rsp),%zmm7
230:    62 f1 7c 48 29 7c 24     vmovaps %zmm7,0x100(%rsp)
2d0:    62 f1 7d 48 6f 05 00     vmovdqa32 0x0(%rip),%zmm0        # 2da <main+0x2da>
2dd:    62 f2 7d 48 16 44 24     vpermps 0xc0(%rsp),%zmm0,%zmm0
2e5:    62 f1 7c 48 29 44 24     vmovaps %zmm0,0x180(%rsp)

Build Command

g++ -Wall -fexceptions -I/home/user/vcpkg/installed/x64-linux/include -c main.cpp -o main.out -mavx512f -mfma -mtune=native -host=native  -march=native -mprefer-vector-width=512 -O3 -fno-math-errno -ffinite-math-only -fno-rounding-math  -funsafe-math-optimizations
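
One way to see what a given compile command actually enables is to look at the macros GCC and Clang predefine for each ISA extension (`__SSE2__`, `__AVX__`, `__AVX2__`, `__FMA__`, `__AVX512F__`). As far as I understand, Eigen's headers pick their intrinsics paths from macros like these at preprocessing time, so they only matter on the `-c` compile step. The sketch below is a rough illustration; `enabled_simd_macros` and `widest_vector_bits` are hypothetical helper names, not part of Eigen or GCC.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: lists the x86 SIMD extensions this translation unit
// was compiled for, based on compiler-predefined macros.
std::vector<std::string> enabled_simd_macros() {
    std::vector<std::string> out;
#ifdef __SSE2__
    out.push_back("SSE2");
#endif
#ifdef __AVX__
    out.push_back("AVX");
#endif
#ifdef __AVX2__
    out.push_back("AVX2");
#endif
#ifdef __FMA__
    out.push_back("FMA");
#endif
#ifdef __AVX512F__
    out.push_back("AVX512F");
#endif
    return out;
}

// Hypothetical helper: widest vector register width (in bits) the compiler
// was allowed to use here; 0 if none of the macros above is defined.
int widest_vector_bits() {
#if defined(__AVX512F__)
    return 512;
#elif defined(__AVX__)
    return 256;
#elif defined(__SSE2__)
    return 128;
#else
    return 0;
#endif
}
```

Printing the result of `enabled_simd_macros()` from `main` in the same file as the Eigen code, built with exactly the same flags, shows what the compiler was permitted to emit for that file. It does not show what the CPU can execute, nor which instructions ended up in the hot paths.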
Dark Sorrow
  • How did you compile your code? Did you explicitly specify g++ to use AVX (like `-mavx512f`)? – Urwald Nov 30 '22 at 06:31
  • That's SSE1 code. Note the lack of a `v` in front of the mnemonics; AVX and AVX512 would be `vmovaps`. It also looks like a debug build, storing/reloading everything to the stack, probably between intrinsics. I assume you ran `g++ foo.cpp` with no options, rather than `g++ -O3 -march=native -mprefer-vector-width=512 -ffast-math foo.cpp` or anything like that. – Peter Cordes Nov 30 '22 at 06:37
  • @Urwald I have added my code and build command. – Dark Sorrow Nov 30 '22 at 06:38
  • One thing you should look at is that you are passing the flags for native, mavx2, ... when linking (2nd g++ command), whereas you should use those flags when compiling (1st build command). – Luka Rahne Nov 30 '22 at 06:51
  • @LukaRahne, I couldn't understand you. Are you saying I should use ` -mavx2 -mtune=native -host=native -march=native` in my first `g++` command where I am actually compiling my code? – Dark Sorrow Nov 30 '22 at 06:55
  • @DarkSorrow Yes. The first command, which compiles, is where the binary code is actually generated; the linking phase (2nd command) is where the object files (i.e. the generated code) are linked together into the final binary. – Luka Rahne Nov 30 '22 at 07:07
  • @LukaRahne, I tried moving all the options to the compilation command, but it is still not generating AVX instructions. – Dark Sorrow Nov 30 '22 at 07:30
  • I am sorry for that. I suspect the issue is that you are not linking against libraries that have AVX instructions. Your code is not actually producing any special instructions itself, but you should get such support from the library you are linking against. You can start at https://eigen.tuxfamily.org/dox/TopicUsingBlasLapack.html and figure out which library with AVX support you should link against. – Luka Rahne Nov 30 '22 at 08:19
  • Eigen specific: You can print the output of `Eigen::SimdInstructionSetsInUse()` -- that only shows which architectures Eigen was allowed to use though, not what it actually used. – chtz Nov 30 '22 at 09:06
  • @LukaRahne; I had to use the following `-mavx512f -mfma -mtune=native -host=native -march=native -mprefer-vector-width=512 -O3 -fno-math-errno -ffinite-math-only -fno-rounding-math -funsafe-math-optimizations` command to generate AVX instructions. – Dark Sorrow Nov 30 '22 at 11:11
  • @chtz; When I run the `Eigen::SimdInstructionSetsInUse()` function on Ubuntu I get the output `AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2`. When I run the same on Compiler Explorer I get `AVX512, FMA, AVX2, AVX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2`. I checked the instruction sets supported by my local CPU in /proc/cpuinfo and can see that AVX2 is available. Why is `Eigen::SimdInstructionSetsInUse()` not detecting AVX2 in my local build? When I build the program, AVX2 and AVX-512F instructions seem to be generated. I have updated the code snippet in my question. – Dark Sorrow Nov 30 '22 at 11:24
  • @LukaRahne: Eigen is a C++ template "library" that's actually just headers. You need to enable optimization (and ISA extensions) when compiling, because the real work happens in compiler-generated code, not pre-compiled binary libraries. (`-O3 -march=native` makes no difference when linking, unless you're using `-flto` to do cross-file inlining / optimization) – Peter Cordes Nov 30 '22 at 11:28
  • @DarkSorrow: BTW, in the earlier part of the question, I didn't notice that `-mavx2 -march=native` was part of your link options (but still no `-O3`), just that you had un-optimized asm and no options in the compile step where they'd matter. IDK if there's a more specific duplicate that mentions where they should go, as well as the fact that you need them. (Answer: on all steps, in case you want to use `-flto` which does more optimization at link time and thus code-gen.) – Peter Cordes Nov 30 '22 at 12:26
  • I don't know Eigen well enough to guess why it might not be printing support for AVX2 or AVX512. Possibly the AVX512F instructions are coming from auto-vectorization, not intrinsics, if you don't find many, or if profiling shows they're not in most of the true hot-spots? – Peter Cordes Nov 30 '22 at 12:28
  • Options like `-mavx512f` and `-mfma` are redundant with `-march=native`; it implies everything your CPU supports. It also implies `-mtune=native`, and `-host=native` isn't mentioned in the GCC manual. GCC on my system does accept it, but IDK what it does. – Peter Cordes Nov 30 '22 at 12:30
  • @DarkSorrow It looks like the function does not output FMA nor AVX2 unless AVX512 is active. Feel free to file an issue or provide a patch. – chtz Nov 30 '22 at 13:58
  • @PeterCordes; If I add `-flto` option no SIMD instructions are being generated. – Dark Sorrow Dec 01 '22 at 05:51
  • @DarkSorrow: Then most likely you used it wrong, perhaps omitting optimization flags from compiling or linking. With `-flto`, you need `-O3 -march=native -ffast-math` (or whatever) to be enabled during both `-c` compiling and linking the final binary. The C preprocessor only runs during the first step, so `-m` options there will determine what `#ifdef` blocks will be included for Eigen's intrinsics selection. – Peter Cordes Dec 01 '22 at 05:57
  • @chtz: Compiling the OP's source on my machine with `g++ -O3 -march=skylake eigentest.cpp -o test` (thus AVX2+FMA, but not AVX-512), I do get FMA instructions like `vfmadd231ps ymm1,ymm12,YMMWORD PTR [rdx]` in the binary. Also an AVX2 `vbroadcastss ymm0,xmm1` (AVX1 only provides memory-source broadcasts). I have Eigen 3.4 and g++ 12.2.0 on Arch Linux. So if there was an issue, it's already been fixed in current Eigen. Unless you mean in what `Eigen::SimdInstructionSetsInUse()` prints; yeah that only goes up to AVX for me, unless I use `-march=skylake-avx512` which of course crashes for me. – Peter Cordes Dec 01 '22 at 06:09
  • @PeterCordes Yes, I was only referring to `Eigen::SimdInstructionSetsInUse()` – chtz Dec 01 '22 at 09:28
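
The comments above separate two things: what the compiler was allowed to emit, and what the CPU running the binary can execute (a binary built for `-march=skylake-avx512` crashes on a CPU without AVX-512). A rough sketch of a run-time check for the second part is below, using the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`. The struct `CpuSimdSupport` and the function `query_cpu_simd` are hypothetical names for illustration, and the check is assumed to be meaningful only on x86 with GCC or Clang; on other targets it reports everything as unsupported.

```cpp
// Hypothetical result type: which SIMD extensions the running CPU reports.
struct CpuSimdSupport {
    bool avx;
    bool avx2;
    bool fma;
    bool avx512f;
};

// Hypothetical helper: queries the CPU at run time. On non-x86 targets or
// other compilers it returns all-false rather than guessing.
CpuSimdSupport query_cpu_simd() {
    CpuSimdSupport s{false, false, false, false};
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // initialises the CPU feature data used below
    s.avx     = __builtin_cpu_supports("avx") != 0;
    s.avx2    = __builtin_cpu_supports("avx2") != 0;
    s.fma     = __builtin_cpu_supports("fma") != 0;
    s.avx512f = __builtin_cpu_supports("avx512f") != 0;
#endif
    return s;
}
```

A check like this is probably most useful at program start-up, in a file compiled for a baseline target: if the whole program is built with `-march=native` or `-mavx512f`, the compiler may already have emitted such instructions in code that runs before the check.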

0 Answers