Benchmarking method to compare performance between C and avx2 assembly?

Question

I want to know the details of benchmarking when comparing the performance of a C implementation against a hand-written AVX2 assembly implementation.

Should I use the -O3 compiler flag? But -O3 will optimize the C code and might make it as fast as the AVX2 version. Or should I use -O1?

score 2 · Answer 1 · answered Mar 27 '23 at 04:17

If you want to make your asm look unrealistically good, stop the compiler from optimizing as much as normal. If you want to see how much you could actually gain, compare against how you'd actually compile your C for production use.

It depends what you're trying to learn. If you're trying to compare vectorized asm to scalar asm, then gcc or clang -O3 -march=native -fno-tree-vectorize might be appropriate. In any case, use at least -O2.
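To make the choice of flags concrete, here is a minimal timing sketch, assuming a POSIX system with clock_gettime. The names sum_c, best_time and now_sec are made up for illustration, and the array-sum workload is only a stand-in for whatever function you are actually comparing. The same harness can be built once per set of flags, so the only thing that changes between runs is how the C baseline was compiled.

```c
/* Sketch only: sum_c, best_time and now_sec are hypothetical names.
 * Build the baseline the way you would for production, e.g.
 *   gcc -O3 -march=native bench.c
 * or, to get a scalar baseline without auto-vectorization,
 *   gcc -O3 -march=native -fno-tree-vectorize bench.c
 * (bench.c is an assumed file name). */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime in strict ISO C modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint32_t (*sum_fn)(const uint32_t *, size_t);

/* Plain C baseline; a hand-written asm routine with the same signature
 * could be passed to best_time in exactly the same way. */
uint32_t sum_c(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Current monotonic time in seconds. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Calls fn reps times and returns the fastest single call in seconds.
 * The volatile sink keeps the result live so the call is not discarded. */
double best_time(sum_fn fn, const uint32_t *v, size_t n, int reps,
                 uint32_t *result_out)
{
    double best = 1e300;
    volatile uint32_t sink = 0;
    for (int r = 0; r < reps; r++) {
        double t0 = now_sec();
        sink = fn(v, n);
        double t1 = now_sec();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    if (result_out)
        *result_out = sink;
    return best;
}
```

Taking the minimum over several repetitions is one common choice because it is less sensitive to interrupts and clock-frequency ramp-up than a mean. Comparing the returned result between the C and asm versions before trusting the timings also catches an asm routine that is fast only because it is wrong.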

If you're trying to see if there's anything to be gained from hand-written intrinsics or something, then you shouldn't gimp the compiler. If it already does a good job making AVX2 asm from plain C source, then your plain C already is an AVX2 implementation. As long as you use that compiler and options, anyway.

Look at the compiler-generated asm (see How to remove "noise" from GCC/clang assembly output?) and see if you notice anything you could change that might make it faster. Try it out by hand to see if you're right. Often you can get the compiler to emit the asm you want, perhaps by using intrinsics, but if you're lucky you can get it to auto-vectorize pure C, so you get nice, portable, maintainable C with the performance of AVX2 assembly. See Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? re: helping the compiler vs. beating it with asm it wouldn't emit.
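As an illustration of plain C that compilers can often auto-vectorize, here is a small sketch. The function name add_u32 and the file name add.c are made up, and whether the loop really gets vectorized depends on your compiler version and flags, so check the asm rather than assuming.

```c
/* Sketch only: add_u32 is a hypothetical example function.
 * To look at what the compiler produced (add.c is an assumed file name):
 *   gcc -O3 -march=native -S -o - add.c
 */
#include <stddef.h>
#include <stdint.h>

/* restrict promises the compiler that dst does not overlap a or b.
 * Without that promise it may have to emit a runtime overlap check
 * or fall back to scalar code. */
void add_u32(uint32_t *restrict dst, const uint32_t *restrict a,
             const uint32_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

If the asm for this loop uses 256-bit ymm registers when built with -march=native on an AVX2 machine, then this C already is the AVX2 implementation you would be benchmarking against.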

BTW, the phrasing in this question is weird. C is a good language for writing AVX2 code with intrinsics. I assume you're talking about C vs. hand-written assembly like in a previous question you asked.
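For reference, AVX2 written in C with intrinsics can look like the sketch below. It assumes x86-64 with GCC or clang. The function name add_u32_avx2 is made up, and the target attribute is a GCC/clang extension used here only so that this one function compiles without -mavx2.

```c
/* Sketch only: add_u32_avx2 is a hypothetical example function. */
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Adds two arrays of 32-bit unsigned integers, 8 elements per iteration.
 * The caller must make sure the CPU supports AVX2 before calling this. */
__attribute__((target("avx2")))
void add_u32_avx2(uint32_t *dst, const uint32_t *a, const uint32_t *b,
                  size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        /* Unaligned loads and stores, so no alignment requirement on the arrays. */
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_add_epi32(va, vb));
    }
    /* Scalar tail for the remaining n % 8 elements. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

In a normal build you would more likely compile the whole file with -mavx2 or -march=native and drop the attribute. Either way, this is still C, which is why "C vs. AVX2" is not really the comparison being made.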
