I'm confused about which is better. I know how to write the code both ways (intrinsics and assembly), but I can't tell which is better in general, for any processor. Please also explain the reason.
In general, start with intrinsics and only go to asm if you need further optimisation. For x86, PowerPC, et al., this is rarely necessary, but compilers for ARM/Neon are not so good, and you may well have to resort to assembly if your code is sufficiently performance-critical. – Paul R Feb 02 '18 at 10:33
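Paul's advice is easiest to see side by side. The sketch below assumes an x86/x86-64 target with SSE and a compiler such as GCC, Clang or ICC. The function names `add_scalar` and `add_sse` are hypothetical.

The intrinsics version fixes which vector operations are used. The compiler is still free to allocate registers, schedule, unroll and inline around it. That freedom is what you give up with hand-written asm.

```c
#include <stddef.h>
#include <immintrin.h>  /* x86 SIMD intrinsics; assumes an x86/x86-64 target */

/* Plain C version: the compiler may or may not auto-vectorize this loop. */
void add_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* SSE intrinsics version: four floats per iteration, then a scalar tail.
   The compiler still chooses registers and instruction scheduling. */
void add_sse(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned loads are allowed */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
    for (; i < n; i++)                     /* remaining 0..3 elements */
        dst[i] = a[i] + b[i];
}
```

On ARM, the same loop would be written with `<arm_neon.h>` intrinsics such as `vld1q_f32`, `vaddq_f32` and `vst1q_f32`. That is where Paul suggests inspecting the compiler's output and dropping to asm if it is not good enough.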
The question doesn't mention what the circumstances are. There are some relatively rare cases where assembly code is faster, such as this 400+ line [assembly code](https://github.com/01org/isa-l/blob/master/crc/crc16_t10dif_01.asm) for a fast crc16 or crc32 using the x86 pclmulqdq instruction (carryless multiply). – rcgldr Feb 03 '18 at 07:53
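The pclmulqdq instruction used by that file is also reachable from C through the intrinsic `_mm_clmulepi64_si128`. The sketch below shows only the carryless-multiply building block, not a CRC: a real CRC also needs folding constants and a final reduction.

It assumes x86-64 and GCC or Clang, because it relies on `__attribute__((target("pclmul")))` and `__builtin_cpu_supports`. The function names are hypothetical. A bit-by-bit reference implementation is included so the hardware result can be checked.

```c
#include <stdint.h>
#include <immintrin.h>

/* Reference carryless (GF(2)) multiply: partial products are combined with
   XOR instead of addition, so there are no carries. 128-bit result in lo/hi. */
static void clmul_ref(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
    uint64_t l = 0, h = 0;
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            l ^= a << i;
            if (i)                      /* bits shifted out of the low word */
                h ^= a >> (64 - i);
        }
    }
    *lo = l;
    *hi = h;
}

/* Same operation via the pclmulqdq intrinsic. The target attribute lets this
   one function use PCLMUL without compiling the whole file with -mpclmul. */
__attribute__((target("pclmul")))
static void clmul_hw(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
    __m128i va = _mm_cvtsi64_si128((long long)a);
    __m128i vb = _mm_cvtsi64_si128((long long)b);
    __m128i p  = _mm_clmulepi64_si128(va, vb, 0x00); /* low qword x low qword */
    *lo = (uint64_t)_mm_cvtsi128_si64(p);
    *hi = (uint64_t)_mm_cvtsi128_si64(_mm_unpackhi_epi64(p, p));
}

/* Runtime dispatch: use the instruction only if the CPU reports it. */
void clmul64(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
    if (__builtin_cpu_supports("pclmul"))
        clmul_hw(a, b, lo, hi);
    else
        clmul_ref(a, b, lo, hi);
}
```

Whether an intrinsics-based CRC matches a tuned asm file like the linked one depends on how well the compiler schedules the folding loop. That is the kind of rare case rcgldr describes.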
1 Answer
As Paul said in the comments, it depends on your needs and expectations:
"In general start with intrinsics and then only go to asm if you need further optimisation. For x86, PowerPC, et al, this is rarely necessary, but compilers for ARM/Neon are not so good, and you may well have to resort to assembly if your code is sufficiently performance-critical."
Intrinsics are a part of most compilers, and you can use them to meet your performance requirements. They are simpler to use than inline assembly or pure assembly. If you are going to use a high-level language such as C or C++, I suggest not using inline assembly. In my experience, ICC, GCC and Clang could not optimize inline assembly, because the compiler treats an asm block as an opaque black box; where they did optimize around it, the gain was tiny. Intrinsics are good when you want to code for a specific architecture like x86 and recompile it for different micro-architectures. As Peter said in the comments:
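The difference in what the optimizer can see is easiest to show with the same single instruction written both ways. This is a minimal sketch, assuming an x86-64 target and a compiler that accepts GNU extended inline asm in AT&T syntax (GCC, Clang, ICC). The function names are hypothetical.

```c
#include <immintrin.h>

/* Intrinsic: the compiler knows the exact semantics of _mm_add_ps, so it can
   constant-fold it, combine it with neighbouring operations or reschedule it. */
static inline __m128 add4_intrin(__m128 a, __m128 b)
{
    return _mm_add_ps(a, b);
}

/* GNU extended inline asm (AT&T operand order: source, destination).
   The compiler only knows what the constraints say: a is read and written
   in an XMM register, b is read from an XMM register. */
static inline __m128 add4_asm(__m128 a, __m128 b)
{
    __asm__("addps %1, %0" : "+x"(a) : "x"(b));  /* a = a + b */
    return a;
}

/* Helper used to check results: sum of the four lanes. */
static float hsum4(__m128 v)
{
    float f[4];
    _mm_storeu_ps(f, v);
    return f[0] + f[1] + f[2] + f[3];
}
```

If both inputs are compile-time constants, an optimizing compiler can typically evaluate `add4_intrin` entirely at compile time. For `add4_asm` it must still emit the `addps`, because it does not look inside the asm template.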
"...to be able to re-compile your code with different `-mtune=haswell` or `-mtune=znver1` options."
Intrinsics are also a challenge for the optimizer, though not to the same extent as inline assembly. For example, if you compile code written in C with intrinsics, its performance might not change much even when you enable compiler optimizations. In my tests, `-O3`, auto-vectorization disabled, and `-O2` mostly gave the same performance for intrinsics code, while the same comparison on scalar code showed completely different performance (not because of auto-vectorization, but because of other optimizations). In this paper you can see an evaluation of inline assembly and intrinsic functions for matrix-matrix multiplication. In addition, intrinsic functions are not portable, need future maintenance, etc. I have found a new interface that claims it doesn't lose performance compared to intrinsics.
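The portability and maintenance cost of intrinsics shows up as soon as you need a fallback. The hypothetical 4x4 matrix multiply below is not code from the paper mentioned above. It uses SSE when the compiler advertises it and plain C otherwise.

It assumes the `__SSE__` macro (GCC/Clang/ICC) or `_M_X64` (MSVC) signals SSE support. Both code paths must be kept in sync by hand.

```c
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#endif

/* C = A * B for row-major 4x4 float matrices. */
void matmul4(float C[16], const float A[16], const float B[16])
{
#ifdef HAVE_SSE
    /* Row i of C is a linear combination of the rows of B:
       C[i][:] = A[i][0]*B[0][:] + A[i][1]*B[1][:] + A[i][2]*B[2][:] + A[i][3]*B[3][:] */
    __m128 b0 = _mm_loadu_ps(B + 0), b1 = _mm_loadu_ps(B + 4);
    __m128 b2 = _mm_loadu_ps(B + 8), b3 = _mm_loadu_ps(B + 12);
    for (int i = 0; i < 4; i++) {
        __m128 r = _mm_mul_ps(_mm_set1_ps(A[4 * i + 0]), b0);
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(A[4 * i + 1]), b1));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(A[4 * i + 2]), b2));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(A[4 * i + 3]), b3));
        _mm_storeu_ps(C + 4 * i, r);
    }
#else
    /* Portable fallback for targets without SSE (e.g. ARM): a second
       implementation that has to be maintained alongside the SIMD one. */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            float s = 0.0f;
            for (int k = 0; k < 4; k++)
                s += A[4 * i + k] * B[4 * k + j];
            C[4 * i + j] = s;
        }
#endif
}
```

Porting this to Neon would mean a third branch with different intrinsics. That is the "future maintenance" cost the answer refers to.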

*Intrinsics are good when you want to code for a specific micro-architecture* You have that backwards. Intrinsics are good if you want to be able to re-compile your code with different `-mtune=haswell` or `-mtune=znver1` options. Hand-written asm has more advantages if you're tuning for a specific uarch instead of trying to be generic. – Peter Cordes Feb 02 '18 at 12:24