In MSVC there are the intrinsics __emulu() and _umul128(). The first performs a u32*u32->u64 multiplication and the second a u64*u64->u128 multiplication.
Do equivalent intrinsics exist for Clang/GCC?
The closest I found are _mulx_u32() and _mulx_u64(), mentioned in Intel's Guide. But they produce the mulx instruction, which requires BMI2 support, whereas MSVC's intrinsics produce a regular mul instruction. Also, _mulx_u32() is not available in -m64 mode, while __emulu() and _umul128() both exist in the 32-bit and 64-bit modes of MSVC.
You may try online 32-bit code and 64-bit code.
Of course, for 32-bit one may write return uint64_t(a) * uint64_t(b); (see it online), hoping that the compiler will guess correctly and optimize it to a u32*u32->u64 multiplication instead of u64*u64->u64. But is there a way to be sure about this, without relying on the compiler's guess that both arguments are 32-bit (i.e. that the upper half of each uint64_t is zero)? Is there an intrinsic like __emulu() that makes the generated code certain?
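For concreteness, here is a minimal sketch of the 32-bit cast approach I mean. The name mul32x32 is my own placeholder, not an existing intrinsic. Taking uint32_t parameters at least makes it explicit in the types that the upper halves are zero after widening, but which instruction gets emitted is still up to the compiler:

```cpp
#include <cstdint>

// Hypothetical helper (not a real intrinsic): widening 32x32 -> 64 multiply.
// Both operands are zero-extended to 64 bits before the multiplication,
// so the result is exact; the instruction selection is left to the compiler.
inline uint64_t mul32x32(uint32_t a, uint32_t b) {
    return static_cast<uint64_t>(a) * static_cast<uint64_t>(b);
}
```

This gives the same result as __emulu() would, but without any guarantee about the instruction used.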
There is __int128 in GCC/Clang (see code online), but again we have to rely on the compiler's guess that we are actually multiplying 64-bit numbers (i.e. that the upper half of each __int128 is zero). Is there a way to be sure without the compiler guessing, i.e. do intrinsics exist for that?
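As a sketch of the __int128 variant, assuming a 64-bit GCC/Clang target where unsigned __int128 is available: the helper name mul64x64 is hypothetical, and I only shaped it like _umul128() (low half returned, high half stored through a pointer) for comparison:

```cpp
#include <cstdint>

// Hypothetical helper shaped like MSVC's _umul128 (not a real GCC/Clang intrinsic).
// Assumes unsigned __int128 is supported, which is a GCC/Clang extension
// available on 64-bit targets.
inline uint64_t mul64x64(uint64_t a, uint64_t b, uint64_t* hi) {
    using u128 = unsigned __int128;
    const u128 product = static_cast<u128>(a) * static_cast<u128>(b);
    *hi = static_cast<uint64_t>(product >> 64);  // upper 64 bits of the product
    return static_cast<uint64_t>(product);       // lower 64 bits of the product
}
```

Again, the arithmetic is exact, but whether this becomes a single mul depends on the optimizer.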
BTW, both uint64_t (for 32-bit) and __int128 (for 64-bit) produce the correct mul instruction instead of mulx in GCC/Clang. But again we have to rely on the compiler guessing correctly that the upper half of uint64_t and __int128 is zero.
Of course I can look at the assembly that GCC/Clang generate and confirm that they guessed correctly, but looking at the assembly once doesn't guarantee that the same will always happen in all circumstances. And I don't know of a way in C++ to statically assert that the compiler made the right choice of instructions.