Clang had _ExtInt extended integers that support operations other than division, but SIMD isn't useful for that because of carry between elements (see footnote 1). Other mainstream x86-64 compilers don't even have that; you need a library or something to define a custom type and use the same add-with-carry instructions clang will use. (Or a less efficient emulation in pure C; see footnote 2.)
This has now been renamed _BitInt(n), and will be part of ISO C23. (clang -std=gnu2x).
As an extension, clang also accepts _BitInt in C++, regardless of revision, even with -std=c++11 rather than -std=gnu++11. Also in earlier C revisions, like -std=gnu11 or -std=c11.
typedef unsigned _BitInt(256) u256;
typedef /*signed*/ _BitInt(256) i256;

// works with clang 16
// but not GCC yet (as of April 2023)
int lt0(i256 a){
    return a < 0;  // just the sign bit in the top limb
}

// -march=broadwell allows mulx which clang uses, and adox/adcx which it doesn't
u256 mul256(u256 a, u256 b){
    return a*b;
}
Godbolt with clang -std=gnu2x: works even with -m32, where it's 8x 32-bit limbs instead of just 4x 64-bit. Multiply and divide expand inline to a large amount of code, not calling helper functions, so use them carefully.
- Clang 11 and 12 supported _ExtInt(256), except for divide wider than 128 bits. Not _BitInt. a<0 required explicit casts like a < (i256)0.
- Clang 13 added implicit conversion from int to _ExtInt types. Still no divide support for integers wider than 128-bit.
- Clang 14 and 15 support _BitInt(n), but only for sizes up to _BitInt(128), so all supported sizes support division.
- Clang 16 and later accept unsigned _BitInt(256) bar;, including mul and div (but those are expanded inline, not calls to a helper function, so code size is large for those ops).
- GCC 12 and pre-13 trunk don't yet support _BitInt at all.
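Since support varies this much between compiler versions, a compile-time guard can help. The following is a minimal sketch, assuming the compiler predefines the __BITINT_MAXWIDTH__ macro when it supports _BitInt (recent clang does; compilers without _BitInt simply take the fallback branch). HAVE_BITINT256 and square256 are hypothetical names used only for illustration.

```c
// Detect whether _BitInt(256) is usable with this compiler.
// Assumption: __BITINT_MAXWIDTH__ is predefined when _BitInt is supported.
#if defined(__BITINT_MAXWIDTH__) && __BITINT_MAXWIDTH__ >= 256
#define HAVE_BITINT256 1
#else
#define HAVE_BITINT256 0   // fall back to a library or hand-written limbs
#endif

#if HAVE_BITINT256
typedef unsigned _BitInt(256) u256;

// Unsigned _BitInt arithmetic wraps modulo 2^256, like other unsigned types.
static u256 square256(u256 x)
{
    return x * x;
}
#endif
```

With a compiler limited to _BitInt(128), or one that lacks _BitInt entirely, HAVE_BITINT256 is 0 and the typedef is never seen, so the translation unit still compiles.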
SIMD 256-bit vectors aren't 256-bit scalar integers
__m256i is AVX2 SIMD 4x uint64_t (or a narrower element size like 8x uint32_t). It's not a 256-bit scalar integer type: you can't use it for scalar operations, and __m256i var = 1 won't even compile. There is no x86 SIMD support for integers wider than 64-bit, and the Intel intrinsic types like __m128i and __m256i are purely for SIMD. You can do bitwise boolean ops with them.
GCC's __int128 / unsigned __int128 typically uses scalar add/adc, and/or scalar mul / imul, because AVX2 is generally not helpful for extended precision, unless you use a partial-word storage format so you can defer carry. (SIMD is helpful for stuff like bitwise AND/OR/XOR where element boundaries are irrelevant.)
Footnote 1: There actually is some scope for using SIMD for BigInteger types, but only with a specialized format. And more importantly, you have to manually choose when to re-normalize (propagate carry) so your calculations have to be designed around it; it's not a drop-in replacement. See Mysticial's answer on Can long integer routines benefit from SSE?
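To make "specialized format" concrete, here is a minimal sketch of one possible partial-word layout, not necessarily the one described in the linked answer. It assumes 32 payload bits per 64-bit element, so the upper half of each element is headroom for carries that haven't been propagated yet. The names u256_lazy, lazy_add and normalize are hypothetical.

```c
#include <stdint.h>

enum { NLIMBS = 8 };  // 8 limbs x 32 payload bits = 256 bits

// Hypothetical redundant format: each 64-bit element normally holds a
// 32-bit limb (least significant first); the upper 32 bits are headroom.
typedef struct { uint64_t limb[NLIMBS]; } u256_lazy;

// Element-wise add with no carry between elements. There is no
// loop-carried dependency, so a compiler is free to vectorize this.
static void lazy_add(u256_lazy *acc, const u256_lazy *x)
{
    for (int i = 0; i < NLIMBS; i++)
        acc->limb[i] += x->limb[i];
}

// Propagate the deferred carries so every limb is below 2^32 again.
// Returns the carry out of bit 255. The caller must call this before
// any element can overflow 64 bits, i.e. well before 2^32 additions
// of normalized values.
static uint64_t normalize(u256_lazy *v)
{
    uint64_t carry = 0;
    for (int i = 0; i < NLIMBS; i++) {
        uint64_t t = v->limb[i] + carry;
        v->limb[i] = t & 0xFFFFFFFFu;  // keep the 32 payload bits
        carry = t >> 32;               // everything above goes to the next limb
    }
    return carry;
}
```

The cost is that the value is only canonical after normalize(), so comparisons and anything that depends on the exact bits have to be scheduled around it. That is the "not a drop-in replacement" part.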
Footnote 2: Unfortunately C does not provide carry-out from addition / subtraction, so it's not even convenient to write in C. sum = a+b / carry = sum<a works for carry out when there's no carry in, but it's much harder to write a full adder in C. And compilers typically make crap asm that doesn't just use native add-with-carry instructions on machines where they're available. Extended-precision libraries for very big integers, like GMP, are typically written in asm.
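For illustration, here is a minimal sketch of that pure-C emulation for a 256-bit add, using the sum < a idiom twice per limb to build a full adder. The u256_limbs type and the addc64 / add256 names are hypothetical, and whether a given compiler turns this into an add/adc chain is not guaranteed.

```c
#include <stdint.h>

// Hypothetical type: 256-bit unsigned integer as four 64-bit limbs,
// least-significant limb first.
typedef struct { uint64_t limb[4]; } u256_limbs;

// Full adder for one limb: returns a + b + carry_in (mod 2^64) and
// stores the carry-out (0 or 1) in *carry_out.
static uint64_t addc64(uint64_t a, uint64_t b, unsigned carry_in,
                       unsigned *carry_out)
{
    uint64_t sum = a + b;           // may wrap
    unsigned c1 = sum < a;          // carry out of a + b
    uint64_t res = sum + carry_in;  // wraps only if sum == UINT64_MAX
    unsigned c2 = res < sum;        // carry out of adding carry_in
    *carry_out = c1 | c2;           // at most one of the two can be set
    return res;
}

// Ripple the carry through all four limbs, low to high.
static u256_limbs add256(u256_limbs a, u256_limbs b, unsigned *carry_out)
{
    u256_limbs r;
    unsigned carry = 0;
    for (int i = 0; i < 4; i++)
        r.limb[i] = addc64(a.limb[i], b.limb[i], carry, &carry);
    *carry_out = carry;
    return r;
}
```

Each limb needs two compares and an OR just to recover a flag the hardware already computed, which is why this tends to compile to worse code than a native add-with-carry chain.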