I'm after the fastest 256-bit integer library (which isn't a nightmare to integrate). As part of this I'm trying to get a rough idea of the performance comparison between Clang's _BitInt(256) and Boost.Multiprecision's int256_t.

I've currently got this for Clang's _BitInt(256):
#include <cstdint>
#include <iostream>
#include <x86intrin.h> // for __rdtsc()

using int256_t = signed _BitInt(256);

int main()
{
    for (int i = 0; i < 200; ++i)
    {
        // Using __rdtsc() for something non-deterministic
        const int256_t a = __rdtsc() * __rdtsc() * __rdtsc() * __rdtsc() * __rdtsc() * __rdtsc();
        const int256_t b = __rdtsc() * __rdtsc() * __rdtsc();

        const uint64_t start = __rdtsc();
        const int256_t c = a / b;
        const uint64_t finish = __rdtsc();

        // ostream doesn't support _BitInt(256), so truncate to 64 bits just to keep c "used"
        std::cout << finish - start << " " << static_cast<int64_t>(c) << std::endl;
    }
}
https://godbolt.org/z/9M9TG16ax
but it looks like the divide is getting completely optimized out? I've tried to introduce some randomness into the 256-bit division using __rdtsc(). I usually print the calculated value to prevent dead-code elimination, but ostream isn't supported for _BitInt(256), so I had to do a hacky static_cast.
Could anyone suggest how I could profile this?
Or is there a faster, header-only 256-bit integer library?
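For reference, the Boost.Multiprecision version I'd be comparing against would look roughly like this (just a sketch mirroring the _BitInt(256) code above; int256_t here is boost::multiprecision::int256_t from <boost/multiprecision/cpp_int.hpp>):

#include <cstdint>
#include <iostream>
#include <x86intrin.h>                      // for __rdtsc()
#include <boost/multiprecision/cpp_int.hpp> // header-only cpp_int backend

using boost::multiprecision::int256_t;

int main()
{
    for (int i = 0; i < 200; ++i)
    {
        // Same __rdtsc() trick for non-deterministic inputs
        const int256_t a = __rdtsc() * __rdtsc() * __rdtsc() * __rdtsc() * __rdtsc() * __rdtsc();
        const int256_t b = __rdtsc() * __rdtsc() * __rdtsc();

        const uint64_t start = __rdtsc();
        const int256_t c = a / b;
        const uint64_t finish = __rdtsc();

        // cpp_int supports operator<<, so c can be streamed directly here
        std::cout << finish - start << " " << c << std::endl;
    }
}

At least on the Boost side the result can be printed as-is, so the static_cast hack isn't needed there.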