
It is well known that the built-in C++ types uint32_t, int32_t, uint64_t, int64_t, and even the GCC/Clang built-in types __int128 and unsigned __int128, all have sizeof exactly equal to their bit-width divided by 8.

But if you print sizeof for Boost's boost::multiprecision::uint256_t or uint512_t, you get 48 bytes instead of 32 for uint256_t, and 80 bytes instead of 64 for uint512_t. Both types are exactly 16 bytes larger than expected. See demo here.

But sizeof of boost::multiprecision::uint128_t gives exactly 16 bytes, as expected.

It turns out that the base class of all Boost integers, cpp_int_base, has several fields:

data_type m_data;
unsigned  m_limbs;
bool      m_sign, m_internal, m_alias;

Only the m_data field contains the bits of the integer; the other fields account for the extra 16 unnecessary bytes of sizeof.

My question: is it possible to tweak a Boost multiprecision integer somehow so that it contains only the data bits and nothing else?

In other words, it should keep the sign (for a signed integer) the same way an Intel CPU keeps it inside int64_t: the highest bit is the sign bit and the remaining bits are encoded in two's-complement form. That way the Boost integer would be encoded identically to the native uint64_t and int64_t.

If you look into signature of template cpp_int_base then you see:

template <unsigned MinBits, unsigned MaxBits, cpp_integer_type SignType,
    cpp_int_check_type Checked, class Allocator, bool trivial = false>
struct cpp_int_base;

Apparently trivial almost does what is necessary: the specialization of cpp_int_base with trivial = true contains just two fields:

local_limb_type m_data;
bool            m_sign;

So it is just 1 byte bigger than the minimal possible sizeof.

Moreover, the 128-bit integer has trivial = true by default, while bigger integers have trivial = false.

But there is no way to control the trivial template parameter, because if you look at the uint256_t definition you see:

using uint256_t = number<cpp_int_backend<256, 256,
    unsigned_magnitude, unchecked, void>>;

and cpp_int_backend has no trivial parameter among its template parameters; only cpp_int_base has trivial. But cpp_int_base is not accessible to the user, it is an internal detail of the library.

Also I don't know why the 128-bit integer has a sizeof of exactly 16 bytes, because as I showed above even the trivial specialization has an extra bool m_sign; field, which should add at least 1 byte (i.e. a sizeof of 17). But somehow the 128-bit integer is not 17 bytes as expected, but 16 bytes in size.

Why do I need the Boost integer to have exactly the minimal number of bits? Because in my program I have millions of integers in an array. Besides the usual math, I perform my own special math operations over these integers. My operations work on the usual Intel representation of integers, the same one used by int64_t and uint64_t. But sometimes I also need regular operations like + - * / % ^ | ~, and rather than implement them myself I decided to use the Boost multiprecision library.

If Boost had exactly the same representation as Intel, I could just do reinterpret_cast<boost::multiprecision::uint256_t &>(array[i]) *= 12345; without any intermediate conversion or memcpy. But since Boost uses a different format, I have to write custom conversions back and forth.

A ^ operation, for example, takes 1-4 CPU cycles for a 256-bit integer. Converting to and from the Boost format would add 5-10 more cycles, which is a really big overhead.

Thus this trivial Intel format for Boost integers is needed as an optimization, to avoid a conversion on every single operation.

One more, less important, reason: somewhere in my templated code I need to figure out the bit-width of a number of templated type T. If it is always in the trivial Intel format, then sizeof(T) * 8 gives the exact bit width. With the Boost format I instead need to specialize some helper template like BitWidthOf<T>::value.

Arty
    *same as Intel CPU keeps it inside int64_t,* - That's called 2's complement, and *all* modern ISAs use it, not just x86-64. The fixed-width no-padding types like `int64_t` are required to be 2's complement if they exist at all, but primitive types like `long long` and `int_fast64_t` can be 1's complement or sign/magnitude in C++17 and earlier. C++20 standardized 2's complement for all primitive signed integral types. [Why not enforce 2's complement in C++?](https://stackoverflow.com/q/5654159) (It does, now.) – Peter Cordes Mar 04 '23 at 19:03
    *I would just do `reinterpret_cast` ...* Looks like strict-aliasing UB, unless of course you want to compile with `gcc -fno-strict-aliasing`. Different types can be assumed not to alias, except for some special cases like structs with a common initial sequence or something, IIRC. But a struct / class and an array are definitely distinct. If you had a 256-bit type without any padding, constructing one from an array should just optimize away, letting it compile to `vpxor ymm0, ymm0, [rdi]` or something like you're talking about for 256-bit XOR on x86-64 with AVX2. – Peter Cordes Mar 04 '23 at 19:08
  • As you mentioned in your answer, `bit_cast` would work for any POD types with compatible object-representation. – Peter Cordes Mar 04 '23 at 21:55

1 Answer


From the docs:

  • When used at fixed precision, the size of this type is always one machine word (plus any compiler-applied alignment padding) larger than you would expect for an N-bit integer: the extra word stores both the sign, and how many machine words in the integer are actually in use. The latter is an optimisation for larger fixed precision integers, so that a 1024-bit integer has almost the same performance characteristics as a 128-bit integer, rather than being 4 times slower for addition and 16 times slower for multiplication (assuming the values involved would always fit in 128 bits). Typically this means you can use an integer type wide enough for the "worst case scenario" with only minor performance degradation even if most of the time the arithmetic could in fact be done with a narrower type. Also note that unsigned fixed precision types small enough to fit inside the largest native integer become a simple wrapper around that type, this includes the "checked" variants. Small signed types will always have an extra sign word and so be larger than their native equivalent.

So it's an "optimization", and I don't see a way to turn it off. I would look for alternative libraries to do what you want (allowing bit_cast from some other format).

What "trivial" does is use a built-in type if one exists; otherwise it switches to the raw data back-end (which carries this extra information about the number of used words, taking an extra word to store).

Also, since a native (unsigned) __int128 is available, boost::multiprecision::uint128_t can be a simple wrapper around unsigned __int128 (16 bytes), but boost::multiprecision::int128_t must hold an unsigned __int128 m_data and a bool m_sign for its sign-magnitude representation.


For your last question: you get the bit-width of a type with std::numeric_limits<T>::digits, which Boost does specialize:

template<typename T>
inline constexpr int bit_width = []{
    using lim = std::numeric_limits<T>;
    static_assert(lim::is_specialized && lim::is_integer && lim::is_bounded && lim::radix == 2);
    return lim::digits;
}();

static_assert(bit_width<std::uint16_t> == 16);
static_assert(bit_width<std::uint32_t> == 32);
static_assert(bit_width<boost::multiprecision::uint128_t> == 128);
static_assert(bit_width<boost::multiprecision::uint512_t> == 512);
static_assert(bit_width<boost::multiprecision::int512_t> == 512);
Artyer
  • `cpp_int_backend` is arbitrary-precision and as such the "optimization" makes a lot of sense. As is its allocator argument. It's only when one expects a naive implementation. Which arguably makes sense up to 128bit on some systems. – sehe Mar 04 '23 at 21:58
  • @sehe It might make sense for larger types, and is outright needed for variable-width and arbitrary precision types, but it's unlikely to make sense for uint256/uint512. Here is the assembly output for a native uint512 vs boost's uint512: https://godbolt.org/z/K3G4a61e9 and there's many more jumps in the boost case, which it has to do because the "size" isn't known statically – Artyer Mar 04 '23 at 22:34
  • Nice demo! I had no idea about `_BitInt` either, so I learned some things. Perhaps code folding might end up being beneficial here, but in isolation that doesn't look great. – sehe Mar 04 '23 at 23:37