31

Let's say that you are using <cstdint> and types like std::uint8_t and std::uint16_t, and want to do operations like += and *= on them. You'd like arithmetic on these numbers to wrap around modularly, like typical in C/C++. This ordinarily works, and you find experimentally works with std::uint8_t, std::uint32_t and std::uint64_t, but not std::uint16_t.

Specifically, multiplication with std::uint16_t sometimes fails spectacularly, with optimized builds producing all kinds of weird results. The reason? Undefined behavior due to signed integer overflow. The compiler is optimizing based upon the assumption that undefined behavior does not occur, and so starts pruning chunks of code from your program. The specific undefined behavior is the following:

std::uint16_t x = UINT16_C(0xFFFF);
x *= x;

The reason is C++'s promotion rules and the fact that you, like almost everyone else these days, are using a platform on which std::numeric_limits<int>::digits == 31. That is, int is 32-bit (digits counts bits but not the sign bit). x gets promoted to signed int, despite being unsigned, and 0xFFFF * 0xFFFF overflows for 32-bit signed arithmetic.

Demo of the general problem:

// Compile on a recent version of clang and run it:
// clang++ -std=c++11 -O3 -Wall -fsanitize=undefined stdint16.cpp -o stdint16

#include <cinttypes>
#include <cstdint>
#include <cstdio>

int main()
{
     std::uint8_t a =  UINT8_MAX; a *= a; // OK
    std::uint16_t b = UINT16_MAX; b *= b; // undefined!
    std::uint32_t c = UINT32_MAX; c *= c; // OK
    std::uint64_t d = UINT64_MAX; d *= d; // OK

    std::printf("%02" PRIX8 " %04" PRIX16 " %08" PRIX32 " %016" PRIX64 "\n",
        a, b, c, d);

    return 0;
}

You'll get a nice error:

main.cpp:11:55: runtime error: signed integer overflow: 65535 * 65535
    cannot be represented in type 'int'

The way to avoid this, of course, is to cast to at least unsigned int before multiplying. Only the exact case where the number of bits of the unsigned type exactly equals half the number of bits of int is problematic. Any smaller would result in the multiplication being unable to overflow, as with std::uint8_t; any larger would result in the type exactly mapping to one of the promotion ranks, as with std::uint64_t matching unsigned long or unsigned long long depending on platform.

But this really sucks: it requires knowing which type is problematic based upon the size of int on the current platform. Is there some better way by which undefined behavior with unsigned integer multiplication can be avoided without #if mazes?

Myria
  • 3,372
  • 1
  • 24
  • 42
  • 1
    I guess "always cast both to `unsigned long long`" isn't a great idea? – T.C. Jul 17 '14 at 05:44
  • 1
    More than half the bits of `int`, but not actually as large, would also be problematic (never seen an architecture on which such a type exists) – Ben Voigt Jul 17 '14 at 05:44
  • 2
    @TC: That sounds very inefficient. 64-bit multiplies can be slow. – Ben Voigt Jul 17 '14 at 05:45
  • @BenVoigt Not if they're compile-time constants. – Mysticial Jul 17 '14 at 05:46
  • @Mysticial: I'm pretty sure compile-time constants are used in the examples here because (1) they make the example short and self-contained, and (2) they trigger diagnosis of undefined behavior. Nothing in the question suggests to me that variables are excluded. – Ben Voigt Jul 17 '14 at 05:52
  • @BenVoigt This came up in a CPU emulator, so yes, it's not just about compile-time constants. =) – Myria Jul 17 '14 at 06:02
  • how about using bignum arithmetic? http://en.wikipedia.org/wiki/Multiplication_algorithm#Grid_method store the result of a multiplication in two variables if it would overflow? – Alexander Oh Jul 17 '14 at 06:20
  • 2
    A followup question: What rule/promotion makes `x *= x;` promote `uint16_t` to `int32_t`? I find some promotion rules in the Std, but can not map them to this, exactly. – towi Jul 17 '14 at 08:46
  • 1
    @towi: The rule for compound assignment says it performs the same computation as `x = x * x;` The rules for multiplication say that the usual integral promotions are performed. – Ben Voigt Jul 17 '14 at 13:40
  • 2
    @BenVoigt There is a convoluted path through the standard that says the multiplication of two unsigned shorts that result in an unsigned short produces undefined behavior for certain data values because of signed integer overflow. There are more direct statements in the standard that say arithmetic operations on identical unsigned data types producing the same type result cannot possibly overflow and the result is as if modular arithmetic and been used, which is what any reasonable programmer expects, myself included. The standard is in conflict with itself. – amdn Jul 17 '14 at 16:11
  • 3
    @amdn: `uint8_t(a) * uint8_t(b)` does not do arithmetic on unsigned types, so that clause controlling unsigned arithmetic doesn't apply. Unexpected, but true. – Ben Voigt Jul 17 '14 at 16:14
  • 2
    @BenVoigt If the standard says that "unsigned short b = 65535; b = b * b;" raises undefined behavior and therefore it is ok for the compiler to generate code that gets my girlfriend pregnant, then it has violated the principle of least astonishment: https://en.wikipedia.org/wiki/Principle_of_least_astonishment – amdn Jul 17 '14 at 16:20
  • @amdn: It might or might not, depending on the size of integral types on your platform. That's why people have been discussing `uint16_t` not `unsigned short`. I agree that it is unexpected. But consider `uint16_t a = 65535, b = 65534; int diff = b - a;`. Which result is more unexpected: `diff == 65535` or `diff == -1` ? – Ben Voigt Jul 17 '14 at 16:23
  • @BenVoigt I definitely would expect -1 in your example because the type of the result is signed. The standard says the purpose of usual arithmetic conversions is to yield a common type which is also the type of the result. No conversion is necessary if the type of the result is the same as the type of the operands, and the standard is clear that unsigned arithmetic `shall` obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer, and furthermore that it implies unsigned arithmetic does not overflow. – amdn Jul 17 '14 at 16:33
  • 5
    Something is rotten in the state of Denmark when you need complex solutions such as the answers to this question to add syntactic sugar to what should be a straightforward semantic interpretation of the standard. – amdn Jul 17 '14 at 16:56
  • 1
    On conventional hardware, unless a compiler that doesn't go out of its way to violate the POLA, an expression which uses additive, multiplicative, and bitwise operators on unsigned variables of a given type and stores the result to the same type would yield modular results. I suspect the reason the authors of the standard declined to write special rules to ensure modular results in that case because they didn't expect compiler writers to go out of their way to violate the POLA. I can't imagine non-contrived scenarios in which adding a special rule would impede useful optimizations... – supercat Jun 30 '15 at 16:24
  • ...but I suspect that requiring compilers to obey such a rule would for some reason be opposed as representing a bigger "change" than would requiring programmers to add silly `1u*` type coercions all over the place. – supercat Jun 30 '15 at 16:26

3 Answers3

9

Some template metaprogramming with SFINAE, perhaps.

#include <type_traits>

template <typename T, typename std::enable_if<std::is_unsigned<T>::value && (sizeof(T) <= sizeof(unsigned int)) , int>::type = 0>
T safe_multiply(T a, T b) {
    return (unsigned int)a * (unsigned int)b;
}

template <typename T, typename std::enable_if<std::is_unsigned<T>::value && (sizeof(T) > sizeof(unsigned int)) , int>::type = 0>
T safe_multiply(T a, T b) {
    return a * b;
}

Demo.

Edit: simpler:

template <typename T, typename std::enable_if<std::is_unsigned<T>::value, int>::type = 0>
T safe_multiply(T a, T b) {
    typedef typename std::make_unsigned<decltype(+a)>::type typ;
    return (typ)a * (typ)b;
}

Demo.

T.C.
  • 133,968
  • 17
  • 288
  • 421
  • Something I think might work is detecting via `std::numeric_limits` when `::max() > ::max() / ::max()` (with appropriate casts), since `max` is a `constexpr` function. If so, cast each parameter to `typename std::make_unsigned::type` instead of merely `unsigned int`. This should catch every case. (The unary `+` in the `make_unsigned` `decltype` is to determine the promoted type.) Unconditionally casting to the `make_unsigned` of the promoted type also works, and should be equally fast on sane platforms. – Myria Jul 17 '14 at 06:53
  • @Myria Good point about unconditional `make_unsigned` on the promoted type. Still needs to keep the `enable_if` so that this doesn't work if you pass signed types. – T.C. Jul 17 '14 at 07:14
8

Here's a relatively simple solution, which forces a promotion to unsigned int instead of int for unsigned type narrower than an int. I don't think any code is generated by promote, or at least no more code than the standard integer promotion; it will just force multiplication etc. to use unsigned ops instead of signed ones:

#include <type_traits>
// Promote to unsigned if standard arithmetic promotion loses unsignedness
template<typename integer> 
using promoted =
  typename std::conditional<std::numeric_limits<decltype(integer() + 0)>::is_signed,
                            unsigned,
                            integer>::type;

// function for template deduction
template<typename integer>
constexpr promoted<integer> promote(integer x) { return x; }

// Quick test
#include <cstdint>
#include <iostream>
#include <limits>
int main() {
  uint8_t i8 = std::numeric_limits<uint8_t>::max(); 
  uint16_t i16 = std::numeric_limits<uint16_t>::max(); 
  uint32_t i32 = std::numeric_limits<uint32_t>::max(); 
  uint64_t i64 = std::numeric_limits<uint64_t>::max();
  i8 *= promote(i8);
  i16 *= promote(i16);
  i32 *= promote(i32);
  i64 *= promote(i64);

  std::cout << " 8: " << static_cast<int>(i8) << std::endl
            << "16: " << i16 << std::endl
            << "32: " << i32 << std::endl
            << "64: " << i64 << std::endl;
  return 0;
}
rici
  • 234,347
  • 28
  • 237
  • 341
  • Cool, I like this. In my opinion, instead of "unsigned" as line 6, it should say `typename std::make_unsigned::type`. Also, to force promotion, you can also just use unary `+`; in other words, `+integer()`. I understood the problem fairly well when I wrote the question, I just didn't understand some of these cool template tricks, which I thank you and @T.C. for. =) – Myria Jul 17 '14 at 07:17
  • @Myria `std::make_unsigned::type` won't work here. If you really want to use it, it needs to be `std::make_unsigned::type`. Also, this will compile for signed types, which is probably not a great idea, so you'd need an `enable_if` somewhere: `template using promoted = typename std::enable_if::value, typename std::conditional*...*/>::type>::type;` – T.C. Jul 17 '14 at 07:22
  • @t.c. Just change the boolean expression by adding &&!is_signed: – rici Jul 17 '14 at 07:39
  • @rici If you do that it still compiles (though doesn't cast anymore). I think calling something like `promote` when you don't intend to force unsigned arithmetic should be considered a bug, so it's better to diagnose it at compile time. – T.C. Jul 17 '14 at 07:44
8

This article regarding a C solution to the case of uint32_t * uint32_t multiplication on a system in which int is 64 bits has a really simple solution that I hadn't thought of: 32 bit unsigned multiply on 64 bit causing undefined behavior?

That solution, translated to my problem, is simple:

// C++
static_cast<std::uint16_t>(1U * x * x)
// C
(uint16_t) (1U * x * x)

Simply involving 1U in the left side of the chain of arithmetic operations like that will promote the first parameter to the larger rank of unsigned int and std::uint16_t, then so on down the chain. The promotion will ensure that the answer is both unsigned and that the requested bits remain present. The final cast then reduces it back to the desired type.

This is really simple and elegant, and I wish I had thought of it a year ago. Thank you to everyone who responded before.

Myria
  • 3,372
  • 1
  • 24
  • 42