How to deal with potentially mixed signed and unsigned arithmetic in generic code?

Question

The canonical example for mixed signed and unsigned arithmetic seems to be:

    unsigned int u = 10;
    int          a = -42;
    auto tmp1 = u - a;
    std::cout << tmp1 << std::endl;
    int tmp2 = tmp1;
    std::cout << tmp2 << std::endl;

(Note that I am using auto tmp1 so I can show the intermediate result and inspect the type. In the real code, this is all one line , of course)

if you enable enough warnings (-Wconversion on clang-13 and, additionally, -Wsign-conversion on gcc-11) the compiler complains nicely:

main.cpp:148:21: warning: conversion to ‘unsigned int’ from ‘int’ may change the sign of the result [-Wsign-conversion]
  148 |     auto tmp1 = u - a;
      |                     ^
main.cpp:150:16: warning: conversion to ‘int’ from ‘unsigned int’ may change the sign of the result [-Wsign-conversion]
  150 |     int tmp2 = tmp1;

but at least the "answer is correct":

output

52
52

It's nicely explained, why this still "sort of works", here: https://stackoverflow.com/a/25609711/1087626

Of course the general, and correct, advice is: "Don't mixed signed and unsigned types".

However, recently, in generic code, where the user is choosing the types, I came across this simplified example:

    unsigned int count = 10;
    int          sum = -100;
    auto avg1 = sum  / count;
    std::cout << avg1 << std::endl;
    int avg2 = avg1;
    std::cout << avg2 << std::endl;

The compiler still complains nicely if you have the right warnings enabled:

main.cpp:155:17: warning: conversion to ‘unsigned int’ from ‘int’ may change the sign of the result [-Wsign-conversion]
  155 |     auto avg1 = sum  / count;
      |                 ^~~
main.cpp:157:16: warning: conversion to ‘int’ from ‘unsigned int’ may change the sign of the result [-Wsign-conversion]
  157 |     int avg2 = avg1;

and the answer is catastrophically wrong:

output

429496719
429496719

In the real code, this was running in a loop where avg was feeding back into sum. So sum then quickly overflowed. Being signed integer overflow this was undefined behaviour. My UBSan dutifully kicked off, and butterflies flew out of the computer to make me lunch.

What is a safe way to solve this in generic code, when the user chooses the types?

The obvious auto avg1 = sum / static_cast<int>(count); ?? Works, no more UB, correct answer, and no warnings. But seems like a cludge / sledge hammer, and could be hiding other issues?
Restrict the types chosen by the user (using concepts or similar) such that sum, count and avg always have the same "signedness".

Or something else entirely ...?

EDIT

For clarity, here is what a toy template might look like, and how the user might get caught out.

#include <iostream>

template <typename Value, typename Sum, typename Count>
Value calc_average(Sum sum, Count cnt) {
    return sum / cnt;
}

int main() {
    // All good
    std::cout << calc_average<int, long, unsigned>(-100, 10) << "\n"; 

    // Oh, oh, Butterflies!!!
    std::cout << calc_average<int, int, unsigned>(-100, 10) << "\n";  
    return EXIT_SUCCESS;
}

Output

-10
429496719

For those interested in the longer motivating example, it is all spelled out in this codereview: https://codereview.stackexchange.com/questions/274171/generic-exponential-damping-to-smooth-noisy-signals

If you _don't_ do `static_cast(count)` then `sum` will instead be converted to `unsigned`, so yes, that's really how you need to do it. You could print the result of `static_cast(a)` in your top example to see what that conversion does. — Ted Lyngmo, Feb 18 '22 at 12:35
Main question is how did it come to this that you have to subtract a signed integer from an unsigned one. Perhaps the data types weren't correct to begin with? — rustyx, Feb 18 '22 at 12:36
@rustyx Please re-read the question. This is all in a template where the user chooses the types. So if the user chooses "badly" (they shouldn't need to understand the inner workings), then it will go badly wrong. Unless we "force the types" (with static_cast) or "restrict the types" (with concepts) or something else... Which of these, that is precisely the question. — Oliver Schönrock, Feb 18 '22 at 12:39
No easy answers here, but restricting the typesy seems like a better approach to me, as a general rule of thumb at least. And then all the specific exceptions come ;) EDIT What can also be done is usage of some `common_arith_type` template to make everything compatible. But it might require lots of extra work... — alagner, Feb 18 '22 at 12:39
@TedLyngmo Yes. So I "must" `static_cast`? Is that the best or even only solution? `static_cast` also forces sizes in addition to signedness so it hides a bunch of problems. — Oliver Schönrock, Feb 18 '22 at 12:40
@OliverSchönrock My point was there's no correct way to fix this via casting. Avoid the situation from occurring in the first place. E.g.: `static_assert(std::is_signed::value != std::is_signed::value, "T and U must be both signed or both unsigned types");` — rustyx, Feb 18 '22 at 12:52
@alagner Yes, I also thought about trying to find "common types" somehow (not really sure how to do that, I found docs for `std::common_type` very confusing). And the number of permutations of 3 integer types is huge... — Oliver Schönrock, Feb 18 '22 at 12:53
@OliverSchönrock The _best_ solution is probably to avoid mixing signedness and the second best is to `static_cast` if the implicit conversion would result in the wrong type for the operation at hand. — Ted Lyngmo, Feb 18 '22 at 12:54
@OliverSchönrock the trivial solution with `common_type` won't work for sure, more magic is needed there ;) — alagner, Feb 18 '22 at 12:55
@rustyx OK, I misunderstood you original comment. Yes, that was my feeling also. The "solution" presented to me elsewhere was "use a `static_cast`" but this felt like plastering over the cracks to me. So your vote is for Option 2? — Oliver Schönrock, Feb 18 '22 at 12:56
@rustyx Could you explain in an answer why exactly `static_cast` is flawed here and that therefore the situation should be prevented in the first place (there are a bunch of different ways to do that). — Oliver Schönrock, Feb 18 '22 at 12:59
*So if the user chooses "badly"*. That already might happen in std, `std::accumulate(floats.begin(), floats.end(), 0/*int!*/);`. — Jarod42, Feb 18 '22 at 12:59
Usual reason why we have that mix is that on one hand we need signed calculations and on other hand some things meaningful can't be made signed without casting (like result of sizeof or whatever std::container::size_type). So we have to pay attention at signed/unsigned warnings and to cast ... no way out of the issue. — Öö Tiib, Feb 18 '22 at 13:02
@Jarod42 So you think that using documentation ( "Common mistakes If left to type inference, op operates on values of the same..." https://en.cppreference.com/w/cpp/algorithm/accumulate ) is an acceptable solution? Seems like an unnecessary, preventable footgun? — Oliver Schönrock, Feb 18 '22 at 13:02
Yes option 2 sounds better, or maybe even let the user choose a single type instead of multiple, if they are related. Regarding `static_cast`... Consider in the second example the argument values are 3000000000u and -1. You'd essentially need different code paths depending on the range of each argument. See [here](https://stackoverflow.com/a/57947296/485343) for example. — rustyx, Feb 18 '22 at 13:16
@rustyx Yes, giving the user less choice is something I also considered. The reason i want to give them choice are mainly the sizes. If going down the "coerce the types" paths, then `static_cast>(value);` from your linked answer could be interesting, because that only coerces the bit we want t coerce. — Oliver Schönrock, Feb 18 '22 at 13:20
Since this is generic code, and since the generic code is making assumptions about the suitability of the user supplied types, the `static_assert` is that rustyx suggested is the solution I'd use. Put the responsibility on the user of the code to comply with the generic code's requirements. — Eljay, Feb 18 '22 at 13:32
@Eljay I tend to agree. I added a toy example template and a link to the full motivating usecase in the question above. — Oliver Schönrock, Feb 18 '22 at 13:39
Note: if the signedness doesn't match but the types are such that wider types exist (e.g. `int64_t` )then you can just migrate to that for actual calculations and then convert back potentially. But having the same signedness will make things a lot easier from the get go. — Mgetz, Feb 18 '22 at 14:50

How to deal with potentially mixed signed and unsigned arithmetic in generic code?

0 Answers0