5

My understanding is that:

  • Signed integer overflow in C++ is undefined behavior
  • Constant expressions are not allowed to contain undefined behavior.

It seems to follow that something like the following should not compile, and indeed on my compiler it doesn't.

template<int n> struct S { };

template<int a, int b>
S<a * b> f()
{
  return S<a * b>();
}

int main(int, char **)
{
  f<50000, 49999>();
  return 0;
}

However, now I try the following instead:

#include <numeric>

template<int n> struct S { };

template<int a, int b>
S<std::lcm(a, b)> g()
{
  return S<std::lcm(a,b)>();
}

int main(int, char **)
{
  g<50000, 49999>();
  return 0;
}

Each of g++, clang, and MSVC will happily compile this, despite the fact that

The behavior is undefined if |m|, |n|, or the least common multiple of |m| and |n| is not representable as a value of type std::common_type_t<M, N>.

(Source: https://en.cppreference.com/w/cpp/numeric/lcm)

Is this a bug in all three compilers? Or is cppreference wrong about lcm's behavior being undefined if it can't represent the result?

Daniel McLaury
  • 4,047
  • 1
  • 15
  • 37
  • What C++ Standard are you compiling with/by? – Adrian Mole Jun 03 '22 at 20:02
  • 1
    I'd love it if UB meant the program didn't compile. Crom would hear my praises just about every day if he granted such a mercy. – user4581301 Jun 03 '22 at 20:04
  • 1
    UB in standard library functions is not required to be diagnosed. – HolyBlackCat Jun 03 '22 at 20:05
  • Maybe related: [Why is Signed Overflow due to computation still Undefined Behavior in C++20](https://stackoverflow.com/q/70801443/10871073) – Adrian Mole Jun 03 '22 at 20:05
  • @user4581301: The question is specifically about UB in constexpr, which is a special situation where it's not supposed to compile. (However see the comment by HolyBlackCat and the answer by BrianBi.) – Daniel McLaury Jun 03 '22 at 20:18
  • @AdrianMole: Not related. The question isn't about why signed overflow is UB, it's about why the signed overflow inside the call to lcm() *isn't* being treated like the UB that it is. (See the comment by HolyBlackCat and the answer by Brian Bi.) – Daniel McLaury Jun 03 '22 at 20:19
  • 1
    fyi clang with libc++ fails to compile this - live - https://godbolt.org/z/xsvEvfroo – Richard Critten Jun 03 '22 at 21:25
  • `std::lcm()` isn't a constexpr function. If you create a `costinit` variable you get nice error messages from the compiler: https://godbolt.org/z/cKxcGGKWE – Goswin von Brederlow Jun 03 '22 at 22:33
  • @RichardCritten: Ah, nice. I'm not used to using clang so I just assumed it used libc++ by default. When I tested with clang I guess it was using my system libstdc++. – Daniel McLaury Jun 03 '22 at 23:26
  • @GoswinvonBrederlow [cppreference disagrees](https://en.cppreference.com/w/cpp/numeric/lcm). And indeed if I change it to avoid the UB as per Brian's answer (and make `g` `constexpr`) [it does compile](https://godbolt.org/z/czxh1G4KK) – Nathan Pierson Jun 04 '22 at 00:20
  • @NathanPierson You are right, I overlooked the constexpr. The problem is actually the `__absu` function making everything unsigned. – Goswin von Brederlow Jun 04 '22 at 00:30

2 Answers2

11

According to [expr.const]/5, "an operation that would have undefined behavior as specified in [intro] through [cpp]" is not permitted during constant evaluation, but:

If E satisfies the constraints of a core constant expression, but evaluation of E would evaluate an operation that has undefined behavior as specified in [library] through [thread], or an invocation of the va_­start macro ([cstdarg.syn]), it is unspecified whether E is a core constant expression.

We usually summarize this as "language UB must be diagnosed in a context that requires a constant expression, but library UB does not necessarily need to be diagnosed".

The reason for this rule is that an operation that causes library UB may or may not cause language UB, and it would be difficult for compilers to consistently diagnose library UB even in cases when it doesn't cause language UB. (In fact, even some forms of language UB are not consistently diagnosed by current implementations.)

Some people also refer to language UB as "hard" UB and library UB as "soft" UB, but I don't like this terminology because (in my opinion) it encourages users to think of "code for which it's unspecified whether language UB occurs" as somehow less bad than "code that unambiguously has language UB". But in both cases, the result is that the programmer cannot write a program that executes such code and expect anything to work properly.

Brian Bi
  • 111,498
  • 10
  • 176
  • 312
1

The problem is that std::lcm() is doing computations using unsigned of whatever type the arguments are. It uses using _Up = make_unsigned_t<common_type_t<_Mn, _Nn>>; in my STL and converts all arguments to _Up first. 50000 * 49999 = 2499950000 < 4294967296 = 2^32 does not cause an overflow and unsigned overflow would not be UB in any case.

But if you have template code for gcd and lcm like this without changing types: https://godbolt.org/z/zoxzsr45x

// GCD implementation
template<typename T, T m, T n>
constexpr T
gcd()
{
    if constexpr (m == 0) {
        return n;
    } else if constexpr (n == 0) {
        return m;
    } else {
        return gcd<T, n, T(m % n)>();
    }
}

// LCM implementation
template<typename T, T m, T n>
constexpr T
lcm()
{
    if constexpr (m != 0 && n != 0) {
        return (m / gcd<T, m, n>()) * n;
    } else {
        return 0;
    }
}

constinit auto t = lcm<int, 50000, 49999>();

int main(int, char **)
{
  return 0;
}

Then the compiler fails with:

<source>: In instantiation of 'constexpr T lcm() [with T = int; T m = 50000; T n = 49999]':
<source>:27:42:   required from here
<source>:21:37: warning: integer overflow in expression of type 'int' results in '-1795017296' [-Woverflow]
   21 |         return (m / gcd<T, m, n>()) * n;
      |                ~~~~~~~~~~~~~~~~~~~~~^~~
<source>:27:16: error: 'constinit' variable 't' does not have a constant initializer
   27 | constinit auto t = lcm<int, 50000, 49999>();
      |                ^
<source>:27:42:   in 'constexpr' expansion of 'lcm<int, 50000, 49999>()'
<source>:27:43: error: overflow in constant expression [-fpermissive]
   27 | constinit auto t = lcm<int, 50000, 49999>();
      |                                           ^

In gcc-10 under Debian std::lcm is defined as:

  // std::abs is not constexpr, doesn't support unsigned integers,
  // and std::abs(std::numeric_limits<T>::min()) is undefined.
  template<typename _Up, typename _Tp>
    constexpr _Up
    __absu(_Tp __val)
    {
      static_assert(is_unsigned<_Up>::value, "result type must be unsigned");
      static_assert(sizeof(_Up) >= sizeof(_Tp),
          "result type must be at least as wide as the input type");
      return __val < 0 ? -(_Up)__val : (_Up)__val;
    }
  /// Least common multiple
  template<typename _Mn, typename _Nn>
    constexpr common_type_t<_Mn, _Nn>
    lcm(_Mn __m, _Nn __n) noexcept
    {
      static_assert(is_integral_v<_Mn>, "std::lcm arguments must be integers");
      static_assert(is_integral_v<_Nn>, "std::lcm arguments must be integers");
      static_assert(_Mn(2) == 2, "std::lcm arguments must not be bool");
      static_assert(_Nn(2) == 2, "std::lcm arguments must not be bool");
      using _Up = make_unsigned_t<common_type_t<_Mn, _Nn>>;
      return __detail::__lcm(__detail::__absu<_Up>(__m),
                             __detail::__absu<_Up>(__n));
    }

The cast to _Up and return type of __absu causes the UB to go away.

Goswin von Brederlow
  • 11,875
  • 2
  • 24
  • 42
  • "The problem is that std::lcm() is not a constexpr function" It is according to cppreference and according to every compiler's standard library that I tested. If it wasn't then the code in my answer *definitely* wouldn't compile. – Daniel McLaury Jun 03 '22 at 23:21
  • @DanielMcLaury You are right, I lost the constexpr in all that template clutter around the definition of lcm. The problem is the conversion with `make_unsigned_t` removing any UB in the arithmetic. – Goswin von Brederlow Jun 04 '22 at 00:22
  • "std::lcm is doing computations using unsigned" really? source? – Nathan Pierson Jun 04 '22 at 00:23
  • @NathanPierson /usr/include/c++/10/numeric from gcc-10 in Debian – Goswin von Brederlow Jun 04 '22 at 00:25