Take the following code:

#include <iostream>

void func() {
    int i = 2147483640;
    while (i < i + 1)
    {
        std::cerr << i << '\n';
        ++i;
    }

    return;
}

int main() {
    func(); 
}

This code is clearly wrong: the while loop can only terminate if the signed int i overflows, which is undefined behavior (UB). The compiler may therefore, for instance, optimize it into an infinite loop (which Clang does at -O3), or do other funky things. My question: from my reading of the C++ standard, types that differ only in signedness may alias (i.e. pointers int* and unsigned* may alias). If I use that to get some funky signed "wrapping", does the following have undefined behavior or not?

#include <iostream>

static int safe_inc(int a)
{
    ++reinterpret_cast<unsigned&>(a);
    return a;
}

void func() {
    int i = 2147483640;
    while (i < safe_inc(i))
    {
        std::cerr << i << '\n';
        ++i;
    }

    return;
}

int main() {
    func(); 
}

I have tried the above code with both Clang 8 and GCC 9, using the arguments -Wall -Wextra -Wpedantic -O3 -fsanitize=address,undefined, and get no errors or warnings; the loop terminates after wrapping to INT_MIN.

cppreference.com tells me that:

Type aliasing

Whenever an attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true:

  • AliasedType is the (possibly cv-qualified) signed or unsigned variant of DynamicType.

which from my reading means that for purposes of type aliasing, signedness is not considered, and the code using reinterpret_cast has well-defined semantics (albeit being somewhat cheesy anyhow).

Jonas Müller
  • Since you reinterpret the *signed* int as an *unsigned* value, the code is as correct as if you used *unsigned* value from the beginning (and casting the `cerr` output to *signed*). – xryl669 May 24 '19 at 15:20
  • @LightnessRacesinOrbit: [basic.lval]/11 [lays down](https://timsong-cpp.github.io/cppwp/basic.lval#11) the validity of this access. What is missing is what the *behavior* is of modifying an unsigned/signed object through a reference to its signed/unsigned version. But I don't see a statement forbidding it. – Nicol Bolas May 24 '19 at 15:34
  • @NicolBolas since the standard does not distinguish between write and read access here, I see no reason why we should treat writes with special suspicion. – SergeyA May 24 '19 at 15:37
  • @SergeyA: Then point to the line in the specification that says what actually happens when you write to a signed object via a reference to an unsigned one. Because I can point to lines in the specification that says what happens when you, for example, call a member function of a derived class through a base class pointer/reference. But no such similar statements exist for signed/unsigned. The conversion is legit; the access is legit, but *what happens* is simply not stated by the spec. – Nicol Bolas May 24 '19 at 15:39
  • @NicolBolas ok, I see your point. Could it be underspecified? – SergeyA May 24 '19 at 15:39
  • Indeed. This kind of thing has always been underspecified for my liking. It's one of the few areas of the standard that seems to assume close-to-the-metal bit logic in places – Lightness Races in Orbit May 24 '19 at 15:40
  • @SergeyA: "*Could it be underspecified?*" Yes, this is a defect in the spec. And with the two's complement change in C++20, it can be resolved in a completely well-defined way. There just has to be wording somewhere to do it. – Nicol Bolas May 24 '19 at 15:41
  • @NicolBolas: When the Standard was written, the authors expected that actions which could and were usefully supported by some implementations, but might not make sense on all implementations, would continue to be supported on implementations where they made sense even if the Standard didn't mandate that. If there is a defect in the Standard, it's not the failure to mandate behavior, but rather its failure to indicate that it was never *intended* to be exhaustive. – supercat May 24 '19 at 16:30
  • @supercat: If it's not exhaustive, then it's not a *standard*. That's precisely why the standard has statements saying that X results in UB. It explicitly lays down what is valid and what is not. There aren't supposed to be gaps. What you believe the authors "expected" is simply incorrect. The standard never worked nor was ever meant to work the way you want it to. – Nicol Bolas May 24 '19 at 16:38
  • @NicolBolas: The published Rationale for the C Coding Standard states that a significant purpose of UB is to allow the marketplace to decide what kinds of implementations should support what "popular extensions". C89 was written to describe a core language which implementations intended for various purposes would supplement as needed to fulfill those purposes; neither it nor any version since has ever been meaningful as a complete "standard". The authors openly admit that an implementation could be "conforming" but useless, and the definition of "conforming program" is even looser. – supercat May 24 '19 at 17:00
  • @supercat: "*a significant purpose of UB*" Which is irrelevant to the question of completeness. In a complete specification, such a case would be specified to be well-defined, implementation-defined, unspecified, or undefined. What we're talking about here is an *incomplete specification*, where it says nothing about what happens in that case. Saying nothing is different from saying that something is undefined. – Nicol Bolas May 24 '19 at 17:20
  • @NicolBolas: Do you believe that the published Rationale for the C Coding Standard does not accurately reflect the intention of the Committee? If so, do you have any contrary evidence of the Committee's intentions? – supercat May 24 '19 at 17:33
  • @supercat: No, I believe you're applying it to a situation that it doesn't apply to. You do not recognize the difference between the specification explicitly declaring something to be UB and the specification simply not mentioning what happens. When, where, and why the spec writers chose to use UB is irrelevant to a discussion about a scenario that *is not undefined behavior*. It's behavior the specification does not describe. – Nicol Bolas May 24 '19 at 17:36
  • @NicolBolas: In the language described by K&R and K&R2, the principle that objects have their values stored as a sequence of consecutive bytes at their address may be applied transitively; this is sufficient to define the behavior of many constructs which need not be then specified individually. If the Standard said that such principles apply transitively *except on implementations that document a good reason for doing something else*, that would leave ambiguous the question of exactly when conforming implementations could specify weird behaviors, but define behavior of those that don't. – supercat May 24 '19 at 18:05
  • Such code is UB as there is no object of that type (`unsigned`); if there was one, its lifetime would not have started; and it would be uninitialized anyway. – curiousguy May 24 '19 at 21:58
  • Also see [What is the strict aliasing rule](https://stackoverflow.com/a/51228315/1708801) which covers this area more completely and generally. – Shafik Yaghmour May 24 '19 at 23:34
  • @supercat C/C++ translators don't treat many constructs as a bag of bits; symbolic interpretation relies on them not being in bijection with their representation. – curiousguy May 25 '19 at 10:47
  • @curiousguy: In the language invented by Dennis Ritchie and documented in K&R's books, objects were bags of bits. The authors of the Standard didn't require that implementations be usable as high-level assemblers, but they have explicitly stated that they did not wish to preclude the language from being used in such fashion by non-portable code, since they recognized that the ability to support such code was one of C's strengths. – supercat May 25 '19 at 12:52
  • re underspecification, [this is at least specified very strongly for `bool`](https://stackoverflow.com/a/56369368/560648) – Lightness Races in Orbit May 30 '19 at 09:59

2 Answers

Aliasing here is perfectly legal. See http://eel.is/c++draft/expr.prop#basic.lval-11.2:

If a program attempts to access the stored value of an object through a glvalue whose type is not similar ([conv.qual]) to one of the following types the behavior is undefined:

(11.1) the dynamic type of the object,

(11.2) a type that is the signed or unsigned type corresponding to the dynamic type of the object

I think it is also worth addressing the actual overflow question, which does not necessarily require reinterpret_cast. The very same effect can be achieved with implicit integral conversions:

 unsigned x = i;
 ++x;
 i = x; // this would serve you just fine.

Before C++20, the result of this code is implementation-defined, since the final assignment converts a value that may not be representable in the destination type.

Since C++20, this code is well-defined: the conversion back to int wraps modulo 2^N.

See https://en.cppreference.com/w/cpp/language/implicit_conversion

On a side note, you might as well start with an unsigned type if you want wrapping overflow semantics.

SergeyA
  • i wonder if one cannot do the same without the `reinterpret_cast` via `unsigned x = a; ++x; return x;`. Would that not have the same effect, but valid already pre c++20? – 463035818_is_not_an_ai May 24 '19 at 15:22
  • @formerlyknownas_463035818 thx for pointing it out, reinterpret_cast is a red herring altogether. But the result would still be implementation defined pre-C++20. – SergeyA May 24 '19 at 15:23
  • This doesn't answer the question about aliasing though does it – Lightness Races in Orbit May 24 '19 at 15:25
  • @NathanOliver why would it be implementation defined? `unsigned x = a;` is fine as long as `a` fits in `unsigned`, incrementing the `unsigned` is fine, and then the `unsigned` again fits in an `int` – 463035818_is_not_an_ai May 24 '19 at 15:26
  • @LightnessRacesinOrbit do you think the question is primarily about aliasing? I feel like it might be two questions? – SergeyA May 24 '19 at 15:26
  • the reason for all of this is to remain in signed space as much as possible (specifically because of how undefined behavior gives the compiler more freedom in optimizing code), and only do a "safe" fallback to unsigned where necessary. – Jonas Müller May 24 '19 at 15:26
  • Actually, casting to a reference here might have some hidden pitfalls, depending on what is cast – Swift - Friday Pie May 24 '19 at 15:26
  • @JonasMüller is your question about incrementing or aliasing? – SergeyA May 24 '19 at 15:27
  • @formerlyknownas_463035818 if `a` is `INT_MAX` then `x` would not fit back into `a` making it implementation defined behavior. – NathanOliver May 24 '19 at 15:28
  • I agree, there's a second implicit question in here (which concerns the overflow part), but the primary question is whether the use of `reinterpret_cast` is valid here. – Jonas Müller May 24 '19 at 15:28
  • OP asks whether his approach using aliasing is well-defined; seems pretty clear-cut – Lightness Races in Orbit May 24 '19 at 15:30
  • @NathanOliver oh right, seems like the details are more hairy than it first appears – 463035818_is_not_an_ai May 24 '19 at 15:30
  • @LightnessRacesinOrbit added aliasing part as well, it is legal. – SergeyA May 24 '19 at 15:34
  • My understanding (the underspecified modification aside) is that this only causes issues on signed-magnitude machines: the bit pattern of `INT_MAX + 1` is negative zero on signed-magnitude, which could be a trap representation. On ones' and two's complement, the bit pattern `INT_MAX + 1` is not a trap representation, and should therefore not cause issues. Or am I overlooking something here? – Jonas Müller May 24 '19 at 15:47
  • C++ is an abstraction. You have to imagine that you're not really programming your own piece of silicon, because you're not. You have to code within the rules of the abstract machine. If you don't, you can and will fall foul of the vast complexities of the translation (and "optimization") process. (As you clearly understand from your comment about the optimisation to infinite loop!) – Lightness Races in Orbit May 24 '19 at 15:48
  • @LightnessRacesinOrbit I know, the question is, aside from the underspecified nature of the modification as raised by @nicol-bolas, using `reinterpret_cast` dodges the overflow question, because you're working directly on the bit pattern, and [C++17](https://timsong-cpp.github.io/cppwp/n4659/basic.fundamental#7) and below specify two's complement, ones' complement and signed magnitude representations. So my understanding would be that unless the resulting written value is a trap representation (and the modification was to be well-defined), the increment is valid. – Jonas Müller May 24 '19 at 16:03
  • Using the wrong type to access an object is not defined anywhere. I can't even imagine how it could be implicitly defined. – curiousguy May 24 '19 at 22:01
  • @LightnessRacesinOrbit "_using aliasing is well-defined; seems pretty clear-cut_" Aliasing (having two different paths to access the same location, with different types) is one thing. Accessing an object with an lvalue of a different type is another. – curiousguy May 24 '19 at 22:02
  • @curiousguy Helps if you don't quote only half of my sentence. I said _"OP asks whether his approach using aliasing is well-defined"_, not just _"using aliasing is well-defined"_. The approach included other factors of course, yes. Aliasing was a big part of it though. – Lightness Races in Orbit May 25 '19 at 12:34
  • @LightnessRacesinOrbit I was pointing out that aliasing of different pointer types does not in general imply that a value representation would be reinterpreted as another type. Aliasing can be OK while reinterpretation could conceivably sometimes fail **if trap representations of signed integers were a thing** (which they are not in practice or in modern C++, but they were allowed at some point AFAIK). – curiousguy May 25 '19 at 12:37
  • @curiousguy And I agree! – Lightness Races in Orbit May 25 '19 at 13:24

Your code is perfectly legal; cppreference is a very good source. You can find the same information in the standard, [basic.lval]/11:

If a program attempts to access the stored value of an object through a glvalue whose type is not similar ([conv.qual]) to one of the following types the behavior is undefined:

  • the dynamic type of the object,

  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,[...]
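
As a small sketch of what "the signed or unsigned type corresponding to" means in practice, the standard library's `std::make_unsigned` and `std::make_signed` expose that correspondence. The helper `read_as_unsigned` is a hypothetical name used only for illustration; the value it returns for negative inputs is what two's complement implementations produce, which C++20 guarantees and which the comments on the question describe as thinly specified in earlier wording.

```cpp
#include <type_traits>

// The corresponding signed/unsigned types, as reported by the standard library.
static_assert(std::is_same<std::make_unsigned<int>::type, unsigned int>::value,
              "unsigned int corresponds to int");
static_assert(std::is_same<std::make_signed<unsigned int>::type, int>::value,
              "int corresponds to unsigned int");
static_assert(sizeof(int) == sizeof(unsigned int),
              "corresponding types occupy the same amount of storage");

// Hypothetical helper: reads an int object through a glvalue of the
// corresponding unsigned type, the access permitted by the quoted bullet.
unsigned read_as_unsigned(const int& value)
{
    return reinterpret_cast<const unsigned&>(value);
}
```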

Oliv
  • Hmm what does "corresponding to" mean here? Certainly this is not the same wording Cubbi chose on cppreference ("variant of") – Lightness Races in Orbit May 24 '19 at 15:33
  • @LightnessRacesinOrbit http://eel.is/c++draft/basic.fundamental#2. Cpp reference prefers plain English, the aim is certainly to be easier to read than the standard. – Oliv May 24 '19 at 15:34
  • For completeness, your code behavior is more specified according to the c++20 standard: http://eel.is/c++draft/basic.types#basic.fundamental-3. Nevertheless this added paragraph is just a recognition of a fact and you can safely assume your code is portable. – Oliv May 24 '19 at 16:05
  • @curiousguy It is here just to show that the cpp reference paragraph cited in the question has an equivalent in the standard. If I had to demonstrate that the piece of code above was standard compliant and did what it was supposed to do, I would have to cite about 10% of the standard! – Oliv May 25 '19 at 06:07
  • @curiousguy This is the paragraph that makes aliasing UB, except for a few exceptions. You could try to find another paragraph that may talk about accessing a "value through a different type..." to convince yourself. – Oliv May 25 '19 at 06:32
  • @curiousguy The equivalent [C standard](https://web.archive.org/web/20181230041359if_/http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf) paragraph: §6.5/7 – Oliv May 25 '19 at 06:38
  • So what does `reinterpret_cast<unsigned&>(x)` refer to? Shouldn't `+reinterpret_cast<unsigned&>(x)` be the value of `x`? Where is the use of a reinterpret_cast-ed ptr/lvalue defined as reinterpreting the object representation? – curiousguy May 25 '19 at 10:37
  • In any C++ std version you want: You assert that every case explicitly allowed by the so called "aliasing rule" is valid (not UB); regarding: "an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union)" Can you provide an example that uses a member (recursively) of an aggregate? – curiousguy May 25 '19 at 10:51
  • @curiousguy Oh, you are referring to an old version of the C++ standard. The rule you cite came from C. In C it is necessary to define what a member access is. In C++ it has been removed because it is not necessary; member access is defined somewhere else in the standard. See core language issue [#2051](http://open-std.org/JTC1/SC22/WG21/docs/cwg_active.html#2051) – Oliv May 25 '19 at 11:49
  • @Oliv So the std used to explicitly allow something that was not possible in practice? Doesn't that impact the (meta) argument that if it's listed as not disallowed, it must be permissible and useful? – curiousguy May 25 '19 at 11:52
  • @curiousguy Interpretation of a value depends on the expression type, [intro.object](http://eel.is/c++draft/intro.object#1.sentence-10) *For other objects, the interpretation of the values found therein is determined by the type of the expressions ([expr.compound]) used to access them.* – Oliv May 25 '19 at 11:55
  • @curiousguy 6 months ago, I was actually asking myself the same question: what is the point of this rule? https://stackoverflow.com/questions/53151521/could-we-access-member-of-a-non-existing-class-type-object. After a few questions and some discussion in the comments, someone explained to me that it was just a C relic and that there was discussion inside the C++ committee about just that. Apparently they decided to remove this rule. – Oliv May 25 '19 at 12:00
  • @Oliv [You can finally say bye-bye](https://github.com/cplusplus/draft/pull/4490#event-4606107988) to the filthy wording. – Language Lawyer Apr 16 '21 at 18:03
  • @LanguageLawyer I am convinced the language can be fixed, because at the time C and C++ were created we had not yet discovered the right abstractions to describe programs. The major source of confusion is the inability to treat reference types as regular types. The second is considering the type to be a property of the memory rather than of the access. Rust fixes this elegantly, along with many other things. You should try out this language. For me C++ belongs to legacy. – Oliv Apr 17 '21 at 09:14