2

A Wikipedia article on type punning gives an example of pointing an int pointer at a float to extract the sign bit:

However, supposing that floating-point comparisons are expensive, and also supposing that float is represented according to the IEEE floating-point standard, and integers are 32 bits wide, we could engage in type punning to extract the sign bit of the floating-point number using only integer operations:

bool is_negative(float x) {
    unsigned int *ui = (unsigned int *)&x;
    return *ui & 0x80000000;
}

Is it true that pointing a pointer to a type not its own is undefined behavior? The article makes it seem as if this operation is a legitimate and common thing. What are the things that can possibly go wrong in this particular piece of code? I'm interested in both C and C++, if it makes any difference. Both have the strict aliasing rule, right?

John Kugelman
Zebrafish
  • 4
    Yes, indeed type punning like that is undefined in C++. And indeed, one of the possible observable effects of undefined behavior is the program executing as expected. See https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule Also note that in C++20 `bit_cast` can be used: https://en.cppreference.com/w/cpp/numeric/bit_cast – SergeyA Oct 27 '20 at 20:55
  • 2
    Regardless of whether this quote is right or not, you should be aware that there's a *lot* of incorrect info about C++ out there. cppreference and the standard are usually the only sources you should trust. – cigien Oct 27 '20 at 20:56
  • 3
    Wikipedia is not a reliable source. – Govind Parmar Oct 27 '20 at 20:57
  • FWIW, yes it is UB in C++, but I have not seen a compiler that doesn't do what you'd think it would do, so there is that. Depends on how many warnings you want to ignore/suppress. The legal way to do type punning in C++ is to use `memcpy` – NathanOliver Oct 27 '20 at 20:58
  • Also, FWIW, in this example using `memcpy` produces **exactly** the same result as type punning. – SergeyA Oct 27 '20 at 21:03
  • @NathanOliver Why memcpy? Would an assignment to a temporary variable do? Also how about using unions? I read that it's a common way. – Zebrafish Oct 27 '20 at 21:03
  • 1
    Type punning via pointer conversion is undefined in C as well, using a `union` would be ok, though. – Jens Gustedt Oct 27 '20 at 21:03
  • 1
    @JensGustedt You **can** type-pun to `[[un]signed] char`, however. Which is probably what the example code **should** be doing. – Andrew Henle Oct 27 '20 at 21:07
  • @Zebrafish assignment wouldn't preserve the bit pattern. `memcpy` will. – NathanOliver Oct 27 '20 at 21:10
  • @AndrewHenle, yes, but then you get into trouble with endianness, I guess. No, the *right* thing to do is to use `signbit`. – Jens Gustedt Oct 27 '20 at 21:10
  • c++ has added `std::bit_cast` to do these sorts of things without UB. – doug Oct 27 '20 at 21:24
  • Off Topic: Clever "tricks" like this are annoying. Sure, Wikipedia goes out of its way to make up an architecture which supports floats but where floats are slow (which is a corner case; most architectures either don't support floats _or_ they do and they're fast). On a mainstream x64 machine (which has good float support) the simple, naive way is piles more readable _and_ faster. https://godbolt.org/z/cjerM9 – Mike Vine Oct 27 '20 at 21:34
  • That quote from Wikipedia is not valid code (as other commenters have mentioned). A simple way to find the sign of a float value is: `if (value >= 0.0f)` then value is positive, else value is negative. – user3629249 Oct 29 '20 at 19:45
  • @user3629249: Such constructs could not be used in a Strictly Conforming C Program, but the phrase "non-portable or erroneous" does not exclude "non-portable but entirely correct", and nothing in the Standard would forbid the use of such constructs in a non-portable but Conforming C Program. The question of which non-portable constructs to support is a Quality of Implementation issue outside the Standard's jurisdiction. – supercat Nov 01 '20 at 21:52
  • @MikeVine: Until the mid 1990s, most architectures couldn't support floating-point operations anywhere near as quickly as integer operations, and that remains true of many microcontroller C implementations today. Try your example with ARM gcc 9.2.1 using flag `-mcpu=cortex-m3` to target a popular microcontroller core. I wouldn't say such performance differences only exist in a "made up" architecture. – supercat Nov 01 '20 at 21:58
  • @doug: Given e.g. `uint32_t x; uint64_t *p`, and targeting a 32-bit platform, could one use `std::bit_cast` on a `uint64_t` to efficiently add `x` to the upper word, as an expeditious way of adding `((uint64_t)x) << 32` to the value, or would one have to read the whole value, modify it, and write it back, likely resulting in code that's even less efficient than what gcc would produce when adding `x<<32` to a `uint64_t`? – supercat Nov 01 '20 at 22:16
  • @supercat I don't think `bit_cast` helps for your example. mostly it's good for things like parsing out bits in floating point values to ints of the same size. Especially convenient for doing things like working with small (16 bit) limited range "floating" user types where one doesn't need the precision. Especially good now that it is constexpr since one can't use other hacks to do it in a constexpr function. – doug Nov 02 '20 at 04:56
  • @doug: It's too bad that compiler writers would rather insist that there's no useful purpose for type-punned access rather than specifying that if a `T&` is bit-cast to a `U&`, *and all access to the storage within the lifetime of the reference is made through it*, such an access will be treated as though made through the `T&`, since treating a bit-cast of a reference in such a fashion would yield clear semantics that should be easy for any compilers to support. Even if compilers needed to treat the beginning and end of the reference's lifetime as potential memory clobbers... – supercat Nov 02 '20 at 15:39
  • ...on objects of type `T`, conversions of reference types wouldn't generally be used except in cases where the resulting reference would be used to access storage that outside code knew of as type `T`, and the performance costs would be lower than those of using workarounds like `memcpy`. – supercat Nov 02 '20 at 15:41

5 Answers

4

Is it true that pointing a pointer to a type not its own is undefined behavior?

No, both C and C++ allow an object pointer to be converted to a different pointer type, with some caveats.

But with a few narrow exceptions, accessing the pointed-to object via the differently-typed pointer does have undefined behavior. Such undefined behavior arises from evaluating the expression *ui in the example function.
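As one illustration of those narrow exceptions, both languages permit examining the bytes of any object through a pointer to unsigned char. The sketch below is only that, a sketch: the name float_bits_via_bytes is hypothetical, and it assumes a 32-bit float whose byte order matches that of uint32_t. It copies the bytes one at a time instead of dereferencing an unsigned int pointer:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: float and uint32_t have the same size on this platform. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Hypothetical helper: returns the object representation of x as an integer.
   Access through unsigned char pointers is permitted for any object type. */
uint32_t float_bits_via_bytes(float x) {
    const unsigned char *src = (const unsigned char *)&x;
    uint32_t bits = 0;
    unsigned char *dst = (unsigned char *)&bits;
    for (size_t i = 0; i < sizeof x; ++i) {
        dst[i] = src[i];  /* byte-wise copy, no float object is read as an int */
    }
    return bits;
}
```

With an IEEE 754 float, masking the result with 0x80000000u would then isolate the sign bit.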

The article makes it seem as if this operation is a legitimate and common thing. What are the things that can possibly go wrong in this particular piece of code?

The behavior is undefined, so anything and everything within the power of the program to do is possible. In practice, the observed behavior might be exactly what the author(s) of the Wikipedia article expected, and if not, then the most likely misbehaviors are variations on the function computing incorrect results.

I'm interested in both C and C++, if it makes any difference. Both have the strict aliasing rule, right?

To the best of my knowledge, the example code has undefined behavior in both C and C++, for substantially the same reason.

John Bollinger
  • 1
    yes, but one could add that the C and C++ standards have the `signbit` macro that is supposed to answer that exact question, and is expected to do that in any way that is best for the platform in question. – Jens Gustedt Oct 27 '20 at 21:09
  • That's certainly relevant to the question of determining the sign of a floating-point number, @JensGustedt. I suppose your point is that an implementation might provide a `signbit` macro that relies on type punning, and *in that implementation* it should be expected to work reliably. But even in that case, one cannot safely generalize to draw conclusions about other uses of type punning with such an implementation. It's best to treat implementations of standard-library macros as opaque. That they are macros at all is relevant primarily in that they do not have addresses. – John Bollinger Oct 27 '20 at 21:32
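For the specific task of testing the sign, the signbit macro mentioned in the comments above sidesteps punning entirely. A minimal sketch in C, where the wrapper name is_negative_signbit is hypothetical:

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical wrapper around the standard signbit macro from <math.h>.
   signbit yields a nonzero value when the sign of its argument is negative. */
bool is_negative_signbit(float x) {
    return signbit(x) != 0;
}
```

Unlike a comparison such as x < 0, this reports true for negative zero as well, which is also what the bit-mask version would report on an IEEE 754 platform.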
3

The fact that it is technically undefined behaviour to call this is_negative function implies that compilers are legally allowed to "exploit" this fact, e.g., in the below code:

if (condition) {
    is_negative(bar);
} else {
    // do something
}

the compiler may "optimize out" the branch, by evaluating condition and then unconditionally proceeding to the else substatement even if the condition is true.

However, because this would break enormous amounts of existing code, "real" compilers are practically forced to treat is_negative as if it were legitimate. In legal C++, the author's intent is expressed as follows:

unsigned int ui;
memcpy(&ui, &x, sizeof(x));
return ui & 0x80000000;

So the reinterpret_cast approach to type punning, while undefined according to the standard in this case, is thought of by many people as "de facto implementation-defined" and equivalent to the memcpy approach.
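For reference, the memcpy fragment above could be wrapped into a complete function along the following lines. This is a sketch in C under the question's own assumptions (IEEE 754 float, 32-bit unsigned int); the name is_negative_memcpy is hypothetical:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical name; assumes float and unsigned int are both 32 bits wide
   and that float uses the IEEE 754 binary32 layout. */
bool is_negative_memcpy(float x) {
    unsigned int ui;
    _Static_assert(sizeof ui == sizeof x, "float and unsigned int must match in size");
    memcpy(&ui, &x, sizeof x);       /* copy the object representation */
    return (ui & 0x80000000u) != 0;  /* test the top bit */
}
```

Optimizing compilers commonly turn a fixed-size memcpy like this into a plain register move, so the well-defined version need not cost anything extra.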

Brian Bi
  • The fact that the Standard does not require that compilers process a construct in meaningful fashion does not imply any judgment as to whether an implementation can be suitable for any particular purpose without doing so. Unfortunately, compiler writers are not always clear about the range of purposes for which their products are intended to be suitable. – supercat Nov 01 '20 at 21:46
1

Why

If this is undefined behavior then why is it given as a seemingly legitimate example?

This was a common practice before C was standardized and added the rules about aliasing, and it has unfortunately persisted in practice. Nonetheless, Wikipedia pages should not be offering it as an example.

Aliasing Via Pointer Conversions

Is it true that pointing a pointer to a type not its own is undefined behavior?

The rules are more complicated than that, but, yes, many uses of an object through an lvalue of a different type are not defined by the C or C++ standards, including this one. There are also rules about pointer conversions that may be violated.

The fact that many compilers support this behavior even though the C and C++ standards do not require them to is not a reason to do so, as there is a simple alternative defined by the standards (use memcpy, below).

Using Unions

In C, an object may be reinterpreted as another type using a union. C++ does not define this:

union { float f; unsigned int ui; } u = { .f = x };
unsigned int ui = u.ui;

or the new value may be obtained more tersely using a compound literal:

(union { float f; unsigned int ui; }) {x} .ui

Naturally, float and unsigned int should have the same size when using this.
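Put together as a whole function, the union approach might look like the sketch below. It is C only, it assumes an IEEE 754 float of the same size as unsigned int, and the name is_negative_union is hypothetical:

```c
#include <stdbool.h>

/* Hypothetical name; valid as C, not as C++ (see the note above about unions). */
bool is_negative_union(float x) {
    _Static_assert(sizeof(float) == sizeof(unsigned int),
                   "float and unsigned int must match in size");
    union { float f; unsigned int ui; } u = { .f = x };  /* store as float */
    return (u.ui & 0x80000000u) != 0;                    /* read back as unsigned int */
}
```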

Copying Bytes

Both C and C++ support reinterpreting an object by copying the bytes that represent it:

unsigned int ui;
memcpy(&ui, &x, sizeof ui);

Naturally, float and unsigned int should have the same size when using this. The above is C code; C++ requires std::memcpy or a suitable using declaration.
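The same technique works in both directions. As a sketch, assuming an IEEE 754 float that is 32 bits wide, the hypothetical helper clear_sign_bit below copies the bytes out, clears the top bit, and copies them back:

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Hypothetical helper: returns x with its sign bit cleared. */
float clear_sign_bit(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* float bytes -> integer */
    bits &= 0x7FFFFFFFu;             /* clear bit 31, the IEEE 754 sign bit */
    memcpy(&x, &bits, sizeof x);     /* integer bytes -> float */
    return x;
}
```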

Eric Postpischil
  • I've been using C++ memcpy, not in the std:: namespace. What's the difference? Is it wrong? – Zebrafish Oct 27 '20 at 21:17
  • @Zebrafish: Compilers may be lax and permit it. Actually, the issue may be in the headers rather than a compiler per se; they may pollute the global namespace improperly. This is assuming you include C++ headers like `<cstring>`, rather than the C headers like `<string.h>`. I am not sure whether old C++ standards permitted this. Adding certain compiler switches like `-std=c++17` might tighten this up (even in the header, rather than the compiler), but I am speculating a bit. In any case, it is good to write correct code even if your compiler is lax, so I suggest favoring `std::memcpy`. – Eric Postpischil Oct 27 '20 at 21:21
0

Accessing data through pointers (or unions) seems pretty common in (embedded) C code, but it often requires extra knowledge.

  • If a float were smaller than an int, you would be accessing memory outside the object.
  • The code makes several assumptions about where and how the sign bit is stored (little vs. big endian, 2s-complement).
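To make the byte-order point concrete, here is a small sketch that looks for the byte holding the sign bit. It assumes an IEEE 754 float, where negative zero has only the sign bit set; the name sign_byte_index is hypothetical:

```c
#include <stddef.h>

/* Hypothetical helper: index of the byte that carries the sign bit of a float,
   or -1 if no byte of negative zero is nonzero (non-IEEE representation). */
int sign_byte_index(void) {
    float neg_zero = -0.0f;
    /* Inspecting bytes through unsigned char is allowed for any object. */
    const unsigned char *bytes = (const unsigned char *)&neg_zero;
    for (size_t i = 0; i < sizeof neg_zero; ++i) {
        if (bytes[i] != 0) {
            return (int)i;
        }
    }
    return -1;
}
```

On a little-endian machine the sign lives in the last byte, on a big-endian machine in the first, so code that inspects individual bytes has to account for this.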
0

When the C Standard characterizes an action as invoking Undefined Behavior, that implies that at least one of the following is true:

  1. The code is non-portable.
  2. The code is erroneous.
  3. The code is acting upon erroneous data.

One of the reasons the Standard leaves some actions Undefined is, among other things, to "identify areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior." A common extension, listed in the Standard as one of the ways implementations may process constructs that invoke "Undefined Behavior", is to process some such constructs by "behaving during translation or program execution in a documented manner characteristic of the environment".

I don't think the code listed in the example claims to be 100% portable. As such, the fact that it invokes Undefined Behavior does not preclude the possibility of it being non-portable but correct. Some compiler writers believe that the Standard was intended to deprecate non-portable constructs, but such a notion is contradicted by both the text of the Standard and the published Rationale. According to the published Rationale, the authors of the Standard wanted to give programmers a "fighting chance" [their term] to write portable code, and defined a category of maximally-portable programs, but did not specify portability as a requirement for anything other than strictly conforming C programs, and they expressly did not wish to demean programs that were conforming but not strictly conforming.

supercat