
The code is the following:

```cpp
#include <cstdint>
#include <iostream>

using u64 = std::uint64_t;

u64 *test() {
    u64 *a, *p;

    p = (u64 *)&a;
    a = (u64 *)&p;

    {
        for (int i = 0; i < 100; ++i) {
            p = new u64((u64)p);
        }
    }

    while (true) {
        if ((u64)p == 0) {
            break;
        }

        p = (u64 *)*p;
    }

    return p;
}

int main() {
    std::cout << test() << std::endl;
}
```

And the compiled assembly for the function `test` is the following:

```asm
test():
        xor     eax, eax
        ret
```

See https://godbolt.org/z/8eTd8WMzG.

In fact, this output would be expected if the final statement were `return a;`, although in that case the compiler warns about returning the address of a local variable. And if I make `a` and `p` global variables, everything works as expected; see https://godbolt.org/z/n7YWzGvd5.

So, am I running into some undefined behaviour that makes the result differ from what I expect?

cigien
Mu00

1 Answer


The instructions `p = (u64 *)&a;` and `a = (u64 *)&p;`, followed by assignments to and dereferences of these variables, break the strict aliasing rule, resulting in undefined behaviour. Indeed, `p` and `a` are of type `u64*`, while `&a` and `&p` are of type `u64**`. Moreover, `p = (u64 *)*p;` is a perfect example of an instruction breaking the strict aliasing rule: `u64**`, `u64*` and `u64` are three distinct types.

If you want to solve this, you first need to check that the sizes and alignments of the types match (this should be fine on a mainstream 64-bit architecture). Then, you should use `std::bit_cast` or `std::memcpy` to perform the conversions (see this related post).

Moreover, note that infinite loops without side effects are undefined behaviour too, and `p` can never be null in your case. A compiler that detects that your loop is infinite and has no side effect can simply remove it (or generate wrong code).

Jérôme Richard
  • Thanks! But how does `p = (u64 *)*p;` break the strict aliasing rule? – Mu00 Nov 27 '21 at 03:16
  • `p` is dereferenced to (simultaneously) read/store values of different types like `u64`, `u64*` (and `u64**`). See the [definition on cppreference](https://en.cppreference.com/w/User:T._Canens/strict_aliasing_rewrite). Such types are not *similar* because they are not the same type "*after stripping away cv-qualifications at every level*". This case is clearly an "*attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType*". – Jérôme Richard Nov 27 '21 at 11:37
  • Note that the strict aliasing rule does not apply to one specific statement but rather to the entire code. If this is still not clear to you, you can focus on the types of the objects actually read/stored at the dereferenced address: what is the type used to store the object and what is the type used to read it? [This](https://stackoverflow.com/questions/68863124) post may also help you understand strict aliasing. – Jérôme Richard Nov 27 '21 at 11:51