C++ Pointers break when changing a pointer

Question

When I change a pointer in a union, my other pointers break and show up as invalid pointers.

CustomDataTypeExample Class:

struct CustomDataTypeExample {
float x;
float y;
float z;
CustomDataTypeExample() = default;
CustomDataTypeExample(float x, float y, float z) {

    this->x = x;
    this->y = y;
    this->z = z;

};

// ...
};

ConfigCustomDataTypeExample class:

struct ConfigCustomDataTypeExample {
public:
    ConfigCustomDataTypeExample() = default;
    ConfigCustomDataTypeExample(CustomDataTypeExample values) {
        x = &values.x;
        y = &values.y;
        z = &values.z;
    }
    union {
        struct {

            CustomDataTypeExample* ex;
        };
        struct {

            float* x;
            float* y;
            float* z;
        };
    };
};

main:

ConfigCustomDataTypeExample example({ 1.2f,3.4f,5.6f });
float value = 565;
example.x = &value;
std::cout << example.ex->x << ", " << example.ex->y << ", " << example.ex->z << "\n";
std::cout << *example.x << ", " << *example.y << ", " << *example.z << "\n";

Output:

565, -1.07374e+08, -1.07374e+08
565, 3.4, 5.6

What exactly is happening? If I don't change example.x to point to something else, everything works just fine, but as soon as I change it, the other pointers are ruined.

Hello, and welcome to StackOverflow. Do you know what a 'union' does? Because I suspect that it does not do what you think it does. Someone compared it with a room in a hotel: only one tenant can occupy it at a time. In your case that is either the struct or the class. So the instant you assign to the class, you dismiss the struct and vice versa. Behavior is only defined as long as you keep accessing 'the member you wrote last'. — Refugnic Eternium, Oct 25 '20 at 19:49
Thanks for the reply. So you think what I'm trying to achieve is not possible using the union, or is there some kind of workaround? — Someone, Oct 25 '20 at 19:55
In the `ConfigCustomDataTypeExample` constructor function the variable `values` is a *local* variable, one whose life-time ends when the constructor function ends. The pointers you save will become invalid as soon as `values` ceases to exist. Dereferencing those pointers later will lead to *undefined behavior*. — Some programmer dude, Oct 25 '20 at 19:56
Can you explain what you're trying to achieve? So far, you describe what you observed, but not what you are trying to do. — Raymond Chen, Oct 25 '20 at 19:58
@Someone I am afraid, I am not exactly sure what it is you are trying to achieve. — Refugnic Eternium, Oct 25 '20 at 19:58
Sorry, I might have missed mentioning my intention. What I was trying to do is change the values inside class CustomDataTypeExample by making a wrapper that is supposed to use pointers to change the values inside the main class. I hope that is clear, since it is really hard for me to explain. — Someone, Oct 25 '20 at 20:01
Well, the obvious way to go for that is by using 'setters' on the object itself. And if you absolutely must use pointers, I suggest using automatic variables within the main class (`float x; ...`) and then use getters to obtain a pointer to that variable. No union necessary at all. — Refugnic Eternium, Oct 25 '20 at 20:06
The thing is, I can't actually modify the main class because I'll be using this as an idea template for other classes that I won't be editing, such as glm::vec3 and so forth. I'm writing this for a config system. Edit: This is a really weird problem for me, because if I don't change the pointer then the wrapper works exactly how I want it, but when I do, it just breaks the other pointer variables in the union, while the modified pointer still functions just fine. — Someone, Oct 25 '20 at 20:09

Rane · Answer 1 · 2020-10-28T06:00:35.870

TL;DR: Three different kinds of undefined behaviour: a lifetime issue, accessing a non-active member of a union (without non-standard extensions), and dereferencing an invalid pointer value through the members of example.ex (a misunderstanding of what the declared union represents).

Looks like you could do with using plain references. The full solution is described at the end.

Deeper analysis

This is actually a really interesting problem as there is so much going on here! Three different kinds of undefined behavior. Let's go over these piece by piece.

First, as mentioned in the comments, you are assigning the addresses of the members of the parameter values to x, y and z. The parameter values has automatic storage duration, which means it is destroyed at the end of the constructor for ConfigCustomDataTypeExample.

struct ConfigCustomDataTypeExample {
public:
    ConfigCustomDataTypeExample() = default;
    ConfigCustomDataTypeExample(CustomDataTypeExample values) {
        x = &values.x;
        y = &values.y;
        z = &values.z;
    } // Past this line x, y and z store invalid pointer values
      // (addresses to now destructed members of values).
      // Any indirection through these pointers is undefined behavior.
...

With your program you were still able to read the values of y and z. This is the essence of undefined behaviour: you might sometimes get sensible results, but nothing is guaranteed. For example when I tried to run your program, I got wildly different results for y and z. This was the first clear UB. Let's examine the declaration of the union next to understand what it really represents.

A class is a type that consists of a sequence of members. A union is a special kind of class that can hold at most one of its non-static data members at a time. The object currently held by a union is called the active member. This implies that a union is only as big as its largest data member, which is useful if memory usage is a concern.

union {
  struct {
      CustomDataTypeExample* ex;
  };
  struct {
      float* x;
      float* y;
      float* z;
  };
};

For this union the members are the two anonymous structs (note that anonymous structs are not part of standard C++, although major compilers accept them as an extension). The size of the union is determined by the largest struct, which is the one holding the three float* members. On a 64-bit system a pointer is commonly 8 bytes, so there the size of this union is typically 24 bytes.

As for the usage of the union, you are clearly not using it to reduce memory consumption. Instead, you are trying to do something called type punning. Type punning is when you try to interpret the binary representation of one type as another type. According to the C++ standard, type punning with unions is undefined behavior (the second kind here), although many compilers provide non-standard extensions that allow it. Let's analyze your main program according to the standard rules:

ConfigCustomDataTypeExample example({1.2f, 3.4f, 5.6f});
// The anonymous struct holding 3 float* is now the active member.
// Though, all of the pointers are invalid, as already mentioned.

float value = 565;

example.x = &value;
// example.x is now a valid ptr value
 
std::cout 
    << example.ex->x << ", "  // UB: Accessing a non-active member
    << example.ex->y << ", "  // UB: non-active and invalid ptr (more on that later)
    << example.ex->z << "\n"; // UB: same as above

std::cout 
    << *example.x << ", "     // This is ok (active member and valid ptr)
    << *example.y << ", "     // UB: indirection to an invalid ptr
    << *example.z << "\n";    // UB: same as above

Yet again, undefined behavior was kind enough to print 565 when dereferencing example.ex->x. This is because float* x and the pointer example.ex occupy the same bytes in the union's binary representation, so example.ex ends up holding the address of value. It is still undefined behavior.

Let's first quickly fix the lifetime issue by changing ConfigCustomDataTypeExample to take a reference as its parameter, ConfigCustomDataTypeExample(CustomDataTypeExample& values), and by declaring a CustomDataTypeExample variable in main. I am also compiling with gcc, where type punning with unions is well defined (non-standard extension):

CustomDataTypeExample data{1.0f, 2.0f, 3.0f};
ConfigCustomDataTypeExample example(data);
    
float value = 565;
example.x = &value;

std::cout 
    << example.ex->x << ", "  // This is now ok (using gcc's non-standard extension)
    << example.ex->y << ", "  // Something seems odd
    << example.ex->z << "\n"; // with these two lines
    
std::cout 
    << *example.x << ", "     // Now well defined
    << *example.y << ", "     // same
    << *example.z << "\n";    // same

Here goes nothing. The output from one of my runs is:

565, 1961.14, 4.59163e-41
565, 2, 3

Ok, at least now the x, y and z values are valid, but we are still getting junk values when dereferencing parts of example.ex. What gives? Let's go back to the declaration of our union and think how it translates to its binary representation. Here is a rough diagram:

[float* x, float* y, float* z]

So our union's memory layout is three pointers to float, each pointing to a single floating point value (equivalent to an array that stores three such pointers, e.g. float* arr[3]). Yet, with example.ex we are reinterpreting the first of them, float* x, as a pointer to a CustomDataTypeExample, that is, to a block of 3 floating point values. This is because CustomDataTypeExample's memory layout is equivalent to an array of 3 floating point values, and referring to its members is equivalent to array indexing.

I think gcc's extension bases its interpretation of example.ex on the C99 standard (as amended by TC3), section 6.5.2.3, footnote 82:

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

We can also verify this by looking at how the compiler translates these three lines to assembly:

example.x = &value;

std::cout 
    << example.ex->x << ", " 
    << example.ex->y << ", " 
    << example.ex->z << "\n";

Using godbolt we get the following (I only took the parts that are relevant):

// Copies the value of rax to the memory pointed by QWORD PTR [rbp-48]
mov     QWORD PTR [rbp-48], rax  // example.x = &value;

// Copy a 32-bit value from memory address rax to eax.
// (eax register is used here to pass the value to std::cout)
// No surprises yet, as this address has a well defined floating point value (565).
mov     eax, DWORD PTR [rax]     // example.ex->x

// Not good, tries to copy a floating point value from memory address 
// [rax + 4 bytes]. Equivalent to *(&value + 1). This is gonna get 
// whatever random junk is in that part of memory.
mov     eax, DWORD PTR [rax+4]   // example.ex->y

We can see quite clearly how the compiler tries to interpret the address pointed to by example.ex as a region in memory that contains 3 floating point values, even though it only contains one. Hence, the first read is fine, but the second and third dereferences go very wrong.

This code produces extremely similar assembly, which is no surprise, as the behavior is equivalent:

float* value_ptr = &value;

std::cout
    << *value_ptr << ", "    // equivalent to example.ex->x, OK
    << value_ptr[1] << ", "  // equivalent to example.ex->y, plain UB
    << value_ptr[2] << '\n'; // equivalent to example.ex->z, plain UB

This case of undefined behavior is very similar to the very first one: the program is performing indirection through invalid pointer values (the third kind).

These three undefined behaviors combined caused the weird values to appear when you executed main. Now on to the solution.

Solution

First, let's get a minor nitpick out of the way. CustomDataTypeExample is clearly meant to be an aggregate that just holds data, so there is no need to explicitly declare special member functions for it (constructors in this case). The special member functions are implicitly declared (and trivial):

struct CustomDataTypeExample {
    float x;
    float y;
    float z;
};

// Construct an instance of CustomDataTypeExample by aggregate initializing.
// This was also utilized earlier.
CustomDataTypeExample data{1.0f, 2.0f, 3.0f};

As for the solution, it looks like you are trying to come up with an extra layer of abstraction for a simple problem. Plain references should do the trick. There is no reason for that complicated union setup, which, as you might have noticed, is quite error-prone. In C++, unions should only really be used for reducing memory consumption on systems where memory is a scarce resource.

Thus, I would just get rid of the ConfigCustomDataTypeExample and utilize references like so:

CustomDataTypeExample data{1.0f, 2.0f, 3.0f};
CustomDataTypeExample& data_ref = data;

// Modifies the contents of the existing data
data_ref.x = 565;

std::cout 
    << data_ref.x << ", " 
    << data_ref.y << ", " 
    << data_ref.z << '\n';

When you are working with variables that have an automatic storage duration, references are the way to go. Compared to pointers, with references lifetime issues are a little bit harder to create, and the overall solution is usually simpler.
