I did a bit of an experiment to try to understand references in C++:

#include <iostream>
#include <vector>
#include <set>

struct Description {
  int a = 765;
};

class Resource {
public:
  Resource(const Description &description) : mDescription(description) {}

  const Description &mDescription;
};

void print_set(const std::set<Resource *> &resources) {
    for (auto *resource: resources) {
        std::cout << resource->mDescription.a << "\n";
    }
}

int main() {
  std::vector<Description> descriptions;
  std::set<Resource *> resources;

  descriptions.push_back({ 10 });
  resources.insert(new Resource(descriptions.at(0)));

  // Same as description (prints 10)
  print_set(resources);

  // Same as description (prints 20)
  descriptions.at(0).a = 20;
  print_set(resources);

  // Why? (prints 20)
  descriptions.clear();
  print_set(resources);

  // Object is written to the same address (prints 50)
  descriptions.push_back({ 50 });
  print_set(resources);

  // Create new array
  descriptions.reserve(100);

  // Invalid address
  print_set(resources);

  for (auto *res : resources) {
      delete res;
  }
  
  return 0;
}

https://godbolt.org/z/TYqaY6Tz8

I don't understand what is going on here. I found this excerpt in the C++ FAQ:

Important note: Even though a reference is often implemented using an address in the underlying assembly language, please do not think of a reference as a funny looking pointer to an object. A reference is the object, just with another name. It is neither a pointer to the object, nor a copy of the object. It is the object. There is no C++ syntax that lets you operate on the reference itself separate from the object to which it refers.

This raises some questions for me. If a reference is the object itself and I create a new object at the same memory address, does the reference "become" the new object? In the example above, vectors are linear arrays, so as long as the array occupies the same memory range, the object appears to stay valid. However, this becomes a lot trickier with other containers (e.g. sets, maps, linked lists), because each "node" typically lives in a different part of memory.

Should I treat references as undefined if the original object is destroyed? If yes, is there a way to identify that the reference is destroyed other than a custom mechanism that tracks the references?

Note: Tested this with GCC, LLVM, and MSVC

Gasim
  • It depends on how you replace the object. With placement `new`, old references refer to the new object (in most cases). If you `clear()` and `push_back()`, it is technically Undefined Behavior, as `clear()` invalidates all references to the elements, even though it will very likely look like it works every time you try it. – François Andrieux Mar 18 '22 at 14:29
  • "A reference is the object" is sloppy language, though imho it is better than thinking of references as pointers. A reference isn't really the object, but you can think of it like that as long as the object is alive; after that, the reference is dangling. – 463035818_is_not_an_ai Mar 18 '22 at 14:30
  • related/dupe: https://stackoverflow.com/questions/6438086/iterator-invalidation-rules-for-c-containers – NathanOliver Mar 18 '22 at 14:30
  • Still not perfectly accurate, but maybe better: "a valid reference is the object". – 463035818_is_not_an_ai Mar 18 '22 at 14:31
  • "Should I treat references as undefined if the original object is destroyed?" Yes. "is there a way to identify that the reference is destroyed" No. – Quimby Mar 18 '22 at 14:32
  • Thank you all for the replies. My main problem was understanding the validity of references but it is much clearer now. – Gasim Mar 18 '22 at 14:37
  • IMO the note is more misleading than clarifying. You might find it easier to understand if you forgot about it. – Passer By Mar 18 '22 at 14:47
  • @PasserBy is there another reference book that I can refer to about the specification for references? – Gasim Mar 18 '22 at 14:47
  • @Gasim I don't know of a good book to learn specifically about references. But you might want to read [cppreference](https://en.cppreference.com/w/cpp/language/reference). – Passer By Mar 18 '22 at 14:49
  • Thank you, this is what I was looking for. There is even a section about dangling references. – Gasim Mar 18 '22 at 14:56
  • A normal name can also be invalidated, for example when you explicitly call its destructor. The alias becomes invalid at the same time as the aliased object. – apple apple Mar 18 '22 at 15:03
  • Oh, and a dangling reference is (usually) found when returning a local variable by reference, which doesn't need to have anything to do with pointers. – apple apple Mar 18 '22 at 15:05
  • You can also see https://en.cppreference.com/w/cpp/language/lifetime – apple apple Mar 18 '22 at 15:09

2 Answers

The note is misleading; treating references as syntactic sugar for pointers is fine as a mental model. In every way a pointer might dangle, a reference can dangle too. Accessing a dangling pointer or reference is undefined behaviour (UB).

int* p = new int{42};
int& i = *p;
delete p;

void f(int);
f(*p); // UB
f(i);  // UB, for exactly the same reason

This also extends to the standard containers and their rules about pointer/reference invalidation. Any surprising behaviour in your example is simply UB: `clear()` invalidates every reference to the vector's elements, and so does a `reserve()` that reallocates.
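
As a rough sketch of one way around the invalidation in the question (not something the question or this answer prescribes): keep the vector and an index instead of a reference to the element, and look the element up on every access. `IndexedResource`, `valid()` and `description()` are hypothetical names made up for this illustration; `Description` is copied from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Description {
    int a = 765;
};

// Hypothetical variant of Resource: remembers the vector and an index
// instead of holding a reference to one element.
class IndexedResource {
public:
    IndexedResource(const std::vector<Description>& descriptions, std::size_t index)
        : mDescriptions(&descriptions), mIndex(index) {}

    // True only while the index still names an existing element.
    bool valid() const { return mIndex < mDescriptions->size(); }

    // Looks the element up on each call, so a reallocation of the
    // vector's buffer cannot leave this object with a stale address.
    const Description& description() const {
        assert(valid());
        return mDescriptions->at(mIndex);
    }

private:
    const std::vector<Description>* mDescriptions;  // the vector itself must outlive this object
    std::size_t mIndex;
};
```

With this, `descriptions.reserve(100)` stops being a problem because nothing holds an address into the old buffer, and after `descriptions.clear()` the `valid()` check reports that the element is gone. It only tracks a position, though: if elements are erased or reordered, the index names whatever sits there now, and the vector still has to outlive the resource.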

Passer By

The way I explain this to myself is:

A pointer is like a finger on your hand. It can point at memory blocks; think of them as the keys of a keyboard. So a pointer literally points at a key that holds something or does something.

A reference is a nickname for something. Your name may be, for example, Michael Johnson, but people may call you Mike, MJ, Mikeson, etc. Any time you hear your nickname, the person who called REFERRED to the same thing: you. If you do something to yourself, the reference will show the change too. If you point your finger at something else, it won't affect what you previously pointed at (unless you're doing something weird); the finger simply points at something new. That being said, as in the accepted answer above, if you do something weird with your fingers and your nicknames, you'll see weird things happening.

References are probably one of the most important C++ features for beginners to get right. Many schools today start with MATLAB, which is very slow once you want to do serious work. One of the reasons is that MATLAB gives you less control over references than C++ does (yes, it has them: make a class that derives from handle; look it up).

Look at these two functions:

double fun1(std::valarray<double> &array)
{
    return array.max();
}

double fun2(std::valarray<double> array)
{
    return array.max();
}

These two simple functions are very different. fun1 takes a reference: it receives a nickname for the caller's std::valarray and processes it directly, without making a copy. fun2, on the other hand, takes its argument by value: it creates a copy of the input array and processes the copy.
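
To see the difference in behaviour rather than just in speed, here is a small sketch. The three function names are hypothetical and exist only for this illustration. The first two deliberately modify their parameter so the effect on the caller becomes visible; the third takes a `const` reference, which avoids the copy while letting the compiler reject accidental modification.

```cpp
#include <cassert>
#include <valarray>

// All three helpers assume a non-empty array, because
// std::valarray::max() is undefined for an empty one.

// By value: works on a private copy, the caller's array is untouched.
double scaled_max_by_value(std::valarray<double> array) {
    assert(array.size() > 0);
    array *= 2.0;
    return array.max();
}

// By reference: works on the caller's array, so the doubling is
// visible to the caller after the call.
double scaled_max_by_reference(std::valarray<double>& array) {
    assert(array.size() > 0);
    array *= 2.0;
    return array.max();
}

// By const reference: no copy is made, and a line such as
// `array *= 2.0;` inside this function would not compile.
double max_by_const_reference(const std::valarray<double>& array) {
    assert(array.size() > 0);
    return array.max();
}
```

Calling `scaled_max_by_value(a)` leaves `a` as it was, while `scaled_max_by_reference(a)` leaves every element of `a` doubled.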

Naturally, it is usually more efficient to pass large inputs by reference when writing functions in C++. That being said, you must be certain not to change your input in any way, because that will affect the original array in the other piece of code where you created it: you are processing the same thing, just under a different name. This also makes references useful for a somewhat controversial technique, side effects. A C++ function has a single return value, so returning several results directly means packaging them in a type such as a struct or std::pair. One workaround is a side effect, as in this example:

#include <iostream>
#include <valarray>

// Returns the maximum and writes the minimum through the reference parameter.
double fun3(std::valarray<double> &array, double &min)
{
    min = array.min();
    return array.max();
}

int main()
{
    std::valarray<double> a = {1, 2, 3, 4, 5};
    double sideEffectMin;  // no initializer; fun3 assigns it
    double max = fun3(a, sideEffectMin);
    std::cout << "max of array is " << max
              << " min of array is " << sideEffectMin << std::endl;
    return 0;
}

So fun3 expects a reference to a double. In other words, it wants its second argument to be a nickname for another double variable. The function then assigns through the reference, and this also alters the caller's variable. Both name and nickname are altered by the function, because they are the same "thing". In main, sideEffectMin is declared without an initializer, so it holds no meaningful value until fun3 assigns it. That way, you get 2 outputs from fun3.

The example shows the trick with side effects, but also why you should beware of altering your inputs, especially when they are references to something else, unless you know what you are doing.
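
If the side effect feels too risky, one alternative is to return both values together. This is only a sketch using `std::pair` from `<utility>`; `min_max` is a hypothetical name and not part of the code above.

```cpp
#include <cassert>
#include <utility>
#include <valarray>

// Hypothetical alternative to fun3: returns {min, max} as one value
// instead of writing the minimum through a reference parameter.
// Assumes a non-empty array, since min() and max() are undefined otherwise.
std::pair<double, double> min_max(const std::valarray<double>& array) {
    assert(array.size() > 0);
    return {array.min(), array.max()};
}
```

The caller then reads `result.first` for the minimum and `result.second` for the maximum, or with C++17 writes `auto [lo, hi] = min_max(a);`. Nothing in the caller is modified behind its back, and the input can be `const`.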