It would seem that a reference is simply an alias. Yet adding reference fields to a struct, for example, increases the structure's size, even when each reference is initialized at declaration as an alias for another field of the same structure.

For example:


#include <iostream>

using namespace std;

int
main(int, char **)
{
    struct {
        int integers[2];
    } first;
    struct {
        int integers[2];
        int &one = integers[0];
        int &two = integers[1];
    } second;

    cout << sizeof first << " " << sizeof first.integers << " " <<
        sizeof second << " " << endl;

    return 0;
}

The above program prints: 8 8 24 here. The first two numbers I understand, the third -- no. Why does adding such references matter -- what is stored in that memory, that cannot be resolved at compile time? Unlike pointers, once declared, references cannot change by design anyway, can they? So why are they being stored?

Mikhail T.
  • While it probably could be resolved at compile-time, it doesn't look like compilers are programmed with this kind of optimization yet. Usually, a reference data member will be implemented as if it were a `T * const` member. – François Andrieux Dec 22 '20 at 15:55
  • Consider that reference data members aren't always so easy to figure out. Consider what happens if you add a constructor which assigns some other object to those references. The compiler needs to be able to figure out whether or not this may happen before it could eliminate those members. – François Andrieux Dec 22 '20 at 15:58
  • Considering some other (quite amazing) optimizations already implemented, this one seems rather simple... And yet, even clang-10 does not have it... – Mikhail T. Dec 22 '20 at 16:02
  • I expect it is either a hard optimization, or a situation that doesn't happen often enough in real code to make it worthwhile. However, both clang and gcc are open source and either would be happy to accept a patch to optimize this use-case scenario. – Eljay Dec 22 '20 at 16:04
  • There is nothing in the standard that I know of that prevents this optimization, but I might have missed it. What seems likely to me is that there is not a lot of demand for this optimization, so nobody has invested the time in making it work. It may be a chicken/egg problem where nobody uses reference members until an optimization is implemented, but since nobody is using them now, the optimization does not seem important. – François Andrieux Dec 22 '20 at 16:08
  • Also consider that even if the optimization were implemented in *some* compilers, a lot of people would not use it if they couldn't rely on it on all platforms. It would probably require an amendment to the standard to get work on this optimization going, even if it can be formalized. – François Andrieux Dec 22 '20 at 16:09
  • Consider a source file (translation unit) `f.cpp` with the following content: `void f(int& i) { i++; }`. When you write `g++ -c f.cpp`, the compiler has absolutely no way to figure out what `i` will be bound to. It therefore internally requires the address of the bound object (that is, a pointer) to be passed at the machine code level. Live demo: https://godbolt.org/z/K5T8cz. – Daniel Langr Dec 22 '20 at 16:15
  • @DanielLangr, your example is quite different from mine - where the references are initialized at declaration. Initialized as aliases to other elements of _the same structure_. – Mikhail T. Dec 22 '20 at 16:17
  • @MikhailT. I thought your question was generic. Anyway, what if you initialize the member reference manually? Simplified demo: https://godbolt.org/z/3MxjKe. I don't think that the byte size of objects of the same type can depend on their initialization form. – Daniel Langr Dec 22 '20 at 16:27
  • @DanielLangr, the `a` and the `r` in your example are still the same thing -- they cannot have different values, and the `r` cannot point to anything other than `a`. It may be a useful alias at compile time, but why store it at runtime? – Mikhail T. Dec 22 '20 at 17:19
  • @MikhailT. In my example, `x.r` does not point to `x.a`, it points to the `a` variable local to `main`. `a` and `r` inside `x` are therefore not the same thing. I likely should have used a different name for that local variable: https://godbolt.org/z/1n3jaE. – Daniel Langr Dec 22 '20 at 17:29
  • I see, @DanielLangr -- well, then they aren't really aliases, after all. Turn your comment into an answer, so I can "accept" it. Thanks! – Mikhail T. Dec 22 '20 at 18:32
  • @MikhailT. heapunderrun was faster :). It seems that his reasoning is the same. – Daniel Langr Dec 22 '20 at 19:11

2 Answers

This answer mentions that the objects might very well be optimized away, but that calling sizeof can force them to not be, as removing them would be an “observable” change in the size of the struct:

https://stackoverflow.com/a/55060982

Buddy
  • The Quantum Mechanics of C++... Seriously though, what's wrong with this part being observable? References do not -- should not -- have sizes (nor addresses) of their own anyway... – Mikhail T. Dec 22 '20 at 16:24

Even with the first and second structures defined the way you define them, I think those reference members cannot be optimized away (if we are talking not about the particular program you wrote, but about the general case of using those structures). For example, suppose that at some point in the code you decide to create an instance of the second structure but initialize the reference members differently, perhaps even dynamically, in a way not known at compile time. Consider the following usage:

#include <iostream>

int main()
{
    struct
    {
        int integers[2];
    } first;

    struct
    {
        int integers[2];
        int &one = integers[0];
        int &two = integers[1];
    } second;

    int user_choice{ 0 };
    std::cin >> user_choice;

    int i{ 56 }, j{ 78 };
    decltype(second) third{ {12, 34}, i, (user_choice < 42) ? i : j };

    std::cout << third.integers[0] << ' ' << third.integers[1] << ' '
        << third.one << ' ' << third.two << '\n';
}

In the program above, the compiler simply cannot know beforehand whether third.two will refer to i or to j: this depends on the number entered by the user at run time (try it out at https://godbolt.org/z/43bM1o by entering 7 instead of 100, for example).

heap underrun