14

I was just wondering how references are stored internally. I felt that an understanding deep down that level will make me understand the concept pointer vs reference better and help me make design choices.

I suspect that they basically work the same as pointers, with the compiler taking care of the pointer handling. Please advise.

howtechstuffworks
  • 6
    "I felt that an understanding deep down that level will make me understand the concept pointer vs reference better" I don't think that would help any. – Mr.Anubis Feb 18 '12 at 14:57
  • 1
    You can try this: http://eetimes.com/discussion/programming-pointers/4023307/References-vs-Pointers – Vlad Feb 18 '12 at 14:58
  • @MR.Anubis..... what do you mean? – howtechstuffworks Feb 18 '12 at 16:02
  • Well, one thing to know: when the compiler creates the symbol table, it has two symbols referring to the same memory block, i.e. in `int x; int& y=x;`, `x` and `y` are those two symbols. But it's compiler dependent; they might use a pointer to implement a reference. I'm just 60% sure about my theory :) – Mr.Anubis Feb 18 '12 at 16:22
  • Yeah exactly, this is what I felt. It feels a lot clearer when I think of a reference as another name. And btw, 60% is a lot better for me... lol – howtechstuffworks Feb 18 '12 at 18:07
  • 1
    Pointers and references are both ways to access a remote/different memory location. With pointers this is explicit: you know you are pointing to something and hence must be careful. With references, the compiler handles the internals for you. So pointer -> explicit, reference -> implicit. There are many more differences between the two, but this is the basic one. – prathmesh.kallurkar Feb 19 '12 at 05:51
  • 1
    Possible duplicate of [How C++ reference works](https://stackoverflow.com/q/7418483/608639) and [How is reference implemented internally?](https://stackoverflow.com/q/3954764/608639) – jww Nov 05 '18 at 09:58
  • @tripleee - Can you look this over for closing? – jww Nov 05 '18 at 09:59

3 Answers

16

There's no requirement that a reference be "stored" in any way at all. As far as the language is concerned, a reference is just an alias of some existing object, and that's all that any compiler must provide.

It's entirely possible that there's no need to store anything at all if the reference is just a short-hand for some other object that's already in scope, or if a function with a reference argument gets inlined.

In situations where the reference needs to be made manifest (e.g. when calling a function in a different translation unit), you can practically implement a `T & x` as a `T * const` and treat every occurrence of `x` as implicitly dereferencing that pointer. Even on a higher level you can think of `T & x = y;` and `T * const p = &y;` (and correspondingly of `x` and `*p`) as essentially equivalent, so this would be an obvious way to implement references.

But of course there's no requirement, and any implementation is free to do whatever it wants.
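As a rough illustration of that mental model (not a statement about what any particular compiler emits), the sketch below modifies the same object once through a reference and once through a `T * const`. The function names are hypothetical and exist only for this example.

```cpp
#include <cassert>

// Hypothetical helpers, for illustration only.
void increment_via_reference(int& x) { x += 1; }        // x is used directly
void increment_via_pointer(int* const p) { *p += 1; }   // p must be dereferenced

int alias_demo() {
    int y = 0;
    int& x = y;           // x is another name for y
    int* const p = &y;    // p is a non-reseatable pointer to y

    increment_via_reference(x);
    increment_via_pointer(p);

    assert(&x == p);      // taking the address of x yields the address of y
    return y;             // both calls modified the same object
}
```

Calling `alias_demo()` returns 2: `x` and `*p` name the same `int`, and the only visible difference is the explicit dereference.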

Kerrek SB
  • I see. From your post, I think you are saying that using a reference doesn't require allocating any extra variable; you can just add the name to the table of variables you are maintaining (that's my assumption, though). I don't know how C++ works now, but if I implemented a new OOP language, I could still make sure that no extra variables are allocated in memory and just remember that the variable 'a' has another name called "a_ref". Did I get it right? – howtechstuffworks Feb 18 '12 at 15:23
  • 1
    @howtechstuffworks: As long as the reference only refers to local or member scope variables, then yes. – Puppy Feb 19 '12 at 03:26
13

References are just aliases; internally, the compiler usually treats them the same as pointers.

From the user's perspective, though, there are several subtle differences.

Some of the major differences are:

  • Pointers can be NULL while references cannot. There is no such thing as a NULL reference.
  • A const reference extends the lifetime of a temporary bound to it. There's no equivalent with pointers.
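A minimal sketch of the lifetime-extension point, assuming a hypothetical function `make_greeting` that returns a temporary by value:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns a temporary std::string by value.
std::string make_greeting() { return "hello"; }

std::size_t greeting_length() {
    // The temporary returned by make_greeting() is bound to a const reference,
    // so it stays alive until s goes out of scope.
    const std::string& s = make_greeting();
    return s.size();
}
```

Storing `&make_greeting()` in a pointer instead does not even compile, because the address of a temporary cannot be taken; a pointer has no way to keep the temporary alive.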

In addition, references have some things in common with const pointers (not a pointer to const):

  • References must be initialized at time of creation.
  • A reference is permanently bound to a single storage location, and cannot later be rebound.
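The "cannot be rebound" point is easy to misread, so here is a small sketch with hypothetical function names. Assigning to a reference writes to the object it refers to; assigning to a (non-const) pointer makes it point elsewhere.

```cpp
#include <cassert>

int assign_through_reference() {
    int a = 1;
    int b = 2;
    int& r = a;        // must be initialized here; "int& r;" would not compile
    r = b;             // copies b's value into a, r still refers to a
    assert(&r == &a);  // the binding has not changed
    return a;          // a now holds 2
}

int reseat_pointer() {
    int a = 1;
    int b = 2;
    int* p = &a;
    p = &b;            // p now points at b; a is untouched
    *p = 7;            // writes to b
    return a;          // a is still 1
}
```

`assign_through_reference()` returns 2 and `reseat_pointer()` returns 1.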

When you know you have something (an object) to refer to and you'll never want to refer to anything else, use a reference; otherwise, use a pointer.

Ben Voigt
Alok Save
  • It's not quite the same as pointers (not always). The compiler is allowed to elide references completely, for example in `class A { public: A(): a(_a) {} int const& a; private: int _a; };`, the reference can be eliminated (and may thus not influence the size of the object) while a pointer *cannot* be elided. – Matthieu M. Feb 18 '12 at 16:45
  • "while references cannot" `int& a = *((int*)0);` – SigTerm Feb 18 '12 at 17:17
  • 5
    @SigTerm If you aim carefully, you can shoot yourself in the foot. – Captain Giraffe Feb 18 '12 at 17:21
  • 2
    @SigTerm That's undefined behavior anyway. – Etienne de Martel Feb 18 '12 at 17:21
  • 1
    @SigTerm: References cannot be `NULL`, period. Any code that does that, anywhere and anytime, invokes Undefined Behavior and all safe bets are off; you are not in C++ land anymore, you are in Zombie land. – Alok Save Feb 18 '12 at 17:24
  • 1
    @SigTerm: what about it? You can also dereference a null pointer without the compiler complaining. That doesn't make it well-defined C++. Creating a null reference is undefined behavior. It (probably) won't be caught as a compile-time error, but you're nevertheless doing things that are outside the C++ standard, and so, your code is not C++. C++ does not have a notion of null references – jalf Feb 18 '12 at 17:27
  • @EtiennedeMartel: Interesting. Care to prove that `int &a = *((int*)0), &b = *((int*)0); bool c = (&a == &b);/*c == true*/` is UB? Not much of a difference from `int tmp = 0, &a = tmp, &b = tmp; bool c = (&a == &b);` – SigTerm Feb 19 '12 at 00:50
  • 4
    @SigTerm I have nothing to prove, just go read the Standard. The problem is in the `*((int*)0)`: you're dereferencing a null pointer. You're lucky that it doesn't crash right away. If it _doesn't_ crash, then yeah, you can always bind it to a reference, but that doesn't change the fact that dereferencing the pointer in the first place was UB. – Etienne de Martel Feb 19 '12 at 03:23
  • @SigTerm Comments are not appropriate for extended discussions. If you still want to discuss this, come to the [Lounge](http://chat.stackoverflow.com/rooms/10/loungec). – Etienne de Martel Feb 19 '12 at 16:16
  • @EtiennedeMartel: "come to the" Not interested. However, next time you might want to provide citation or specify which section of standard supports your argument (dcl.ref (8.3.2) part 4 of iec 14882 2003) instead of saying "read the standard". This way there will be no extended discussions in comments. Saying "read the standard" without specifying section give strong impression that you haven't read it yourself and simply repeating information you *heard* but haven't verified yourself. Just saying. – SigTerm Feb 19 '12 at 17:44
  • 2
    @SigTerm I did not think that I actually had to cite the standard for something as obvious as this. – Etienne de Martel Feb 19 '12 at 19:10
  • 2
    @SigTerm: Please stop arguing over a silly, non-existent issue. You made an argument which was fundamentally incorrect, and a number of users told you why it is incorrect. If you still want to contest its validity, you should come up with a counter-quote from the standard which proves your point; if not, you should accept the mistake, learn from it and move on. IMO that should be the whole purpose of being here: learning and improving oneself while trying to help others. Just arguing for the sake of argument is useless. Please grow up or, if you are already grown up, please change the attitude. – Alok Save Feb 19 '12 at 19:43
  • @Als: I'd advise you to stop jumping to conclusions. I was interested in whether people who love saying "null dereference is UB" have actually checked that themselves or were blindly repeating after somebody else. My attitude is none of your business, and if you do not like this kind of discussion, do not participate. For behavior you find inappropriate there are "report" flags, so I'd advise using them. – SigTerm Feb 19 '12 at 21:48
2

Sorry for using assembly to explain this but I think this is the best way to understand how references are implemented by compilers.

#include <iostream>

using namespace std;

int main()
{
    int i = 10;
    int *ptrToI = &i;
    int &refToI = i;

    cout << "i = " << i << "\n";
    cout << "&i = " << &i << "\n";

    cout << "ptrToI = " << ptrToI << "\n";
    cout << "*ptrToI = " << *ptrToI << "\n";
    cout << "&ptrToI = " << &ptrToI << "\n";

    cout << "refToNum = " << refToI << "\n";
    //cout << "*refToNum = " << *refToI << "\n";
    cout << "&refToNum = " << &refToI << "\n";

    return 0;
}

The output of this code looks like this:

i = 10
&i = 0xbf9e52f8
ptrToI = 0xbf9e52f8
*ptrToI = 10
&ptrToI = 0xbf9e52f4
refToNum = 10
&refToNum = 0xbf9e52f8

Let's look at the disassembly (I used GDB for this; 8, 9 and 10 here are line numbers of the code):

8           int i = 10;
0x08048698 <main()+18>: movl   $0xa,-0x10(%ebp)

Here $0xa is the 10 (decimal) that we are assigning to i. -0x10(%ebp) means the contents of the ebp register minus 16 (decimal), which is the address of i on the stack.

9           int *ptrToI = &i;
0x0804869f <main()+25>: lea    -0x10(%ebp),%eax
0x080486a2 <main()+28>: mov    %eax,-0x14(%ebp)

This assigns the address of i to ptrToI. ptrToI is also on the stack, located at address -0x14(%ebp), that is, ebp - 20 (decimal).

10          int &refToI = i;
0x080486a5 <main()+31>: lea    -0x10(%ebp),%eax
0x080486a8 <main()+34>: mov    %eax,-0xc(%ebp)

Now here is the catch! Compare the disassembly of lines 9 and 10 and you will observe that -0x14(%ebp) is replaced by -0xc(%ebp) in line 10. -0xc(%ebp) is the address of refToI. It is allocated on the stack, but you will never be able to get this address from your code, because you are not required to know it.

So, in this build, the reference does occupy memory. In this case it is stack memory, since we have allocated it as a local variable. How much memory does it occupy? As much as a pointer occupies.
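None of this storage is visible from inside the program. As a small sketch (the function names are hypothetical), `sizeof` and `&` applied to a reference report the size and address of the object it refers to, never those of the hidden slot:

```cpp
#include <cassert>
#include <cstddef>

// sizeof applied to a reference gives the size of the referenced type.
std::size_t size_seen_through_reference() {
    int i = 10;
    int& refToI = i;
    return sizeof(refToI);   // same as sizeof(int), not sizeof(int*)
}

// & applied to a reference gives the address of the referenced object.
bool address_seen_through_reference() {
    int i = 10;
    int& refToI = i;
    return &refToI == &i;
}
```

One indirect way to observe the cost is to wrap a reference in a struct and take `sizeof` of the struct. On common implementations that matches the size of a struct holding a pointer, but that is an observation about those implementations, not something the language guarantees.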

Now let's see how we access the reference and the pointer. For simplicity I have shown only part of the assembly:

16          cout << "*ptrToI = " << *ptrToI << "\n";
0x08048746 <main()+192>:        mov    -0x14(%ebp),%eax
0x08048749 <main()+195>:        mov    (%eax),%ebx
19          cout << "refToNum = " << refToI << "\n";
0x080487b0 <main()+298>:        mov    -0xc(%ebp),%eax
0x080487b3 <main()+301>:        mov    (%eax),%ebx

Compare the two snippets above and you will see a striking similarity. -0xc(%ebp) is the actual address of refToI, which is never accessible to you. In simple terms, if you think of a reference as a normal pointer, then accessing a reference is like fetching the value at the address it points to. That means the two lines of code below give you the same result:

cout << "Value of i = " << *ptrToI << "\n";
cout << "Value of i = " << refToI << "\n";

Now compare this

15          cout << "ptrToI = " << ptrToI << "\n";
0x08048713 <main()+141>:        mov    -0x14(%ebp),%ebx
21          cout << "&refToNum = " << &refToI << "\n";
0x080487fb <main()+373>:        mov    -0xc(%ebp),%eax

I guess you can spot what is happening here. If you ask for &refToI, the contents of the location -0xc(%ebp) are returned; -0xc(%ebp) is where refToI resides, and its contents are nothing but the address of i.

One last thing: why is this line commented out?

//cout << "*refToNum = " << *refToI << "\n";

Because refToI behaves exactly like the int it refers to, and applying * to an int is not permitted; it will give you a compile-time error.
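To show that the restriction comes from the referenced type rather than from references as such, here is a small sketch using a hypothetical reference to a pointer, `refToPtr`, where `*` is perfectly legal:

```cpp
#include <cassert>

int deref_through_reference_to_pointer() {
    int i = 10;
    int* ptrToI = &i;
    int*& refToPtr = ptrToI;  // a reference to the pointer, not to the int
    return *refToPtr;         // fine: refToPtr behaves like an int*
}
```

Here `*refToPtr` yields 10, because `refToPtr` is just another name for `ptrToI`.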

Prasad Rane