
I'm just wondering how references are actually implemented across different compilers and debug/release configurations. Does the standard provide recommendations on their implementation? Do implementations differ?

I tried running a simple program that returns non-const references and pointers to local variables from functions, and both behaved the same way. Does this mean that a reference is internally just a pointer?

Justin
Keynslug
  • Totally implementation-defined. The standard says whether a reference uses storage or not is unspecified. Likely, a pointer is easiest, but many times references can be removed altogether. – GManNickG Oct 17 '10 at 19:04

7 Answers


Just to repeat some of the stuff everyone's been saying, let's look at some compiler output:

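#include <stdio.h>
#include <stdlib.h>

// Takes its argument by reference.
// Note: there is no return statement, so the IR below returns the uninitialized %retval slot.
int byref(int & foo)
{
  printf("%d\n", foo);
}
// Takes its argument by pointer; otherwise identical to byref.
int byptr(int * foo)
{
  printf("%d\n", *foo);
}

int main(int argc, char **argv) {
  int aFoo = 5;
  byref(aFoo);    // the reference binds to aFoo
  byptr(&aFoo);   // the address is passed explicitly
}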

We can compile this with LLVM (with optimizations turned off) and we get the following:

define i32 @_Z5byrefRi(i32* %foo) {
entry:
  %foo_addr = alloca i32*                         ; <i32**> [#uses=2]
  %retval = alloca i32                            ; <i32*> [#uses=1]
  %"alloca point" = bitcast i32 0 to i32          ; <i32> [#uses=0]
  store i32* %foo, i32** %foo_addr
  %0 = load i32** %foo_addr, align 8              ; <i32*> [#uses=1]
  %1 = load i32* %0, align 4                      ; <i32> [#uses=1]
  %2 = call i32 (i8*, ...)* @printf(i8* noalias getelementptr inbounds ([4 x i8]* @.str, i64 0, i64 0), i32 %1) ; <i32> [#uses=0]
  br label %return

return:                                           ; preds = %entry
  %retval1 = load i32* %retval                    ; <i32> [#uses=1]
  ret i32 %retval1
}

define i32 @_Z5byptrPi(i32* %foo) {
entry:
  %foo_addr = alloca i32*                         ; <i32**> [#uses=2]
  %retval = alloca i32                            ; <i32*> [#uses=1]
  %"alloca point" = bitcast i32 0 to i32          ; <i32> [#uses=0]
  store i32* %foo, i32** %foo_addr
  %0 = load i32** %foo_addr, align 8              ; <i32*> [#uses=1]
  %1 = load i32* %0, align 4                      ; <i32> [#uses=1]
  %2 = call i32 (i8*, ...)* @printf(i8* noalias getelementptr inbounds ([4 x i8]* @.str, i64 0, i64 0), i32 %1) ; <i32> [#uses=0]
  br label %return

return:                                           ; preds = %entry
  %retval1 = load i32* %retval                    ; <i32> [#uses=1]
  ret i32 %retval1
}

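The bodies of both functions are identical: at this level the reference parameter arrives as an i32* and is loaded through exactly like the pointer parameter.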

SingleNegationElimination

Sorry for using assembly to explain this, but I think it is the best way to understand how compilers implement references.

    #include <iostream>

    using namespace std;

    int main()
    {
        int i = 10;
        int *ptrToI = &i;
        int &refToI = i;

        cout << "i = " << i << "\n";
        cout << "&i = " << &i << "\n";

        cout << "ptrToI = " << ptrToI << "\n";
        cout << "*ptrToI = " << *ptrToI << "\n";
        cout << "&ptrToI = " << &ptrToI << "\n";

        cout << "refToNum = " << refToI << "\n";
        //cout << "*refToNum = " << *refToI << "\n";
        cout << "&refToNum = " << &refToI << "\n";

        return 0;
    }

The output of this code looks like this:

    i = 10
    &i = 0xbf9e52f8
    ptrToI = 0xbf9e52f8
    *ptrToI = 10
    &ptrToI = 0xbf9e52f4
    refToNum = 10
    &refToNum = 0xbf9e52f8

Let's look at the disassembly (I used GDB for this; 8, 9 and 10 here are line numbers of the source code).

8           int i = 10;
0x08048698 <main()+18>: movl   $0xa,-0x10(%ebp)

Here $0xa is the decimal value 10 that we are assigning to i. -0x10(%ebp) means the contents of the ebp register minus 16 (decimal); that is the address of i on the stack.

9           int *ptrToI = &i;
0x0804869f <main()+25>: lea    -0x10(%ebp),%eax
0x080486a2 <main()+28>: mov    %eax,-0x14(%ebp)

This assigns the address of i to ptrToI. ptrToI is also on the stack, located at address -0x14(%ebp), that is, ebp minus 20 (decimal).

10          int &refToI = i;
0x080486a5 <main()+31>: lea    -0x10(%ebp),%eax
0x080486a8 <main()+34>: mov    %eax,-0xc(%ebp)

Now here is the catch! Compare the disassembly of lines 9 and 10 and you will observe that -0x14(%ebp) is replaced by -0xc(%ebp) in line 10. -0xc(%ebp) is the address of refToI. It is allocated on the stack, but you will never be able to get this address from your code, because you are not required to know it.

So in this build, the reference does occupy memory. In this case it is stack memory, since we declared it as a local variable. How much memory does it occupy? As much as a pointer occupies.

Now let's see how we access the reference and the pointer. For simplicity I have shown only part of the assembly:

16          cout << "*ptrToI = " << *ptrToI << "\n";
0x08048746 <main()+192>:        mov    -0x14(%ebp),%eax
0x08048749 <main()+195>:        mov    (%eax),%ebx
19          cout << "refToNum = " << refToI << "\n";
0x080487b0 <main()+298>:        mov    -0xc(%ebp),%eax
0x080487b3 <main()+301>:        mov    (%eax),%ebx

Compare the two snippets above and you will see a striking similarity. -0xc(%ebp) is the actual address of refToI, which is never accessible to you. In simple terms, if you think of a reference as a normal pointer, then accessing a reference is like fetching the value at the address the reference points to. This means the two lines of code below give the same result:

cout << "Value of i = " << *ptrToI << "\n";
cout << "Value of i = " << refToI << "\n";

Now compare this

15          cout << "ptrToI = " << ptrToI << "\n";
0x08048713 <main()+141>:        mov    -0x14(%ebp),%ebx
21          cout << "&refToNum = " << &refToI << "\n";
0x080487fb <main()+373>:        mov    -0xc(%ebp),%eax

You can probably spot what is happening here. If you ask for &refToI, the contents of the location -0xc(%ebp) are returned; -0xc(%ebp) is where refToI resides, and its contents are nothing but the address of i.

One last thing: why is this line commented out?

//cout << "*refToNum = " << *refToI << "\n";

Because refToI is used exactly like an int, and applying * to an int is not permitted; *refToI gives you a compile-time error.

Prasad Rane

The natural implementation of a reference is indeed a pointer. However, do not depend on this in your code.

Mark Wilkins
Peter G.
  • @Martin York, one could depend on sizeof(T*)==sizeof(T&) – Peter G. Oct 17 '10 at 22:45
  • @Peter G: That will not work as you expect. Because references are aliases, the RHS actually yields the size of T (a T& is an alias for a T). Thus it only holds when sizeof(T) == sizeof(void*). Try printing `sizeof(char&)`: it returns 1. Aliases do not introduce a new variable; they introduce a new name for a variable, so they may not even need a pointer to be implemented if the variable and the reference are in the same scope. – Martin York Oct 18 '10 at 00:04
  • Missed a "not" in my original comment: since you can't do anything with a reference itself, knowing its implementation will not hurt. You can **NOT** get the address of a reference and you can **NOT** modify the reference itself (only what it refers to). There is no way to interact with a reference. – Martin York Oct 18 '10 at 00:08
  • There is no standard-conforming way to directly depend on it. That is clear to me as well. What I meant to say in my comment above is that the size of the storage of references is observable and has an effect. Take a struct containing a reference member a and another struct that has a pointer member a instead, with no padding or alignment issues: the size of the first struct would change if references had a different size. ... – Peter G. Oct 18 '10 at 06:41
  • ... Perhaps I have an allocator that is much more efficient for 12 byte structs than for 16 byte structs. The difference in performance then is observable. – Peter G. Oct 18 '10 at 06:43
  • The size of storage necessary for a reference is situation-dependent. Even if you knew that a `T&` takes 4 bytes in `struct A;` you couldn't say for certain that it would take 4 bytes in `struct B;` – MSalters Oct 19 '10 at 09:43
  • The standard says that whether a reference uses storage or not is unspecified. Sometimes a reference is something that exists only at compile time! – Gab是好人 Sep 14 '16 at 12:37

In Bjarne's words:

Like a pointer, a reference is an alias for an object, is usually implemented to hold a machine address of an object, and does not impose performance overhead compared to pointers, but it differs from a pointer in that:

• You access a reference with exactly the same syntax as the name of an object.

• A reference always refers to the object to which it was initialized.

• There is no ‘‘null reference,’’ and we may assume that a reference refers to an object


Although a reference is usually implemented as a pointer, it should be used as an alias, not like a pointer.

Saurav Sahu

A reference is not a pointer; that is a fact. A pointer can be rebound to another object and has its own operations, such as dereferencing and incrementing/decrementing.

Internally, a reference may be implemented as a pointer. But this is an implementation detail, and it does not change the fact that references cannot be interchanged with pointers. One cannot write code that assumes references are implemented as pointers.

user2615724

I can't say this is right for sure, but I did some Googling and found this statement:

The language standard does not require any particular mechanism. Each implementation is free to do it in any way, as long as the behavior is compliant.

Source: Bytes.com

Mike

A reference does not need to be a pointer. In many cases it is, but in other cases it is just an alias, and no separate memory needs to be allocated for a pointer. Assembly samples are not always representative, because they depend heavily on the optimizations applied and on how "smart" the compiler is.

For example: int i; int& j = i;

does not need to generate any additional code or allocate any additional memory.

kamerunka