1

On the slide 6 at Rust for C++ programmers, there is this code:

#include <iostream>
#include <vector>
#include <string>

using namespace std;

int main()
{
    vector<string> v;
    v.push_back("Hello");

    string& x = v[0];
    v.push_back("world");

    cout << x << endl;
    return 0;
}

Running it I got:

g++ --std=c++11 main.cpp -I . -o main
./main
P▒▒o▒Y ▒▒2.▒8/.▒H/.▒H/.▒X/.▒X/.▒h/.▒h/.▒x/.▒x/.▒▒/.
@▒▒
...

And it keeps going for much more stuff. I found some question about aliases and vectors as:

  1. int vs const int&
  2. Changing things from a vector

But I could not figure out why the alias is not working based on them. I looked over the http://en.cppreference.com/w/cpp/container/vector, about the vector definition, however it does just seem to be continue memory allocated on the disk. I understand the string Hello and world are allocated somewhere on the data member of the program, as on the assembly here by g++ main.cpp -S:

...
.lcomm _ZStL8__ioinit,1,1
    .def    __main; .scl    2;  .type   32; .endef
    .section .rdata,"dr"
.LC0:
    .ascii "Hello\0"
.LC1:
    .ascii "world\0"
    .text
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
...

If I do not push the second element world, the program correctly runs. Therefore why the alias is loosing the reference to the first vector element after the second push?

Community
  • 1
  • 1
Evandro Coan
  • 8,560
  • 11
  • 83
  • 144
  • Look at the page for `push_back`: http://en.cppreference.com/w/cpp/container/vector/push_back – chris Mar 05 '17 at 02:04
  • 1
    Wait, the very slides you were reading explain it. – chris Mar 05 '17 at 02:06
  • Sorry, it is the first thing said on the `push_back` documentation. `If the new size() is greater than capacity() then all iterators and references (including the past-the-end iterator) are invalidated.` The real question would be why they do such thing. – Evandro Coan Mar 05 '17 at 02:19
  • How would you implement a data structure that has one chunk of contiguous memory for all of the elements without doing that? You could imagine tricks, but at that point, you lose the gains you get from a vector in the first place. – chris Mar 05 '17 at 02:22
  • Yes, this is the problem. Usually you cannot do it, because this is how the memory is allocated. For example, if I am adding a new element to the end of the vector and I need to increase its size, I would need to get lucky that after the end of the vector there is free memory. I would bet this is never the case, therefore a complete/whole new chunk of memory is needed to be allocated elsewhere within a hole big enough to hold the whole structure. Moreover the operation of allocating more space is extremely costly O(n) accordingly to the current size of the dynamic array. – Evandro Coan Mar 05 '17 at 02:32
  • And yet that O(n) reallocation amortizes to a constant time `push_back`. It's also extremely cache friendly, unexpectedly outperforming a linked list in many cases. That's what you get for this invalidation. The difference with Rust is that the language is fundamentally built to handle lifetime errors at compile-time. – chris Mar 05 '17 at 02:41

3 Answers3

4

When the method push_back was called the vector can reallocate the used memory and as result the reference becomes invalid.

You could reserve enough memory before adding new elements to the vector. In this case the reference will be valid. For example

vector<string> v;
v.reserve( 2 );

v.push_back("Hello");

string& x = v[0];
v.push_back("world");

Here is a demonstrative program

#include <iostream>
#include <vector>
#include <string>

int main() 
{
    std::vector<std::string> v;
    v.reserve( 2 );

    v.push_back("Hello");

    std::string& x = v[0];
    v.push_back("world");

    std::cout << x << ' ' << v[1];
    std::cout << std::endl;

    return 0;
}

Its output is

Hello world 
Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
3

When you do the second push_back iterators and references should be assumed to be invalidated. The vector may probably resize its data block - most likely at another memory location.

As such the variable reference x is referencing unallocated memory which subsequently leads to undefined behavior.

Lucinda Rigetti
  • 427
  • 3
  • 10
2

push_back() resizes the vector (that's intrinsic in adding an element to it).

That invalidates all iterators, pointers, and references that refer to elements of that vector.

Accessing elements of a vector through an invalidated iterator, pointer, or reference (i.e. that were valid before the resizing operation, but not after) gives undefined behaviour.

x is invalidated by the call of push_back() that occurs after its initialisation, and before the output statement

Peter
  • 35,646
  • 4
  • 32
  • 74