5

I have the following code:

#include <vector>
#include <iostream>

std::vector <int> a;

int append(){
  a.emplace_back(0);
  return 10;
}

int main(){
  a = {0};
  a[0] = append();
  std::cout << a[0] << '\n';
  return 0;
}

The function append() as a side effect increases the vector size by one. Because of how vectors work, this can trigger a reallocation of its memory when exceeding its capacity.

So, when doing a[0] = append(), if a reallocation happens, then a[0] is invalidated and points to the old memory of the vector. Because of this you can expect the vector to end up being {0, 0} instead of {10, 0}, because it is assigning to the old a[0] instead of the new one.

The weird thing that confuses me is that this behavior changes between C++14 and C++17.

On C++14, the program will print 0. On C++17, it will print 10, meaning a[0] actually got 10 assigned to it. So, I have the following questions whose answers I couldn't find:

  • Is C++17 evaluating a[0]'s memory address after evaluating the RHS of the assignment expression? Does C++14 evaluate this before and that's why it changes?
  • Was this a bug that was fixed in C++17? What changed in the standard?
  • Is there a clean way to make this assignment behave like in C++17 when using C++14 or C++11?
cigien
  • 57,834
  • 11
  • 73
  • 112
  • 6
    related/dupe: https://stackoverflow.com/questions/38501587/what-are-the-evaluation-order-guarantees-introduced-by-c17 – NathanOliver Oct 29 '21 at 00:05
  • 8
    `Because of this you can expect`, because of that it's UB and you can't expect any particular behavior. – Kyle Oct 29 '21 at 00:20
  • @Kyle How do you know it's UB? Is this documented somewhere? It's because of the validity? – Blaz Korecic Oct 29 '21 at 00:28
  • 3
    It was undefined, or at least unspecified, but that's the big change in C++17. – user4581301 Oct 29 '21 at 00:49
  • 1
    @BlazKorecic If the subexpression `a[0]` is evaluated first, the result is UB because it's a dangling reference and you're assigning to it. Prior to C++17 evaluation order of subexpressions was largely unspecified, so there was no guarantee that `a[0]` would be evaluated before `append()`, but if it is, then your program has UB because of the dangling reference. C++17 changed the rules for evaluation order for assignment operators to be from right to left, so C++17 guarantees that `append()` is evaluated before `a[0]` and, thus, there's no dangling reference and no UB. – Kyle Oct 29 '21 at 01:21
  • 2
    In any case, **you never want** to write code as in your example as it is not intuitive. Code like `a[0] = append();` has no reason to exist in production code. Your `append` function does not respect SRP (single responsability principle) as it does 2 things (append data, return arbitrary vale). – Phil1970 Oct 29 '21 at 01:22
  • Assuming that real code is more complex and meaningful, the clean way might be: `int result = append(); a[0] = result;`. If that is the real code, then the clean way could be: `a.emplace_back(0); a[0] = 10;` and remove useless `append` function. – Phil1970 Oct 29 '21 at 01:34
  • Historically, a lot of freedom was given to compiler as it allows to generate more efficient code. However as the langage evolve, it was found that some UB cases were not easy to spot (error prone) and make code harder to write and limit possibilities. – Phil1970 Oct 29 '21 at 01:49
  • 1
    @Phil1970 SRP is not the issue here at least for the function itself I'd say, even if you're right for this particular example. You could construct a quite similar example via standard library schemes, for instance an insert with weird returned iterator self-referrencing experiments on the lhs. The core issue here is simply a violation (pre C++17) or at least a suspicious (mis-)usage of evaluation order rules (since C++17). – Secundi Oct 29 '21 at 08:08
  • @Secundi OP said "I have the following code" implying that he wrote that kind of code that is hard to read. In my opinion, it would be more benefical for OP to learn code coding practice first (avoid global variable, SRP, avoid hard-coded constant, avoid modifying tha same data twice in an expression...) as one should rarely write code that require people to have deep understanding of evaluation rules. More precise rules in C++ 17 is not an excuse to write hard to read code. One should use improved rules only if it make code better and not worst. – Phil1970 Oct 30 '21 at 03:49

3 Answers3

7

This code is UB prior to C++17, as pointed out in comments, due to C++ evaluation order rules. The basic problem: Order of operations is not order of evaluation. Even something like x++ + x++ is UB.

In C++17, sequencing rules for assignments were changed:

  1. In every simple assignment expression E1=E2 and every compound assignment expression E1@=E2, every value computation and side-effect of E2 is sequenced before every value computation and side effect of E1
qz-
  • 674
  • 1
  • 4
  • 14
3

The program has undefined behavior prior to C++17 because the evaluation order is unspecified and one of the possible choices leads to using an invalidated reference. (That’s how undefined behavior works: even if you log the two evaluations and the right hand side logs first, it might have evaluated the other way with the effect of the undefined behavior being the incorrect logging.)

While it’s an informal point, the change in C++17 to specify the order of evaluation for certain operators (including =) wasn’t considered to be a bug fix, which is why compilers don’t implement the new rules in prior language modes. (It’s the code that’s broken, not the language.)

Cleanliness is subjective, but the usual way to deal with an ordering issue like this is to introduce a temporary variable:

{  // limit scope
  auto x=append();
  a[0]=x;  // with std::move for non-trivial types
}

This occasionally interferes with passing by value to the assignment operator, which can’t be helped.

Davis Herring
  • 36,443
  • 4
  • 48
  • 76
1

While rules are now specified in more cases, one should depend on new rules only when it make the code easier to understand or more efficient or correct.

Your code has many issues:

  • You are using global variables
  • You function does 2 unrelated things
  • You have hard-coded constant in code
  • You modify the global vector twice in the same expression

More details

It is well known that one should avoid global variable. This is even worst because your variable in a single letter name (a) which make it prone to name conflict.

append function does 2 things. It append value and return an unrelated value. It would be best to have 2 separate functions:

void append_0_to_a() 
{ 
    a.emplace_back(0); 
}

// I have no idea what 10 represent in your code so I make a guess
int get_number_of_fingers() 
{ 
    const int hands = 2; 
    const int fingerPerHands = 5; 
    return hands * fingerPerHands; 
}

Main code will then be much more readable by writing

a = { 0 };
append_0_to_a();
a[0] = get_number_of_fingers();

But even then, it is not clear why you use so many syntax to modify your vector. Why not simply write something like

int main()
{
    std::vector <int> a = { get_number_of_fingers(), 0 };
    std::cout << a[0] << '\n';
    return 0;
}

By writing cleaner code, you don't need advanced knowledge of evalution order and other people that read your code will understand it more easily.

While evaluations rules were updated mainly in C++ 11 and then C++ 17, one should not abuse of those rule to write code hard to read.

Rules were improved to make code previsible in some case like when a function receive multiple std::unique_ptr in arguments. By forcing compiler to evaluate completly one argument before evaluating the other, it will make the code exception safe (no memory leak) in a case like that:

// Just a simplified example --- not real code
void f(std::unique_ptr<int> a, std::unique_ptr<int> b)
{
    ...
}

f(new 2, new 3);

Newer rules ensure that std::unique_ptr constructor for one argument is called before new call for the other argument thus preventing a possible leak if an exception is throw by second call to new.

Newer rules also ensure some ordering which is important for user defined functions and operators so that chaining calls is intuitive. As far as I know, this is useful for libraries like improved futures and other asynch stuff as on older rules there was too much undefined or unspecified bahavior.

As I have metionned in a comment, original rules allows compiler for more agressive optimisations.

An expression like i++ + i++ was essentially undefined because in the general case were the variables are differents (say i and j) a compiler can reorder instructions to make more efficient code and the compiler didn't have to consider the special case where a variable could repeat in which case the generated code might give different result depending how it was implemented.

Most of the original rules are based on C which does not support operator overloading.

However with user defined types, that flexibility was not always desirable as sometime it gave wrong code for code that look right (as my simplified f function above).

Thus the rules were defined more precisely to fix such undesirable behavior. Newer rules also apply to operators on predefined types like int.

As one should never write code that depend on undefined behavior, any valid program written before C++11, is also valid under stricter rule of C++ 11. Same thing for C++ 17.

On the other hand, if your program is written using C++17 rules, it might have undefined behavior if you compiler with an older compiler.

Generally, one would start to write using newer rule at a point he does not have to return to an older compiler.

Obvioulsy, if one write libraries for other, then he need to ensure that his code does not have undefined behavior for supported C++ standard.

Phil1970
  • 2,605
  • 2
  • 14
  • 15