Reading from Iterator yields new value before it gets incremented in next argument

Question

I'm trying to sum up the distances of a closed path in some graph. The path, stored as a vector of ints representing the nodes, begins with 0 and is supposed to have another edge back from the last node to 0. So I'm doing

using namespace std;
typedef vector<int> Route;

// r is some Route, g is some Graph
int distance = 0;
for (Route::iterator it = r.begin(); it != r.end(); ) {
    clog << "it: " << *it << endl;
    distance += g.dist(*it, (++it == r.end()) ? 0 : *it);
}

However, the first argument of my dist method is receiving the wrong value. The logging in the line above does output the correct value, but logging the arguments from dist shows that it does get passed the same value twice.

In the last iteration, it does get passed 0 which is plain wrong. Sometimes, with bigger vectors (I've got another input data with 120 waypoints), the first argument becomes some abstruse large value (which breaks the distance function) while the second argument is 0 (as supposed).

What does happen here and why does it happen? I have solved this now with a temporary variable for the first argument, but it seems ugly and unnecessary.

For completeness, here's the code of my dist method which is part of a Graph class (basically an adjacency matrix):

int dist(size_t x, size_t y) {
    clog<<"distance: "<<x<<" "<<y<<endl;
    if (x == y) return 0;
    if (x < y) return dist(y, x); // edges are symmetric
    if (x > size)
        throw out_of_range("Graph::dist: index too large");
    return triangle[x*(x-1)/2+y]; // stored in a flat array representing the lower half of the matrix
}

Here's the output for a small example (vector [0, 1, 2, 3]):

it: 0
distance: 1 1
it: 1
distance: 2 2
it: 2
distance: 3 3
it: 3
distance: 0 0
          ^^^ the distance arguments are wrong

and for the large example I'm getting

[…]
distance: 32 32
it: 32
distance: 51 51
it: 51
distance: 90 90
it: 90
distance: 12 12
it: 12
distance: 859381811 0
          ^^^^^^^^^ WTH?
terminate called after throwing an instance of 'std::out_of_range'
  what():  Graph::dist: index too large

I had expected to get (for the small example):

it: 0
distance: 0 1
it: 1
distance: 1 2
it: 2
distance: 2 3
it: 3
distance: 3 0

It could be related to the fact that the order of evaluation of function arguments is unspecified. — juanchopanza, Dec 15 '13 at 00:25
@juanchopanza: Is it? Uh, I hadn't expected this from C++ (I've never came across such a language, I only have used higher-level, better-specified ones). What can I do against it, do I really need that temporary variable? — Bergi, Dec 15 '13 at 00:28
It is worse than that. There is no sequence point between the expressions that are the arguments to the function, so what you really have is undefined behaviour. Yes, you need a temporary variable. — juanchopanza, Dec 15 '13 at 00:31
How could the use of a temporary variable possibly be uglier than that convoluted expression? — Benjamin Lindley, Dec 15 '13 at 00:32
@Bergi : "better-specified" is in the eye of the beholder. The fact that C++ does not require function arguments to be evaluated in any particular order might be seen as a feature by some. Especially those writing optimizing compilers, and their users that want those compilers to produce the fastest possible code. — Joe Z, Dec 15 '13 at 00:40
@JoeZ: I would not want an optimizer to only optimize function argument order evaluation. It would however make more readable and concise code if the order was specified. — Bergi, Dec 15 '13 at 01:32
@bergi you should give an example of more readable code if order was specified, because the above does not qualify. Write only code like the above is short, but far from readable. — Yakk - Adam Nevraumont, Dec 15 '13 at 01:44
@Bergi: I don't buy the "more readable" argument. I suppose it would make C++ less _surprising_ to the uninitiated, but making the order well defined invites shenanigans that hurt readability. — Joe Z, Dec 15 '13 at 02:02
@JoeZ: Yeah; "less surprising", "more unambiguous", that's the wording I was looking for. As a non-native speaker I still have trouble with the vocabs :-/ Indeed, my example was only short, but hardly readable - I should have omitted the handling of the special-last-element-case. Still, undefined order does make it hard to write concise statements (or I have to practice more with the language) — Bergi, Dec 15 '13 at 03:15
`I've never came across such a language` -- another such language is C. — Cubbi, Dec 17 '13 at 12:02

score 2 · Accepted Answer · edited May 23 '17 at 12:28

The compiler made an optimization when you are using *it in the function call. The following works:

#include <iostream>
#include <vector>
using namespace std;

int dist(size_t x, size_t y) {
    cout<<"distance: "<<x<<" "<<y<<endl;
    return y - x;
}

int main() {
  // r is some Route, g is some Graph
  vector<int> r = {1, 2, 3, 4};
  int distance = 0;
  for (auto it = r.begin(); it != r.end(); ) {
      cout << "it: " << *it << endl;
      int prev = *it;
      ++it;    
      int current = (it == r.end()) ? 0 : *it;
      distance += dist(prev, current);
  }
}

I am still digging the standard and trying to figure out why that optimization is allowed. Evaluation order doesn't really explain it.

========== Edit ===========================

I was wrong. Evaluation order is the cause of the problem. In this case, your ternary operator was evaluated in the correct order, but gcc is also free to move around the subexpression of the ternary operator. The assembly of your code looks like the following:

  movq  %rax, %rdi  # tmp96,
  call  _ZN9__gnu_cxx17__normal_iteratorIPiSt6vectorIiSaIiEEEppEv #
  leaq  -32(%rbp), %rdx #, tmp97
  movq  %rdx, %rsi  # tmp97,
  movq  %rax, %rdi  # D.33388,
  call  _ZN9__gnu_cxxeqIPiSt6vectorIiSaIiEEEEbRKNS_17__normal_iteratorIT_T0_EESA_ #

      # If the test succeeds, go to fetch the content of the iterator
      # otherwise just set it to zero and proceed.
  testb %al, %al  # D.33389
  je  .L5 #,
  movl  $0, %ebx  #, iftmp.2
  jmp .L6 #
.L5:
  leaq  -64(%rbp), %rax #, tmp98
  movq  %rax, %rdi  # tmp98,

      # fetch the content of the iterator if we need it
  call  _ZNK9__gnu_cxx17__normal_iteratorIPiSt6vectorIiSaIiEEEdeEv  #
  movl  (%rax), %eax  # *D.33393_16, D.33394
  movslq  %eax, %rbx  # D.33394, iftmp.2
.L6:
  leaq  -64(%rbp), %rax #, tmp99
  movq  %rax, %rdi  # tmp99,

      # deference the iterator again. NOTE iterator here was NOT INTCREMENTED
  call  _ZNK9__gnu_cxx17__normal_iteratorIPiSt6vectorIiSaIiEEEdeEv  #
  movl  (%rax), %eax  # *D.33395_19, D.33396
  cltq
  movq  %rbx, %rsi  # iftmp.2,
  movq  %rax, %rdi  # D.33397,
  call  _Z4distmm #
  addl  %eax, -24(%rbp) # D.33398, distance

Pay attention to the evaluation of *it. In this case, the iterator was incremented if necessary, the ternary operator condition is evaluated, and then it is dereferenced.

============ Edit ===========

The order of evaluation in C++ for function parameters and its subexpressions are undefined. More information can be found on function parameter evaluation order. There is also a really nice artcle on the details of evaluation order written on cppreference

I'd be happy to accept this answer if you could point to the section in the standard that states where evaluation order is undefined… Digging into the assembly was more than I requested, since I already knew that it is producing ~~wrong~~ unintended code :-) — Bergi, Dec 15 '13 at 03:23
There is an SO answer that explains the topic in detail, so I edited my answer and linkes to it. Digging through assembly was completely for my own sake XD. I did not believe that subexpression can be optimized out this way, so I had to see it with my own eyes :O — leorex, Dec 15 '13 at 04:05

Reading from Iterator yields new value before it gets incremented in next argument

1 Answers1