50

The following test program

#include <map>
#include <iostream>

using namespace std;

int main(int argc, char **argv)
{
    map<int,int> a;
    a[1]=a.size();
    for(map<int,int>::const_iterator it=a.begin(); it!=a.end(); ++it)
            cout << "first " << (*it).first << " second " << (*it).second << endl;
}

leads to different output when compiled on g++ 4.8.1 (Ubuntu 12.04 LTS):

g++ xxx.cpp 
./a.out 
first 1 second 1

and on Visual Studio 2012 (Windows 7) (Standard Win32 Console Application Project):

ConsoleApplication1.exe
first 1 second 0

Which compiler is right? Am I doing something wrong?

pmr
  • 58,701
  • 10
  • 113
  • 156
Wald
  • 371
  • 3
  • 4

3 Answers3

77

This is actually a well-formed program that has two equally valid execution paths, so both compilers are right.

a[1] = a.size()

In this expression, the evaluation of the two operands of = are unsequenced.

§1.9/15 [intro.execution] Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced.

However, function calls are not interleaved, so the calls to operator[] and size are actually indeterminately sequenced, rather than unsequenced.

§1.9/15 [intro.execution] Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with respect to the execution of the called function.

This means that the function calls may happen in one of two orders:

  1. operator[] then size
  2. size then operator[]

If a key doesn't exist and you call operator[] with that key, it will be added to the map, thereby changing the size of the map. So in the first case, the key will be added, the size will be retrieved (which is 1 now), and 1 will be assigned to that key. In the second case, the size will be retrieved (which is 0), the key will be added, and 0 will be assigned to that key.

Note, this is not a situation that brings about undefined behaviour. Undefined behaviour occurs when two modifications or a modification and a read of the same scalar object are unsequenced.

§1.9/15 [intro.execution] If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

In this situation, they are not unsequenced but indeterminately sequenced.

So what we do have is two equally valid orderings of the execution of the program. Either could happen and both give valid output. This is unspecified behaviour.

§1.3.25 [defns.unspecified]
unspecified behavior
behavior, for a well-formed program construct and correct data, that depends on the implementation


So to answer your questions:

Which compiler is right?

Both of them are.

Am I doing something wrong?

Probably. It's unlikely that you would want to write code that has two execution paths like this. Unspecified behaviour can be okay, unlike undefined behaviour, because it can be resolved to a single observable output, but it's not worth having in the first place if you can avoid it. Instead, don't write code that has this kind of ambiguity. Depending on what exactly you want correct path to be, you can do either of the following:

auto size = a.size();
a[1] = size; // value is 0

Or:

a[1];
a[1] = a.size(); // value is 1

If you want the result to be 1 and you know the key doesn't yet exist, you could of course do the first code but assign size + 1.

Joseph Mansfield
  • 108,238
  • 20
  • 242
  • 324
  • 14
    +1 You were the only one to mention the effect of having function calls here, which makes all the difference in making this unspecified rather than undefined behaviour. – juanchopanza Jan 15 '14 at 10:32
15

In this case, where a[1] returns a primitive type, please refer to this answer. In the case in which the std::map's value type is an user defined type and operator=(T, std::size_t) is defined for that type, the expression:

a[1] = a.size();

can be converted to the corresponding less-syntactic-sugar version:

a[1] = a.size();
a.operator[](1) = a.size();
operator=(a.operator[](1), a.size());

And, as we all know from the §8.3.6/9:

The order of evaluation of function arguments is unspecified.

which leads to the fact that the result of the above expression is unspecified.

We have, of course, two cases:

  • If the a.operator[](1) is evaluated first, the size of the map is incremented by 1 leading to the first output (first 1 second 1).
  • If the a.size() is evaluated first, the output you'll get is the second one (first 1 second 0).
Community
  • 1
  • 1
Shoe
  • 74,840
  • 36
  • 166
  • 272
  • Note that the type involved are primitive here though, so there is no implicit `operator=` function call here. – Oliver Charlesworth Jan 15 '14 at 10:03
  • 1
    This is pretty close, but it's not about the order of evaluation of function arguments (since the use of `=` here is not a function call). Instead, it's about the fact that the function calls are indeterminately sequenced because they cannot be interleaved. – Joseph Mansfield Jan 15 '14 at 10:37
  • @sftrabbit, of course you deserve the accepted answer. I'm just gonna leave this here as a reference for the case where the value type of the `std::map` is not a primitive type. If you wish to correct something or get this post deleted, please let me know. – Shoe Jan 15 '14 at 10:48
14

This is known as a sequence-point issue which means certain operations may be performed in any order chosen by the compiler.

If one has side-effects on the other, it is called "unspecified behaviour" a bit like "undefined behaviour" however where the result must be one of a fixed subset of outcomes, so here it must be either 0 or 1 and can't be any other value. In reality you should usually avoid doing it.

In your particular case. performing operator [] on a map changes its size (if that element does not yet exist). Thus it has a side effect on the right hand side of what it is assigning to it.

Community
  • 1
  • 1
CashCow
  • 30,981
  • 5
  • 61
  • 92
  • There are well-defined sequence points here, two function calls. – Oliver Charlesworth Jan 15 '14 at 09:42
  • But the left side and the right side of the assignment have undetermined sequence relative to each other. – Benjamin Lindley Jan 15 '14 at 09:44
  • 7
    When I get tempted to go back to C++ development, I run into something like this and back away slowly again. I appreciate giving the compiler freedom to optimise, but this one's a real bottom-biter. – Matthew Walton Jan 15 '14 at 09:45
  • 4
    @MatthewWalton Would you really do something like `a[1]=a.size();` in another language (genuine question, I avoid it but it might be my C++ bias)? – juanchopanza Jan 15 '14 at 09:48
  • @OliCharlesworth a[1] is one function call, a.size() is another and the = is not a function call but just assigning a reference. The compiler can evaluate either side of it first. – CashCow Jan 15 '14 at 09:52
  • 7
    It's not undefined. It's just genuinely unspecified, which means that the standard basically says "all possible execution sequences are fine, the result has to be one of those coming from these sequences". But it places no restrictions on which one - in fact, it would be valid if it changed from execution to execution. – Sebastian Redl Jan 15 '14 at 09:54
  • Thanks for clarification. Didn't expect it though, didn't look like a issue to me at first glance. Thought rvalue would be evaluated first. - edit: On second thought... [] on a map inserts a new value regardless of the assignment... kinda overlooked that... – Wald Jan 15 '14 at 09:55
  • Ok I have changed my answer to note "unspecified" behaviour. – CashCow Jan 15 '14 at 10:04
  • Arguably that's made it worse though :( "Unspecified" is not considered a form of undefined behaviour; they're distinct. (Although in practice, they're typically equally annoying...) – Oliver Charlesworth Jan 15 '14 at 10:05
  • 2
    @juanchopanza Never. `a[1]=a.size()` is saying "use a postcondition to this very operation right now before it's valid" – Paul Evans Jan 15 '14 at 10:07
  • 2
    @juanchopanza: Certainly. For example, in Python `a[1] = a.size()` roughly translates to `a.__setitem__(1, a.size())`, which has perfectly fine semantics. But that's because Python doesn't allow you to return an assignable reference the way C++ does, so `a.x() = a.y()` is not a valid construct. Indexed assignment (`a[1] = ...`) is special by protocol, as is attribute assignment (`a.foo = ...`) (cf. [`__setitem__`](http://docs.python.org/2/reference/datamodel.html#object.__setitem__) and [`__setattr__`](http://docs.python.org/2/reference/datamodel.html#object.__setattr__)). – Florian Brucker Jan 15 '14 at 12:56
  • @juanchopanza no, I wouldn't usually do such a thing in any language, but what bothers me are the more subtle things which come up, caused by the same thing, in more complicated expressions which then might take a very, very long time to track down. – Matthew Walton Jan 16 '14 at 09:06
  • @PaulEvans Or, of course "use the precondition to this very operation right after it's invalidated". Depending on the implementation :) – sehe Jan 16 '14 at 22:37
  • @sehe Absolutely, 'tis mixing an' matching things that simply don't belong together! Doesn't matter which bad choice the implementation decides on. Hence the _unspecified behaviour_ in c++ or any other environment you pollute this abomination with ;) – Paul Evans Jan 18 '14 at 21:16