10

If I do:

std::map<string, size_t> word_count;
size_t value = word_count.count("a") == 0 ? 1 : 2;
word_count["a"] = value;

then the final value of word_count["a"] is 1, as I would expect. If instead I do:

std::map<string, size_t> word_count;
word_count["a"] = word_count.count("a") == 0 ? 1 : 2;

then the final value of word_count["a"] is 2. Why!?

jesper
  • 385
  • 1
  • 4
  • 15
  • Isn't `word_count["a"]++` shorter/easier? – mfontanini Apr 06 '13 at 21:11
  • @mfontanini, Yes, but that should have a `min()` if used repeatedly to limit it to 2. – chris Apr 06 '13 at 21:13
  • Wow, I'm really sorry that you've probably read all of those comments by now in the hopes of a completely straight answer. It must be more torturous for you than it is for us. – chris Apr 06 '13 at 22:56
  • @chris: :-) It does makes sense to me now. My mistake was assuming that the right side of the assignment would be evaluated before the left side. – jesper Apr 07 '13 at 11:05

2 Answers2

11

Officially, either side of the assignment could be evaluated first. It's up to the implementation to decide which. If word_count does not contain an "a", one is inserted, and an lvalue reference to it returned. If word_count does contain one, only the latter part happens. Despite the uncertainty of which side is evaluated first, you can follow possible executions:

Left Side First

No Element Exists:

operator[] inserts the element since it isn't already there. count() finds it and returns 1, so you end up with it being assigned a value of 2.

Element Exists:

operator[] returns the existing element and count() finds it and returns 1, so you end up with it being assigned a value of 2.

Right Side First

No Element Exists:

count() returns 0, so you get 1 from the right side. Then, "a" is inserted into the map and assigned a value of 1.

Element Exists:

count() returns 1, so you get 2 from the right side. Then, word_count["a"] is accessed and has 2 assigned to it.

Conclusion

In short, you can't rely on this to do what you want, so it's better to use something that you can rely on. mrfontanini made a good suggestion, so I'll edit it a bit:

word_count["a"]++;
word_count["a"] = std::min(word_count["a"], 2);

The first line ensures it is inserted and has a value of at least 1. The second limits that value to a maximum 2, in the case that you do this operation repeatedly.

Notes

I base this answer off of two things:

  1. When a side is picked to be evaluated, the whole side has to be evaluated before the other side can start.

  2. Constructs such as word_count["a"] = 1 exhibit well-defined behaviour, even in the case that an element is inserted and then assigned to.

There is some debate and discussion below about whether these are true or not. I've made it a bit more official now.

Community
  • 1
  • 1
chris
  • 60,560
  • 13
  • 143
  • 205
  • And additionally `word_count["a"]` is being evaluated before `word_count.count("a")` which is possibly what is surprising the OP. – john Apr 06 '13 at 21:10
  • 4
    It need not be evaluated first. This is, I believe, a case of undefined behavior ([reading a value and modifying it without a sequence point](http://stackoverflow.com/questions/4176328/undefined-behavior-and-sequence-points)), though I've never seen it invoked indirectly like this. – Benjamin Lindley Apr 06 '13 at 21:14
  • @BenjaminLindley, Really? I was pretty sure it was evaluated first, but that could be coming from examples that don't use the map on the right side. – chris Apr 06 '13 at 21:15
  • @BenjaminLindley, Looking at it again, I believe you're right. It could evaluate either side of `operator=` first, and which it chooses affects the outcome. – chris Apr 06 '13 at 21:17
  • @chris It's implementation defined whether left or right hand side of an assignment is evaluated first. But as Benjamin says this is technically UB because of the modification, although in practice I think the explanation is that this compiler is evaluating the LHS first in this case. – john Apr 06 '13 at 21:18
  • 3
    It's really no different (only superficially) than this: `(x = 0) = (x == 0) ? 1 : 2;` – Benjamin Lindley Apr 06 '13 at 21:18
  • Okay, I lightly patched up the post. I'm going to change it to include that. – chris Apr 06 '13 at 21:19
  • @stardust_: Sure that's fine. The issue is that we're writing to `word_count["a"]` twice without an interleaving sequence point. – Bill Lynch Apr 06 '13 at 21:22
  • @sharth, I don't think so, are we? `count()` is marked `const`, so it couldn't. It's rather like `i = i++;` vs `i = i + 1;`. – chris Apr 06 '13 at 21:23
  • @chris: I'm actually less sure about that not being UB as well, but you're missing that the lhs of the `=`, that is just `word_count["a"]` also writes to the map in the case that the key "a" does not exist. – Bill Lynch Apr 06 '13 at 21:25
  • @sharth, Oh, duh. That makes two writes :p – chris Apr 06 '13 at 21:27
  • @sharth, Wait, how would `word_count["a"] = 1;` or something work then if it doesn't exist? It's annoying trying to get around all of these while rewriting my answer :p – chris Apr 06 '13 at 21:32
  • @BenjaminLindley, More like `(x = 0, x) = (x == 0) ? 1 : 2;`, is it not? It has to insert the value and return an lvalue reference to it. That would make more sense to me, as there's no undefined behaviour from the assignment, which would make `word_count["a"] = 1;` work. – chris Apr 06 '13 at 21:42
  • Oops, I misspoke. The statement is not equivalent to `(x = 0) = (x == 0) ? 1 : 2;`, it is rather equivalent to this `(x = 0) = (y == "a") ? 1 : 2;` -- But it's still undefined behavior, because, while there is a sequence point within the ternary operator itself, the assignment `(x = 0)` is not sequenced relative to the ternary expression. So, while it's not reading the value of x, it is modifying it twice, which is also UB. – Benjamin Lindley Apr 06 '13 at 21:44
  • @chris: Even if it's `(x = 0, x)`, that's still UB, because even though, in that case, `x = 0` is sequenced relative to `x`, the whole expression `(x = 0, x)` is not sequenced relative to the ternary expression. So again, that's modifying x twice, unsequenced. – Benjamin Lindley Apr 06 '13 at 21:48
  • @BenjaminLindley, But if the left side is first, you get a sequenced assignment to `x`, and it's returned for later assignment. If the right side is evaluated first, it reads the current value, then you get a sequenced assignment from the left side, then both are again used in the assignment. Wouldn't that work? I don't see how there can ever be two modifications unsequenced. And I'm pretty sure `word_count["a"] = 1;` has to ultimately rely on behaviour like that. In that case, it just doesn't matter which side is evaluated first. – chris Apr 06 '13 at 21:51
  • @sharth, Can they? I thought it had to stick to the one it chose and evaluate the entire subtree. It doesn't make sense to me how it could randomly go up and down the other side. – chris Apr 06 '13 at 21:53
  • @chris: I'm fairly sure they can be, but I'd love for that statement to be proved wrong. It's making me wonder how `map[1] + map[1]` doesn't fail. – Bill Lynch Apr 06 '13 at 21:54
  • @sharth, Well, I'll finish what I'm doing with my answer. I'd really like for these fine points to be proven. I think it might constitute a new, really specific question. – chris Apr 06 '13 at 21:56
  • @chris: That would seem to make sense, but it is as sharth said. For example, a pointer to x may be loaded in register A for the modification to 0, then another pointer to x may be loaded to register B in order to make the assignment to the result of the ternary expression. Then, since they are unsequenced, either the assignment to 0 can happen first, or the assignment from the ternary expression can happen. It may sound random, but the compiler writers may have a good reason to do it that way. I'm not versed enough in the low level machine stuff to be sure. – Benjamin Lindley Apr 06 '13 at 21:58
  • @BenjaminLindley, Well, I clearly outlined that I was relying on these two assumptions, so that's the best I have for now. I would really like for the two points to be studied with fine detail and get some other people involved. Does one (or two) questions for the two points sound like a good idea to you? – chris Apr 06 '13 at 22:08
  • I really think that [this question](http://stackoverflow.com/questions/4176328/undefined-behavior-and-sequence-points) covers it. However, I won't vote to close as duplicate if you think it needs more clarification. – Benjamin Lindley Apr 06 '13 at 22:19
  • @BenjaminLindley, Doesn't this example from it seem to go in my favour? `i = (++i,i++,i) // well defined ` – chris Apr 06 '13 at 22:26
  • Maybe, I don't know anymore. My head is near exploding, as happens every time I try to think about this stuff. I need to stop getting in to these discussions about the finer points of C++, because I'm pretty sure that most of the time, I'm wrong. – Benjamin Lindley Apr 06 '13 at 22:31
  • @BenjaminLindley, I know that feeling. It's been happening to me for most of this long section of comments :( I think I might ask a new question after I eat. – chris Apr 06 '13 at 22:32
  • @BenjaminLindley, I ended up asking. It's [here](http://stackoverflow.com/questions/15865627/is-indexing-a-new-map-element-and-having-something-that-reads-it-assigned-to-it#15866238) if you want to see it. – chris Apr 07 '13 at 18:41
5

Looking at this line:

word_count["a"] = word_count.count("a") == 0 ? 1 : 2;

I believe that if word_count["a"] does not exist in the map before that line is executed, you cause undefined behavior.

This is because word_count["a"] will create an entry in the map if one doesn't exist, and this will change the behavior of word_count.count("a"). We also have no required sequencing between those two calls..

Bill Lynch
  • 80,138
  • 16
  • 128
  • 173