21

Since version 1.80, Cppcheck tells me that

Expression 'msg[ipos++]=checksum(&msg[1],ipos-1)' depends on order of evaluation of side effects

in this code sequence (simplified, data is a variable)

BYTE msg[MAX_MSG_SIZE];  // msg can be smaller, depending on data encoded
int ipos = 0;
msg[ipos++] = MSG_START;
ipos += encode(&msg[ipos], data);
msg[ipos++] = checksum(&msg[1], ipos-1);  // <---- Undefined Behaviour?
msg[ipos++] = MSG_END;   // increment ipos to the actual size of msg

and treats this as an error, not a portability issue.

It's C code (incorporated into a C++-dominated project), compiled with a C++98-complient compiler, and meanwhile runs as expected for decades. Cppcheck is run with C++03, C89, auto-detect language.

I confess that the code should better be rewritten. But before doing this, I try to figure out: Is it really dependent on evaluation order? As I understand it, the right operand is being evaluated first (it needs to before the call), then the assignment is taking place (to msg[ipos]) with the increment of ipos done last.

Am I wrong with this assumption, or is it just a false positive?

cpplearner
  • 13,776
  • 2
  • 47
  • 72
Wolf
  • 9,679
  • 7
  • 62
  • 108
  • 3
    [Help](http://en.cppreference.com/w/c/language/eval_order). *"There is no concept of left-to-right or right-to-left evaluation in C"* – BiagioF Aug 29 '17 at 11:27
  • Can't you just write `msg[ipos] = checksum(&msg[1], ipos++)`? – Erik W Aug 29 '17 at 11:27
  • http://en.cppreference.com/w/c/language/eval_order – Ôrel Aug 29 '17 at 11:30
  • @ErikW rewriting comes next, there are more than this one occurrence of the pattern, and the `ipos++` patter on the left tells that this is sequential writing into a buffer. – Wolf Aug 29 '17 at 11:31
  • @BiagioFesta, true :-) but isn't the call forcing ipos-1 being evaluated first? – Wolf Aug 29 '17 at 11:35
  • 7
    @ErikW That suffers from the same ordering problem. – molbdnilo Aug 29 '17 at 11:51
  • 3
    "meanwhile runs as expected for decades" – no offense, but this does sound a bit like "works for me". Unspecified and even undefined behavior is very different from e.g. a coin flip. One compiler may always do what you want, another always what you don't want, and for a third, the behavior may indeed be sporadic. – Arne Vogel Aug 29 '17 at 13:47
  • 3
    @ArneVogel it works (not only "for me") as long as the code base is fixed to the current platform/toolchain. Be reassured: the code *will* change to make porting possible. Posting it here was worth the lesson it can teach. – Wolf Aug 29 '17 at 14:00

2 Answers2

41

This code does indeed depend on evaluation order in a way which is not well defined:

msg[ipos++] = checksum(&msg[1], ipos-1);

Specifically, it is not specified whether ipos++ will increment before or after ipos-1 is evaluated. This is because there is no "sequence point" at the =, only at the end of the full expression (the ;).

The function call is a sequence point. But that only guarantees that ipos-1 happens before the function call. It does not guarantee that ipos++ happens after.

It appears the code should be rewritten this way:

msg[ipos] = checksum(&msg[1], ipos-1);
ipos++; // or ++ipos
John Zwinck
  • 239,568
  • 38
  • 324
  • 436
  • 6
    @Wolf: Yes, the function call is a sequence point. But that only guarantees that `ipos-1` happens before the function call. It does not guarantee that `ipos++` happens after. – John Zwinck Aug 29 '17 at 11:34
  • I'd like to accept your eye-opening answer, but could you also incorporate the detail about the undefined order of call and calculation of the assignment target. My fault was to assume that having `ipos-1` before the call and assignment after the call would be enough to be safe here... – Wolf Aug 29 '17 at 11:41
  • 2
    @Wolf the compiler is free to calculate the memory address at which the assignment will be written (and thus update `ipos`) before evaluating the RHS. – Alnitak Aug 29 '17 at 11:48
  • Is it really only unspecified or actually undefined? Beware that it's pre-C++11. – Deduplicator Aug 29 '17 at 11:51
  • 2
    What the newer standards say is irrelevant, since the question is explicitly about C89, C++98 and C++03. – Lundin Aug 29 '17 at 11:59
  • @Wolf: I have added the text from my comment to the answer, as you requested. – John Zwinck Aug 29 '17 at 12:24
  • @Thanks for adding that detail :-) – Wolf Aug 29 '17 at 13:11
8

The order of evaluation of the operands of = is unspecified. So to begin with, the code relies on unspecified behavior.

What makes the case even worse is that ipos is used twice in the same expression with no sequence point in between, for unrelated purposes - which leads to undefined behavior.

C99 6.5

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

The very same text applies to C90, C99, C++98 and C++03. In C11 and C++11 the wording has changed but the meaning is the same. It is undefined behavior, pre-C++11.

The compiler is not required to give a diagnostic for undefined behavior. You were lucky that it did. It is not a false positive - your code contains a severe bug, all the way from the original C code.

Lundin
  • 195,001
  • 40
  • 254
  • 396
  • Regarding the assignment operator specifically, C99 says "The side effect of updating the stored value of the left operand shall occur between the previous and the next sequence point.". But that's irrelevant, because the UB occurs during the value calculation itself. – Lundin Aug 29 '17 at 11:57
  • 2
    For what it's worth, it wasn't the compiler that gave the warning--it was a static analysis tool (cppcheck, which is free and not extremely elaborate, but useful). – John Zwinck Aug 29 '17 at 12:23
  • 1
    @JohnZwinck gcc/g++ with enough -Wall and -Wextra flags also tend to warn for this UB. (Specifically -Wsequence-point.) – Lundin Aug 29 '17 at 12:49
  • *`for unrelated purposes`* -- the usage is related. The value of `data` controls how many bytes are written and encode `returns` this amount (and it's stored in `ipos`). Maybe I should add this to the question. – Wolf Aug 29 '17 at 13:16
  • 1
    @Wolf No, it is not obviously related. Neither the reader or the compiler can tell what that function does. You are calling a function which does `` with the variable as parameter, in the same expression as you increase that variable. You can't know if the variable will get increased before or after the function call. The variable is not "read to determine the value to be stored". This refers to cases like `i = i + 1;` which is well-defined. – Lundin Aug 29 '17 at 13:24
  • @Lundin I see your point. The relation is most probably *the originally intended one*, a developer will read this from the context (as I did today: it's not "my code"), of course a compiler cannot deduce it (I did not claim that the relation was *`obvious`*). – Wolf Aug 29 '17 at 13:31
  • @Lundin What is the change with C++14 you are referring to? Do new rules apply to evaluation order or are compilers required to raise a compiler error? – Wolf Aug 29 '17 at 14:20
  • 1
    @Wolf It was changed in C++11 even (but not in C11). I mixed up the new C++ standards. Answer corrected. See this: http://en.cppreference.com/w/cpp/language/eval_order, scroll down to "undefined behavior". – Lundin Aug 29 '17 at 14:27