14

I guess the answer is "no", but from a compiler point of view, I don't understand why.

I made a very simple code which freaks out compiler diagnostics quite badly (both clang and gcc), but I would like to have confirmation that the code is not ill formatted before I report mis-diagnostics. I should point out that these are not compiler bugs, the output is correct in all cases, but I have doubts about the warnings.

Consider the following code:

#include <iostream>

int main(){
  int b,a;
  b = 3;
  b == 3 ? a = 1 : b = 2;
  b == 2 ? a = 2 : b = 1;
  a = a;
  std::cerr << a << std::endl;
}

The assignment of a is a tautology, meaning that a will be initialized after the two ternary statements, regardless of b. GCC is perfectly happy with this code. Clang is slighly more clever and spot something silly (warning: explicitly assigning a variable of type 'int' to itself [-Wself-assign]), but no big deal.

Now the same thing (semantically at least), but shorter syntax:

#include <iostream>

int main(){
  int b,a = (b=3, 
             b == 3 ? a = 1 : b = 2, 
             b == 2 ? a = 2 : b = 1, 
             a);
  std::cerr << a << std::endl;
}

Now the compilers give me completely different warnings. Clang doesn't report anything strange anymore (which is probably correct because of the parenthesis precedence). gcc is a bit more scary and says:

test.cpp: In function ‘int main()’:
test.cpp:7:15: warning: operation on ‘a’ may be undefined [-Wsequence-point]

But is that true? That sequence-point warning gives me a hint that coma separated statements are not handled in the same way in practice, but I don't know if they should or not.

And it gets weirder, changing the code to:

#include <iostream>

int main(){
  int b,a = (b=3, 
             b == 3 ? a = 1 : b = 2, 
             b == 2 ? a = 2 : b = 1, 
             a+0); // <- i just changed this line
  std::cerr << a << std::endl;
}

and then suddenly clang realized that there might be something fishy with a:

test.cpp:7:14: warning: variable 'a' is uninitialized when used within its own initialization [-Wuninitialized]
             a+0);
             ^

But there was no problem with a before... For some reasons clang cannot spot the tautology in this case. Again, it might simply be because those are not full statements anymore.

The problems are:

  • is this code valid and well defined (in all versions)?
  • how is the list of comma separated statements handled? Should it be different from the first version of the code with explicit statements?
  • is GCC right to report undefined behavior and sequence point issues? (in this case clang is missing some important diagnostics) I am aware that it says may, but still...
  • is clang right to report that a might be uninitialized in the last case? (then it should have the same diagnostic for the previous case)

Edit and comments:

  • I am getting several (rightful) comments that this code is anything but simple. This is true, but the point is that the compilers mis-diagnose when they encounter comma-separated statements in initializers. This is a bad thing. I made my code more complete to avoid the "have you tried this syntax..." comments. A much more realistic and human readable version of the problem could be written, which would exhibit wrong diagnostics, but I think this version shows more information and is more complete.
  • in a compiler-torture test suite, this would be considered very understandable and readable, they do much much worse :) We need code like that to test and assess compilers. This would not look pretty in production code, but that is not the point here.
Thibaut
  • 2,400
  • 1
  • 16
  • 28
  • If you write code that's intended to confuse humans, it should come as no surprise that the compiler might not be able to handle it very well. That's what these warnings are *for*, right? – Carl Norum May 28 '13 at 17:56
  • Looks like GCC might be a bit too cautious. Try the latest version. In any case, `int a, b = X;` is the same as `int a; int b = X;`, if that simplifies matters a bit. – Kerrek SB May 28 '13 at 17:59
  • 3
    Your _Consider the very simple code_ code is ghastly! I'd send anyone who tried to present that for a code review back to school for remedial programming lessons on clarity of expression. However, there is a full sequence point at each comma in a comma expression, so the LHS of a comma expression is fully evaluated (with side-effects completed) before the RHS is evaluated. So, I think there probably isn't formally a problem, but the code should not be used. Setting both `a` and `b`, potentially multiple times, is ghastly. Basically, **don't do it!** – Jonathan Leffler May 28 '13 at 17:59
  • 1
    Well, yes, agreed that this code is confusing, but that is the extreme case (ie i tripped out everything to make the problem stand out). A more realistic case would be using simple coma-separated statements in initializers and you might get too much (or not enough) warnings, which is potentially a bad thing. – Thibaut May 28 '13 at 18:00
  • @KerrekSB: tried with gcc 4.7, 4.8 and 4.9 (trunk) and clang 3.2 and trunk. – Thibaut May 28 '13 at 18:01
  • Oh well... it is a *warning*, after all, and you're free to disregard those. I suppose GCC doesn't fold constant expressions all the way when checking those initialization conditions... – Kerrek SB May 28 '13 at 18:03
  • 5
    @JonathanLeffler and Carl, I fail to see how saying he should not do this helps to answer the question. I am eager to learn more about how the compiler even it we must write code that we will not use in production. – Sqeaky May 28 '13 at 18:04
  • @Sqeaky: The _However_ sentence in my comment explains my current thoughts on what the correct behaviour should be — I think that Clang is correct in not warning and GCC is probably wrong in generating the warning, but I'm not sure enough to make it an answer, which is why it is a comment. And I still reject the concept that anyone should have to understand code like that...it is appalling code and should not need to be discussed (unless you're a compiler writer trying to work out what the heck is appropriate for the code). – Jonathan Leffler May 28 '13 at 18:09
  • @KerrekSB for a vast majority of users, you are right. But as a compiler engineer, bad warnings bug me, because they should be here to help and shouldn't be misleading. – Thibaut May 28 '13 at 18:10
  • I would hypothesise some sort of optimisation is involved. If gcc and clang optimise away most of the first example by observing a conditional depending on a constant value, then a is known to be defined by that point in execution. With commas on the other hand, if gcc now no longer recognises the constant conditionals, it no longer knows that a will be defined, and so emits a warning. Just a random guess though, I'm far from an expert on the inner workings of compilers so I could be very wrong. – Sysyphus May 28 '13 at 18:13
  • @JonathanLeffler this is not a problem semantically. I do not argue that this syntax is breaking anything. The issue here is that compilers should do the same dependency analysis in a sequence of statements, whether they are comma separated or full statement, and both compilers are failing on this point. – Thibaut May 28 '13 at 18:14
  • @Sysyphus : i guess this sort of warning comes from parsing and analysing the AST. Branch optimization and const propagation kick in later. `a` is most likely evaluated at compile time, even at O0. At least that is my assumption, i might be wrong. – Thibaut May 28 '13 at 18:17
  • For the `int a = (a = ...)`, http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1343 is relevant. If the initialization of the non-class object is an own full-expression, it will be well-formed. If it is part of the same full-expression as its initializer, it's probably undefined (I'm not aware of a rule that would sequence them). – Johannes Schaub - litb May 28 '13 at 18:32
  • yuck. I know now why I never want to write a compiler... – Walter May 28 '13 at 18:32
  • Also, http://stackoverflow.com/questions/5760866/when-exactly-is-an-initializer-temporary-destroyed – Johannes Schaub - litb May 28 '13 at 18:37
  • @JohannesSchaub-litb interesting, very well spotted. Does that imply that the comma separated statements are not considered to be full statements because they are inside the initializer? – Thibaut May 28 '13 at 18:37
  • `int a; a = a = 1;` generates the same g++ warning, and possibly with reason (two side effects on the same variable, even though they happen to do the same thing.) Also, 8.5p2: an initializer in the definition of a variable can consist of arbitrary expressions involving literals and *previously declared* variables and functions – rici May 28 '13 at 18:48
  • @rici interesting, but `a = a = 1` is a single statement in this case, which might explain why the dependency analysis is not very deep. It might be the same problem in my case, but then it means coma separated statements are semantically different from a statement list. That is precisely what I am trying to investigate ;) – Thibaut May 28 '13 at 18:51
  • @Thibaut: There's no question that your example attempts to use `a`, which is *not* a *previously declared* variable, within the initializer for `a`. That's independent of the comma operator. I don't know how you would isolate that issue from any other warnings. – rici May 28 '13 at 18:55
  • @rici but `a` will ALWAYS be initialized before I use it. The compiler can analyze the dependencies and clearly saw that in the first version of the code. It is only failing when using a coma-separated statement list, which, in my opinion, shouldn't be the case. – Thibaut May 28 '13 at 18:59

1 Answers1

1

5 Expressions

10 In some contexts, an expression only appears for its side effects. Such an expression is called a discarded-value expression. The expression is evaluated and its value is discarded

5.18 Comma operator [expr.comma]

A pair of expressions separated by a comma is evaluated left-to-right; the left expression is a discarded-value expression (Clause 5).83 Every value computation and side effect associated with the left expression is sequenced before every value computation and side effect associated with the right expression. The type and value of the result are the type and value of the right operand; the result is of the same value category as its right operand, and is a bit-field if its right operand is a glvalue and a bit-field.

It sounds to me like there's nothing wrong with your statement.

Looking more closely at the g++ warning, may be undefined, which tells me that the parser isn't smart enough to see that a=1 is guaranteed to be evaluated.

Steven Maitlall
  • 777
  • 3
  • 5
  • Thanks for the clarification, but if anything, this makes my point stronger: `side effect associated with the left expression is sequenced before every value computation and side effect associated with the right expression`, which seems to indicate they ARE full statements (or behave like), and the compiler should analyze them in the same manner, hence see the same dependencies. – Thibaut May 28 '13 at 19:10
  • I agree with you. It seems like you've gotten to a point where the implementation of the compiler, not the compiler's interpretation of the standard, is the source of the problem, if it can even be considered a problem. – Steven Maitlall May 28 '13 at 19:17
  • this is my suspicion from the start; but before I file a bug-report, I want to be absolutely sure there is NO problem on the standard side. This is indeed an implementation issue, but it is weird that both GCC and Clang don't behave as expected here. – Thibaut May 28 '13 at 19:20