Note that since the definition of expression is
expression:
    assignment-expression
    expression , assignment-expression
the second line means that any assignment-expression can be considered an expression, which is why t=3, t+2 is a valid expression.
So why is the grammar this way? First note that the grammar for expressions builds its way in steps from the most tightly bound category, primary-expression, to the least tightly bound category, expression. (And then the fact that "( expression )" is a primary-expression brings the expression grammar full circle and lets us make any expression bind more tightly than everything that surrounds it by adding parentheses.)
For example, the well-known fact that binary * binds tighter than binary + follows from these grammar pieces:
multiplicative-expression:
    pm-expression
    multiplicative-expression * pm-expression
    multiplicative-expression / pm-expression
    multiplicative-expression % pm-expression
additive-expression:
    multiplicative-expression
    additive-expression + multiplicative-expression
    additive-expression - multiplicative-expression
In the expression 2 + 3 * 4, the literals 2, 3, and 4 can each be considered a pm-expression, and therefore also a multiplicative-expression or an additive-expression. So you might say 2 + 3 would qualify as an additive-expression, but it is not a multiplicative-expression, so the full 2 + 3 * 4 can't work that way. Instead the grammar forces 3 * 4 to be considered a multiplicative-expression, so that 2 + 3 * 4 can be an additive-expression. Therefore 3 * 4 is a subexpression of the binary +.
Or in the expression 2 * 3 + 4, 3 + 4 might be considered an additive-expression, but then it is not a pm-expression, so that doesn't work. Instead the parser must recognize that 2 * 3 is a multiplicative-expression, which is also an additive-expression, so 2 * 3 + 4 is a valid additive-expression, with 2 * 3 as a subexpression of the binary +.
The recursive nature of most grammar definitions matters when the same operator is used twice, or two operators with the same precedence are used.
Going back to the comma grammar, if we have the tokens "a, b, c", we might say b, c could be an expression, but it is not an assignment-expression, so b, c cannot be a subexpression of the whole. Instead the grammar requires recognizing a, b as an expression, which is allowed as a left subexpression of another comma operator, so a, b, c is also an expression with a, b as the left operand.
This doesn't make any difference for the built-in comma, since its meaning is associative: "evaluate and discard a, then the result value comes from evaluating (evaluate and discard b, then the result value comes from evaluating c)" does the same as "evaluate and discard (evaluate and discard a, then the result value comes from evaluating b), then the result value comes from evaluating c".
But it does give us a clearly-defined behavior in the case of an overloaded comma operator (operator,). Given:
struct X {};
X operator,(X, X);
X a, b, c;
X d = (a, b, c);
we know that the last line means
X d = operator,(operator,(a,b), c);
and not
X d = operator,(a, operator,(b,c));
(I'd consider it particularly evil to define a non-associative overloaded comma operator, but it is allowed.)