27

I am reading the end of the 2nd chapter of K&R and I'm having some difficulty understanding two specific unrelated example lines of code (which follow) along with commentary of them in the book:

x = f() + g();
a[i] = i++;

FIRST LINE - I have no trouble understanding that the standard does not specify the order of evaluation for the + operator, and that therefore it is unspecified whether f() or g() evaluates first (and that is why I think the question isn't a duplicate). My confusion stems from the fact that if we look up the C operator precedence chart it cites function calls as of highest precedence with left-to-right associativity. Now doesn't that mean that f() has to be called/evaluated before g()? Obviously not, but I don't know what I am missing.

SECOND LINE - Again the similar conundrum regarding whether the array is indexed to the initial value of i or the incremented value. However, again the operator precedence chart cites array subscripting as of highest precedence with left-to-right associativity. Therefore wouldn't array subscripting be the first thing to be evaluated causing the array to be subscripted to the initial value of i and removing any unambiguity? Obviously not, and I'm missing something.

I do understand that compilers have the freedom to decide when side effects happen in an expression (between sequence points of course) and that that may cause undefined behaviour if the variable in question is used again in the same expression, however in the examples above it seems that any ambiguity is cleared by function calls and array subscripting having highest precedence and defined left-to-right associativity, so I fail to see the ambiguity.

I have a feeling that I have some fundamental misconception about the concepts of associativity, operator precedence and order of evaluation, but I can't point my finger on what it is, and similar questions/answers on this topic were out of my league to understand thoroughly at this point.

Mark Amery
  • 143,130
  • 81
  • 406
  • 459
bonehead
  • 392
  • 2
  • 8
  • 2
    Related: [Why are these constructs using pre and post-increment undefined behavior?](https://stackoverflow.com/questions/949433/why-are-these-constructs-using-pre-and-post-increment-undefined-behavior) – S.S. Anne Aug 13 '19 at 20:26
  • As a general rule, if there's no requirement that something *has* to happen, then the behaviour is termed "undefined" and anything can and will happen instead. `f()` and `g()` will both be evaluated prior to the addition being performed. That much is assured, but as to the order, that doesn't matter and shouldn't impact your code. Same goes for arguments. When in doubt consult the C standard you're working with, whatever that is. – tadman Aug 13 '19 at 20:26
  • https://stackoverflow.com/questions/5473107/operator-precedence-vs-order-of-evaluation?rq=1 – Mat Aug 13 '19 at 20:32
  • 2
    For extra confusion: as the order of evaulation of `f()` and `g()` are undefined, an advanced compile targetting a multi-core system is technically free to run them both at once. I don't know of any compiler that does this sort of automatic threading but it is at least possible. As a general note: this is all a good argument against using tricks like pre-/post-increment for optimising code conciseness or resulting run-time: you can potentially confuse the reader of your code (i.e. *you*, a few months from now!) and a good modern compiler will affect the same optimisation automatically anyway. – David Spillett Aug 14 '19 at 09:37

5 Answers5

21

FIRST LINE

The left-to-right associativity means that an expression such as f()()() is evaluated as ((f())())(). The associativity of the function call operator () says nothing about its relationship with other operators such as +.

(Note that associativity only really makes sense for nestable infix operators such as binary +, %, or ,. For operators such as function call or the unary ones, associativity is rather pointless in general.)

SECOND LINE

Operator precedence affects parsing, not order of evaluation. The fact that [] has higher precedence than = means that the expression is parsed as (a[i]) = (i++). It says very little about evaluation order; a[i] and i++ must both be evaluated before the assignment, but nothing is said about their order with respect to each other.

To hopefully clear up confusion:

Associativity controls parsing and tells you whether a + b + c is parsed as (a + b) + c (left-to-right) or as a + (b + c) (right-to-left).

Precedence also controls parsing and tells you whether a + b * c is parsed as (a + b) * c (+ has higher precedence than *) or as a + (b * c) (* has higher precedence than +).

Order of evaluation controls which values need to be evaluated in which order. Parts of it can follow from associativity or precedence (an operand must be evaluated before it's used), but it's seldom fully defined by them.

Angew is no longer proud of SO
  • 167,307
  • 17
  • 350
  • 455
  • 4
    Associativity is not relevant to `f()()()`. There is no other way to structure the expression, so there is no choice for associativity to resolve. Associativity applies in a situation like `a-b+c` where there could (based on precedence alone) be a choice between `(a-b)+c` or `a-(b+c)`. (And the C rules actually specify expression structure or parsing by grammar, not by explicit precedence or associativity direction.) – Eric Postpischil Aug 13 '19 at 22:34
  • 1
    @EricPostpischil Yes, I know. However, many (most?) C[++] reference pages, tutorials, etc. still describe associativity for operators such as function call or unaries, even if it makes no sense. It therefore has to be handled in a beginner-oriented answer *somehow.* Notice also the note about associativity being pointless in some cases; I will extend it to cover more than just unaries. – Angew is no longer proud of SO Aug 14 '19 at 07:07
  • Is `a[i]` really evalulated? The value stored at that memory location before the assignment happens is certainly never read. – pbfy0 Aug 14 '19 at 15:58
  • 1
    @pbfy0, the assignment needs to know the location (memory address) where to store the value assigned. That requires evaluating `a[i]`, even if it doesn't use the actual value stored there. If you have something like `*(p + f(x)) = g(x)` it should be clear that there's something to evaluate on the LHS too (and still no guarantees about which of `f()` and `g()` is called first). – ilkkachu Aug 14 '19 at 16:09
  • 1
    @ilkkachu This might not be the terminology that the C standard uses, but to me that seems more like evaluating `&a[i]`. – pbfy0 Aug 14 '19 at 16:26
7
  1. It's not really meaningful to say that function calls have left-to-right associativity, and even if it were meaningful, this would only apply to exotic combinations where two function-call operators were being applied right next to each other. It wouldn't say anything about two separate function calls on either side of a + operator.
  2. Precedence and associativity don't help us at all in the expression a[i] = i++. There simply is no rule that says precisely when within an expression i++ stores the new result back into i, meaning that there is no rule to tell us whether the a[i] part uses the old or the new value. That's why this expression is undefined.

Precedence tells you what happens when you have two different operators that might apply. In a + b * c, does the + or the * apply first? In *p++, does the * or the ++ apply first? Precedence answers these questions.

Associativity tells you what happens when you have two of the same operators that might apply (generally, a string of the same operators in a row). In a + b + c, which + applies first? That's what associativity answers.

But the answers to these questions (that is, the answers supplied by the precedence and associativity rules) apply rather narrowly. They tell you which of the two operators you were wondering about apply first, but they do not tell you much of anything about the bigger expression, or about the smaller subexpressions "underneath" the operators you were wondering about. (For example, if I wrote (a - b) + (c - d) * (e - f), there's no rule to say which of the subtractions happens first.)

The bottom line is that precedence and associativity do not fully determine order of evaluation. Let's say that again in a slightly different way: precedence and associativity partially determine the order of evaluation in certain expressions, but they do not fully determine the order of evaluation in all expressions.

In C, some aspects of the order of evaluation are unspecified, and some are undefined. (This is by contrast to, as I understand it, Java, where all aspects of evaluation order are defined.)

See also this answer which, although it's about a different question, explains the same points in more detail.

Steve Summit
  • 45,437
  • 7
  • 70
  • 103
5

Precedence and associativity matter when an expression has more than one operator.

Associativity doesn't matter with addition, because as you may remember from grade school math, addition is commutative and associative -- there's no difference between (a + b) + c, a + (b + c), or (b + c) + a (but see the Note at the end of my answer).

But consider subtraction. If you write

100 - 50 - 5

it matters whether you treat this as

(100 - 50) - 5 = 45

or

100 - (50 - 5) = 55

Left associativity means that the first interpretation will be used.

Precedence comes into play when you have different operators, e.g.

10 * 20 + 5

Since * has higher precedence than +, this is treated like

(10 * 20) + 5 = 205

rather than

10 * (20 + 5) = 250

Finally, order of evaluation is only noticeable when there are side effects or other dependencies between the sub-expressions. If you write

x = f() - g() - h()

and these functions each print something, the language doesn't specify the order in which the output will occur. Associativity doesn't change this. Even though the results will be subtracted in left-to-right order, it could call them in a different order, save the results somewhere, and then subtract them in the correct order. E.g. it could act as if you'd written:

temp_h = h();
temp_f = f();
temp_g = g();
x = (temp_f - temp_g) - temp_h;

Any reordering of the first 3 lines would be allowed as an interpretation.

Note

Note that in some cases, computer arithmetic is not exactly like real arithmetic. Numbers in computers generally have limited range or precision, so there can be anomalous results (e.g. overflow if the result of addition is too large). This could cause different results depending on the order of operations even with operators that are theoretically associative, e.g. mathematically the following two expressions are equivalent:

x + y - z = (x + y) - z
y - z + x = (y - z) + x

But if x + y overflows, the results can be different. Use explicit parentheses to override the default associativity if necessary to avoid a problem like this.

Barmar
  • 741,623
  • 53
  • 500
  • 612
4

Regarding your first question:

x = f() + g();

The left-to-right associativity relates to operators at the same level that are directly grouped together. For example:

x = a + b - c;

Here the + and - operators have the same precedence level, so a + b is first evaluated, then a + b - c.

For an example more related to yours, imagine a function that returns a function pointer. You could then do something like this:

x()();

In this case, the function x must be called first, then the function pointer returned by x is called.

For the second:

a[i] = i++;

The side effect of the postincrement operator is not guaranteed to occur until the next sequence point. Because there are no sequence points in this expression, the i on the left side may be evaluated before or after the side effect of ++. This invokes undefined behavior due to both reading and writing a variable without a sequence point.

dbush
  • 205,898
  • 23
  • 218
  • 273
2

FIRST LINE - Associativity is not relevant here. Associativity only really comes into play when you have a sequence of operators with the same precedence. Let's take the expression x + y - z. The additive operators + and - are left-associative, so that sequence is parsed as (x + y) - z - IOW, the result of z is subtracted from the result of x + y.

THIS DOES NOT MEAN that any of x, y, or z have to be evaluated in any particular order. It does not mean that x + y must be evaluated before z. It only means that the result of x + y must be known before the result of z is subtracted from it.

Regarding x = f() + g(), all that matters is that the results of f() and g() are known before they can be added together - it does not mean that f() must be evaluated before g(). And again, associativity has no effect here.

SECOND LINE - This statement invokes undefined behavior precisely because the order of operations is unspecified (strictly speaking, the expressions a[i] and i++ are unsequenced with respect to each other). You cannot both update an object (i++) and use its value in a computation (a[i]) in the same expression without an intervening sequence point. The result will not be consistent or predictable from build to build (it doesn't even have to be consistent from run to run of the same build). Expressions like a[i] = i++ (or a[i++] = i) and x = x++ all have undefined behavior, and the result can be quite literally anything.

Note that the &&, ||, ?:, and comma operators do force left-to-right evaluation and introduce sequence points, so an expression like

i++ && a[i]

is well-defined - i++ will be evaluated first and its side effect will be applied before a[i] is evaluated.

Precedence and associativity fall out of the language grammar - for example, the grammar for the additive operators + and - is

additive-expression:
    multiplicative-expression
    additive-expression + multiplicative-expression
    additive-expression - multiplicative-expression

IOW, an additive-expression can produce a single multiplicative-expression, or it can produce another additive-expression followed by an additive operator followed by a multiplicative-expression. Let's see how this plays out with x + y - z:

x -- additive-expression ---------+
                                  |
+                                 +-- additive-expression --+
                                  |                         |
y -- multiplicative-expression ---+                         |
                                                            +-- additive-expression
-                                                           |
                                                            |
z -- multiplicative-expression -----------------------------+

You can see that x + y is grouped together into an additive-expression first, and then that expression is grouped with z to form another additive-expression.

John Bode
  • 119,563
  • 19
  • 122
  • 198