Order of evaluation of array indices (versus the expression) in C

Question

Looking at this code:

static int global_var = 0;

int update_three(int val)
{
    global_var = val;
    return 3;
}

int main()
{
    int arr[5];
    arr[global_var] = update_three(2);
}

Which array entry gets updated? 0 or 2?

Is there a part in the specification of C that indicates the precedence of operation in this particular case?

This smells of undefined behavior. It's certainly something that should never be purposely coded. — Fiddling Bits, Jan 13 '20 at 19:16
@Bob__ I'm not sure but I needed the 'static int' to get my code to actually run. — Jiminion, Jan 13 '20 at 19:51
That may be a symptom of other issues in your real code. Adding `static` doesn't change the [current](https://godbolt.org/z/PanVq3) one. — Bob__, Jan 13 '20 at 20:22
This has nothing to do with array indices or order of operations. It has to do with what the C spec calls "sequence points", and in particular, the fact that assignment expressions do NOT create a sequence point between the left-hand and right-hand expression, so the compiler is free to do as it chooses. — Lee Daniel Crocker, Jan 13 '20 at 21:20
You should report a feature request to `clang` so that this piece of code triggers a warning IMHO. — malat, Jan 14 '20 at 10:28
@Bob__ yes, you are correct. IIRC, a variable defined at the top level (same context as main()) is made static implicitly. — Jiminion, Jan 14 '20 at 15:32
@malat: That is an unreasonable expectation IMHO. The compiler can't be expected to analyse `update_three` for assignments to `global_var` just in case it might cause undefined behaviour. The programmer must take some responsibility! — TonyK, Jan 15 '20 at 10:56
@TonyK the compiler is already going to be doing the deeper work of considering if and how to inline this function. — Daniel McLaury, Jan 15 '20 at 13:44

Eric Postpischil · Accepted Answer · 2020-06-30T20:14:24.787

Order of Left and Right Operands

To perform the assignment in arr[global_var] = update_three(2), the C implementation must evaluate the operands and, as a side effect, update the stored value of the left operand. C 2018 6.5.16 (which is about assignments) paragraph 3 tells us there is no sequencing in the left and right operands:

The evaluations of the operands are unsequenced.

This means the C implementation is free to compute the lvalue arr[global_var] first (by “computing the lvalue,” we mean figuring out what this expression refers to), then to evaluate update_three(2), and finally to assign the value of the latter to the former; or to evaluate update_three(2) first, then compute the lvalue, then assign the former to the latter; or to evaluate the lvalue and update_three(2) in some intermixed fashion and then assign the right value to the left lvalue.

In all cases, the assignment of the value to the lvalue must come last, because 6.5.16 3 also says:

… The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands…

Sequencing Violation

Some might ponder about undefined behavior due to both using global_var and separately updating it in violation of 6.5 2, which says:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined…

It is quite familiar to many C practitioners that the behavior of expressions such as x + x++ is not defined by the C standard because they both use the value of x and separately modify it in the same expression without sequencing. However, in this case, we have a function call, which provides some sequencing. global_var is used in arr[global_var] and is updated in the function call update_three(2).

6.5.2.2 10 tells us there is a sequence point before the function is called:

There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call…

Inside the function, global_var = val; is a full expression, and so is the 3 in return 3;, per 6.8 4:

A full expression is an expression that is not part of another expression, nor part of a declarator or abstract declarator…

Then there is a sequence point between these two expressions, again per 6.8 4:

… There is a sequence point between the evaluation of a full expression and the evaluation of the next full expression to be evaluated.

Thus, the C implementation may evaluate arr[global_var] first and then do the function call, in which case there is a sequence point between them because there is one before the function call, or it may evaluate global_var = val; in the function call and then arr[global_var], in which case there is a sequence point between them because there is one after the full expression. So the behavior is unspecified—either of those two things may be evaluated first—but it is not undefined.

dbush · Answer 2 · 2020-01-15T13:55:08.573

26

The result here is unspecified.

While the order of operations in an expression, which dictate how subexpressions are grouped, is well defined, the order of evaluation is not specified. In this case it means that either global_var could be read first or the call to update_three could happen first, but there’s no way to know which.

There is not undefined behavior here because a function call introduces a sequence point, as does every statement in the function including the one that modifies global_var.

To clarify, the C standard defines undefined behavior in section 3.4.3 as:

undefined behavior

behavior, upon use of a nonportable or erroneous program construct or of erroneous data,for which this International Standard imposes no requirements

and defines unspecified behavior in section 3.4.4 as:

unspecified behavior

use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance

The standard states that the evaluation order of function arguments is unspecified, which in this case means that either arr[0] gets set to 3 or arr[2] gets set to 3.

edited Jan 15 '20 at 13:55

answered Jan 13 '20 at 19:38

dbush

205,898
23
218
273

“a function call introduces a sequence point” is insufficient. If the left operand is evaluated first, it suffices, since then the sequence point separates the left operand from the evaluations in the function. But, if the left operand is evaluated after the function call, the sequence point due to calling the function is not between the evaluations in the function and the evaluation of the left operand. You also need the sequence point that separates full expressions. – Eric Postpischil Jan 13 '20 at 19:49
2

@EricPostpischil In pre-C11 terminology there's a sequence point on entry and exit of a function. In C11 terminology the whole function body is indeterminately sequenced with respect to the calling context. These are both specifying the same thing, just using different terms – M.M Jan 13 '20 at 20:29
This is absolutely wrong. The order of evaluation of the arguments of the assignment is unspecified. As for the result of this particular assignment, it is the creation of an array with an unreliable content, both nonportable and intrinsically wrong (inconsistent with the semantics or either of the intended results). A perfect case of undefined behaviour. – kuroi neko Jan 15 '20 at 10:03
1

@kuroineko Just because the output can vary doesn't automatically make it undefined behavior. The standard has different definitions for undefined vs. unspecified behavior, and in this situation it is the latter. – dbush Jan 15 '20 at 12:35
@EricPostpischil You have sequence points here (from C11 informative Annex C): "Between the evaluations of the function designator and actual arguments in a function call and the actual call. (6.5.2.2)", "Between the evaluation of a full expression and the next full expression to be evaluated... /--/ ...the (optional) expression in a return statement (6.8.6.4)". And well, at each semicolon too, since that's a full expression. – Lundin Jan 15 '20 at 12:47
@kuroineko Reliance on unspecified behavior does just that: produce garbage values etc. But it doesn't do completely wild things like completely optimizing away the code or cause program crashes, like undefined behavior could do. – Lundin Jan 15 '20 at 12:49
@Lundin: Yes, the necessary sequence points exist. No, the statement “a function call introduces a sequence point” is an insufficient statement of that. The answer needed more details at the time I wrote the comment. It has since been edited. – Eric Postpischil Jan 15 '20 at 14:04

Mickael B. · Answer 3 · 2020-01-13T20:25:36.830

1

I tried and I got the entry 0 updated.

However according to this question: will right hand side of an expression always evaluated first

The order of evaluation is unspecified and unsequenced. So I think a code like this should be avoided.

edited Jan 13 '20 at 20:25

answered Jan 13 '20 at 19:26

Mickael B.

4,755
4
24
48

1

The behaviour is not undefined but unspecified. Naturally depending on either should be avoided. – Antti Haapala -- Слава Україні Jan 13 '20 at 20:20
1

Hmm ah and it is not unsequenced but indeterminately sequenced... 2 people standing randomly in a queue are indeterminately sequenced. Neo inside Agent Smith are unsequenced and undefined behaviour will happen. – Antti Haapala -- Слава Україні Jan 13 '20 at 20:28

score 0 · Answer 4 · edited Jan 22 '20 at 11:38

0

As it makes little sense to emit code for an assignment before you have a value to assign, most C compilers will first emit code that calls the function and save the result somewhere (register, stack, etc.), then they will emit code that writes this value to its final destination and therefore they will read the global variable after it has been changed. Let us call this the "natural order", not defined by any standard but by pure logic.

Yet in the process of optimization, compilers will try to eliminate the intermediate step of temporarily storing the value somewhere and try to write the function result as directly as possible to the final destination and in that case, they often will have to read the index first, e.g. to a register, to be able to directly move the function result to the array. This may cause the global variable to be read before it was changed.

So this is basically undefined behavior with the very bad property that its quite likely that the result will be different, depending on if optimization is performed and how aggressive this optimization is. It's your task as a developer to resolve that issue by either coding:

int idx = global_var;
arr[idx] = update_three(2);

or coding:

int temp = update_three(2);
arr[global_var] = temp;

As a good rule of the thumb: Unless global variables are const (or they are not but you know that no code will ever change them as a side effect), you should never use them directly in code, as in a multi-threaded environment, even this can be undefined:

int result = global_var + (2 * global_var);
// Is not guaranteed to be equal to `3 * global_var`!

Since the compiler may read it twice and another thread can change the value in between the two reads. Yet, again, optimization would definitely cause the code to only read it once, so you may again have different results that now also depend on the timing of another thread. Thus you will have a lot less headache if you store global variables to a temporary stack variable before usage. Keep in mind if the compiler thinks this is safe, it will most likely optimize even that away and instead use the global variable directly, so in the end, it may make no difference in performance or memory use.

(Just in case someone asks why would anyone do x + 2 * x instead of 3 * x - on some CPUs addition is ultra-fast and so is multiplication by a power two as the compiler will turn these into bit shifts (2 * x == x << 1), yet multiplication with arbitrary numbers can be very slow, thus instead of multiplying by 3, you get much faster code by bit shifting x by 1 and adding x to the result - and even that trick is performed by modern compilers if you multiply by 3 and turn on aggressive optimization unless it's a modern target CPU where multiplication is equally fast as addition since then the trick would slow down the calculation.)

edited Jan 22 '20 at 11:38

Farzad Karimi

770
1
12
31

answered Jan 13 '20 at 20:00

Mecki

125,244
33
244
253

2

It is not undefined behaviour - the standard lists possibilities and one of those is chosen at any instance – Antti Haapala -- Слава Україні Jan 13 '20 at 20:14
The compiler won't turn `3 * x` into two reads of x. It might read x once and then do the x + 2*x method on the register it read x into – M.M Jan 13 '20 at 20:24
@M.M I’ve never said it’s read 3 times. Why are you making up things? – Mecki Jan 13 '20 at 20:35
@AnttiHaapala If you cannot say what the result is by just looking at the code, the result is undefined and defined behavior results in defined results – Mecki Jan 13 '20 at 20:38
@M.M Where did I say that the compiler will turn 3 * x into two reads of x? – Mecki Jan 13 '20 at 20:39
In the last paragraph you say that the compiler might "perform this trick" if aggressive optimization is on, where the trick is introduced as writing `x + 2 * x` instead of `3 * x` – M.M Jan 13 '20 at 20:41
@M.M Yes, it performs the trick, I can prove that. It shifts and adds when your code says multiply by three. Where shall I show you the assembly output? I never said it will read anything twice, as it will use a register. The trick is to avoid multiplication. – Mecki Jan 13 '20 at 20:45
You didn't mention that it would use a register instead of reading twice as specified by the code you posted , that was why I added the comment to point out that it would – M.M Jan 13 '20 at 20:48
@M.M Once more, so that even you can understand it: I posted a code sample, that allows a C compiler to read a global variable twice. On CPUs without registers (e.g. stack based CPUs) it will most likely also do that as there are no registers and stack might be equally slow as RAM. I also said, when using optimization, it will most likely not read it twice if avoidable. And finally, in the last paragraph, I only explained why would anyone write such a code in the first place instead of just multiplying by 3 and that even compilers sometimes shift+add instead of multiplying. – Mecki Jan 14 '20 at 09:12
6

@Mecki _"If you cannot say what the result is by just looking at the code, the result is undefined"_ - [undefined behaviour](https://en.cppreference.com/w/cpp/language/ub) has a very specific meaning in C/C++, and that is not it. Other answerers have explained why this particular instance is _unspecified_, but not _undefined_. – marcelm Jan 14 '20 at 14:28
@marcelm *Undefined* has a very special meaning in the English language, it's *the opposite of defined*. I was not referring to any standard, was I? I never said that this is undefined behavior by any standard, did I? I said that the behavior of the code is not defined (that is the meaning of undefined), as if it was defined, then the outcome would be defined as well. – Mecki Jan 14 '20 at 16:47
3

I appreciate the intent to cast some light into the internals of a computer, even if that goes beyond the scope of the original question. However, UB is very precise C/C++ jargon and should be used carefully, especially when the question is about a language technicality. You might consider using the proper "unspecified behaviour" term instead, that would improve the answer significantly. – kuroi neko Jan 14 '20 at 20:04
@Mecki: If parts of the Standard and the documentation for an implementation and its runtime environment together specify the behavior of an action in some particular circumstance, but some other part of the Standard categorizes the action as "Undefined Behavior", should the latter be awarded unconditional priority? – supercat Jan 15 '20 at 03:46
2

@Mecki "_Undefined has a very special meaning in the English language_"... but in a question labelled `language-lawyer`, where the language in question has its _own_ "very special meaning" for _undefined_, you're only going to cause confusion by not using the language's definition. – TripeHound Jan 15 '20 at 08:11
@TripeHound: According to the authors of the Standard, "Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. *It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior.*" The question of what "popular extensions" to support is a Quality of Implementation issue outside the jurisdiction of the Standard. The authors of clang and gcc like to impose their own meaning which ignores the Committee's stated intention... – supercat Feb 06 '20 at 22:22
...that quality implementations should often support "popular extensions" whether or not the Standard compels them to do so, but I think the intention is quite clear from the published Rationale. – supercat Feb 06 '20 at 22:23
@supercat I don't think I disagree with what you've written, but am not sure why you're addressing me... my comment was purely to do with Mecki wanting to use an "everyday English" definition of "undefined behaviour" in response to a question where UB has a very specific meaning. – TripeHound Feb 06 '20 at 22:29

kuroi neko · Answer 5 · 2020-01-20T18:23:39.837

Global edit: sorry guys, I got all fired up and wrote a lot of nonsense. Just an old geezer ranting.

I wanted to believe C had been spared, but alas since C11 it has been brought on par with C++. Apparently, knowing what the compiler will do with side effects in expressions requires now to solve a little maths riddle involving a partial ordering of code sequences based on a "is located before the synchronization point of".

I happen to have designed and implemented a few critical real-time embedded systems back in the K&R days (including the controller of an electric car that could send people crashing into the nearest wall if the engine was not kept in check, a 10 tons industrial robot that could squash people to a pulp if not properly commanded, and a system layer that, though harmless, would have a few dozen processors suck their data bus dry with less than 1% system overhead).

I might be too senile or stupid to get the difference between undefined and unspecified, but I think I still have a pretty good idea of what concurrent execution and data access mean. In my arguably informed opinion, this obsession of the C++ and now C guys with their pet languages taking over synchronization issues is a costly pipe dream. Either you know what concurrent execution is, and you don't need any of these gizmos, or you don't, and you would do the world at large a favour not trying to mess with it.

All this truckload of eye-watering memory barrier abstractions is simply due to a temporary set of limitations of the multi-CPU cache systems, all of which can be safely encapsulated in common OS synchronization objects like, for instance, the mutexes and condition variables C++ offers.
The cost of this encapsulation is but a minute drop in performances compared with what a use of fine grained specific CPU instructions could achieve is some cases.
The volatile keyword (or a #pragma dont-mess-with-that-variable for all I, as a system programmer, care) would have been quite enough to tell the compiler to stop reordering memory accesses. Optimal code can easily be produced with direct asm directives to sprinkle low level driver and OS code with ad hoc CPU specific instructions. Without an intimate knowledge of how the underlying hardware (cache system or bus interface) works, you're bound to write useless, inefficient or faulty code anyway.

A minute adjustment of the volatile keyword and Bob would have been everybody but the most hardboiled low level programers' uncle. Instead of that, the usual gang of C++ maths freaks had a field day designing yet another incomprehensible abstraction, yielding to their typical tendency to design solutions looking for non existent problems and mistaking the definition of a programming language with the specs of a compiler.

Only this time the change required to deface a fundamental aspect of C too, since these "barriers" had to be generated even in low level C code to work properly. That, among other things, wrought havoc in the definition of expressions, with no explanation or justification whatsoever.

As a conclusion, the fact that a compiler could produce a consistent machine code from this absurd piece of C is only a distant consequence of the way C++ guys coped with potential inconsistencies of the cache systems of the late 2000s.
It made a terrible mess of one fundamental aspect of C (expression definition), so that the vast majority of C programmers - who don't give a damn about cache systems, and rightly so - is now forced to rely on gurus to explain the difference between a = b() + c() and a = b + c.

Trying to guess what will become of this unfortunate array is a net loss of time and efforts anyway. Regardless of what the compiler will make of it, this code is pathologically wrong. The only responsible thing to do with it is send it to the bin.
Conceptually, side effects can always be moved out of expressions, with the trivial effort of explicitly letting the modification occur before or after the evaluation, in a separate statement.
This kind of shitty code might have been justified in the 80's, when you could not expect a compiler to optimize anything. But now that compilers have long become more clever than most programmers, all that remains is a piece of shitty code.

I also fail to understand the importance of this undefined / unspecified debate. Either you can rely on the compiler to generate code with a consistent behaviour or you can't. Whether you call that undefined or unspecified seems like a moot point.

In my arguably informed opinion, C is already dangerous enough in its K&R state. A useful evolution would be to add common sense safety measures. For instance, making use of this advanced code analysis tool the specs force the compiler to implement to at least generate warnings about bonkers code, instead of silently generating a code potentially unreliable to the extreme.
But instead the guys decided, for instance, to define a fixed evaluation order in C++17. Now every software imbecile is actively incited to put side effects in his/her code on purpose, basking in the certainty that the new compilers will eagerly handle the obfuscation in a deterministic way.

K&R was one of the true marvels of the computing world. For twenty bucks you got a comprehensive specification of the language (I've seen single individuals write complete compilers just using this book), an excellent reference manual (the table of contents would usually point you within a couple of pages of the answer to your question), and a textbook that would teach you to use the language in a sensible way. Complete with rationales, examples and wise words of warning about the numerous ways you could abuse the language to do very, very stupid things.

Destroying that heritage for so little gain seems like a cruel waste to me. But again I might very well fail to see the point completely. Maybe some kind soul could point me in the direction of an example of new C code that takes a significant advantage of these side effects?

It is undefined behavior if there are side effects on the same object in the same expression, C17 6.5/2. These are unsequenced as per C17 6.5.18/3. But the text from 6.5/2 "If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined." does not apply, since the value computation inside the function is sequenced either before or after the array index access, regardless of the assignment operator having unsequenced operands in itself. — Lundin, Jan 15 '20 at 13:07
The function call acts kind of like "a mutex against unsequenced access", if you will. Similar to obscure comma operator crap like `0,expr,0`. — Lundin, Jan 15 '20 at 13:08
I think you believed the authors of the Standard when they said "Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: *the implementor may augment the language by providing a definition of the officially undefined behavior*." and said the Standard was not supposed to demean useful programs that weren't strictly conforming. I think most of the authors of the Standard would have thought it obvious that people seeking to write quality compilers... — supercat, Feb 06 '20 at 22:36
...should seek to use UB as an opportunity to make their compilers as useful as possible for their customers. I doubt any could have imagined that compiler writers would use it as an excuse to respond to complaints of "Your compiler processes this code less usefully than everyone else's" with "That's because the Standard doesn't require us to process it usefully, and implementations that usefully process programs whose behavior isn't mandated by the Standard merely promote the writing of broken programs". — supercat, Feb 06 '20 at 22:39
I fail to see the point in your remark. Relying on compiler-specific behaviour is a guarantee of non-portability. It also requires great faith in the compiler manufacturer, who could discontinue any of these "extra definitions" at any moment. The only thing a compiler can do is generate warnings, which a wise and knowledgeable programmer might decide to handle like errors. The problem I see with this ISO monster is that it makes such atrocious code as the OP's example legit (for extremely unclear reasons, compared with the K&R definition of an expression). — kuroi neko, Feb 07 '20 at 15:39

Order of evaluation of array indices (versus the expression) in C

5 Answers5

Order of Left and Right Operands

Sequencing Violation