57

Does this C99 code produce undefined behavior?

#include <stdio.h>

int main() {
  int a[3] = {0, 0, 0};
  a[a[0]] = 1;
  printf("a[0] = %d\n", a[0]);
  return 0;
}

In the statement a[a[0]] = 1; , a[0] is both read and modified.

I looked n1124 draft of ISO/IEC 9899. It says (in 6.5 Expressions):

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

It does not mention reading an object to determine the object itself to be modified. Thus this statement might produce undefined behavior.

However, I feel it strange. Does this actually produce undefined behavior?

(I also want to know about this problem in other ISO C versions.)

haccks
  • 104,019
  • 25
  • 176
  • 264
Masaki Hara
  • 3,295
  • 21
  • 21
  • 1
    No. It isn't an expression, per se. And it only modifies once, after reading twice. – B. Nadolson Aug 22 '15 at 04:41
  • 7
    What isn't an expression? And why `a[a[0]] = 1;` reads `a[0]` twice? It seems to read `a[0]` once. – Masaki Hara Aug 22 '15 at 04:50
  • How about `int index = a[0]; a[index] = 1;`? Then, you can also make sure that the index is within bounds of the array by replacing the second statement with `if(index<(sizeof(a)/sizeof(int))) a[index] = 1;` – Daniel M. Aug 22 '15 at 13:02
  • Why not try to run it? – Devesh Khandelwal Aug 22 '15 at 15:38
  • 6
    @DeveshKhandelwal; No. Just running and seeing output is not gonna help in this case. – haccks Aug 22 '15 at 16:09
  • @DeveshKhandelwal I think he wants to know if it's UB according to the standard (even though most UB scenarios are implementation-defined in practice) – Navin Aug 22 '15 at 17:31
  • No , I used a similar expression ( to store the location of a game piece ) and it worked perfectly. – Arif Burhan Mar 04 '16 at 19:35
  • @Navin: Many UB scenarios might be treated as defined by non-obtuse implementations, but some implementations are aggressive to the point of obtuseness. – supercat Oct 01 '16 at 23:24
  • @supercat That's BS, all compiler optimizers take advantage of UB. – Navin Oct 02 '16 at 03:15
  • @Navin: Given that some useful forms of optimization have nothing to do with UB (e.g. if unsigned 16-bit values are coerced to a 64-bit type and multiplied, a compiler which tracked them could use a 32-bit multiply which could be less than 1/4 the cost of a 64-bit one), and many forms of UB could be treated as defined without impeding any optimizations which would be very useful in many non-contrived circumstances, I'm not sure what your statement is supposed to mean. – supercat Oct 02 '16 at 16:23
  • A bit off-topic: you may be interested: [does `i = x[i]++;` lead to undefined behavior?](https://stackoverflow.com/q/71843405/1778275) – pmor Sep 11 '22 at 20:32

5 Answers5

52

the prior value shall be read only to determine the value to be stored.

This is a bit vague and caused confusion, which is partly why C11 threw it out and introduced a new sequencing model.

What it is trying to say is that: if reading the old value is guaranteed to occur earlier in time than writing the new value, then that's fine. Otherwise it is UB. And of course it is a requirement that the new value be computed before it is written.

(Of course the description I have just written will be found by some to be more vague than the Standard text!)

For example x = x + 5 is correct because it is not possible to work out x + 5 without first knowing x. However a[i] = i++ is wrong because the read of i on the left hand side is not required in order to work out the new value to store in i. (The two reads of i are considered separately).


Back to your code now. I think it is well-defined behaviour because the read of a[0] in order to determine the array index is guaranteed to occur before the write.

We cannot write until we have determined where to write. And we do not know where to write until after we read a[0]. Therefore the read must come before the write, so there is no UB.

Someone commented about sequence points. In C99 there is no sequence point in this expression, so sequence points do not come into this discussion.

M.M
  • 138,810
  • 21
  • 208
  • 365
  • 2
    Off the top of my head - in C11 , evaluating the operands is *sequenced before* the performing of an assignment, so it is not UB there. – M.M Aug 22 '15 at 06:24
  • Thanks! I understood the conclusion of the answer as that the ISO C99 specification has a little error and should be interpreted appropriately. Your answer seems to be based on deep understanding of ISO C standards, so I will recognize the conclusion that the ISO C99 has an error. – Masaki Hara Aug 22 '15 at 06:39
  • 5
    @MasakiHara; ISO C99 has no error regarding this except the statement is a bit vague. – haccks Aug 22 '15 at 07:03
  • 2
    I think you're wrong. From the quotation it is perfectly clear that `a[a[0]] = 1` does invoke undefined behavior. This may only seem pointless if one presumes a strictly sequential execution of CPU instructions where all side-effects of an instruction (including transient processes on electronic circuits) are complete prior to the start of execution of the next instruction. That holds for modern mainstream architectures. However, there have also been attempts to develop superscalar architectures where that mightn't be so. – ach Aug 22 '15 at 11:36
  • 5
    IMO if the statement in C99 is intended to mean what you say it's intended to mean then it's worse than "a bit vague", it's flawed because it doesn't authorise everything it intends to. "Determine the value to be stored" isn't vague as to whether or not it includes "determine the location to store the value": it doesn't include it. And the authors of C11 seem to broadly agree that C99 was wrong. OTOH, if compiler-writers universally interpret it as you say, then we at least have a de facto guarantee stronger than what the authors of C99 managed to actually write down :-) – Steve Jessop Aug 22 '15 at 11:44
  • @AndreyChernyakhovskiy I think the intent is what I have described, and the wording does not capture that intent very well – M.M Aug 22 '15 at 12:20
  • @MattMcNabb What you say may well be true, but my point is, what's written is written, and one *may* conceive of a *sensible* reason for it to be so written. – ach Aug 22 '15 at 14:39
  • *In C99 there is no sequence point in this expression, so sequence points do not come into this discussion.*: There is a sequence point at the end of this expression and just before it after the initializer. The main point is that the read order do not come into this discussion. In C11 the wording is different and instead of *sequence point* the word *unsequenced* is used and in this case one need to explain about the sequence of read operations. – haccks Aug 22 '15 at 15:13
  • What about `a[a[0]] = a[0]`? – jtbandes Aug 23 '15 at 06:25
  • @jtbandes I think that is also OK in C99 and C11, because the right-hand-side `a[0]` is still necessarily read before the assignment is performed – M.M Aug 23 '15 at 11:02
  • Unless `i` is volatile, I don't think anything in the Standard would require that `a[i]=1;` may only read `i` once. If `a` were a multi-word type, I don't think anything would forbid the compiler from reading `i` once while computing the address in which to store the lower word of `a[i]` and again when computing the address to store the upper word of `a[i]`. While it would seem unlikely that a compiler would do so, it's not implausible that implementations might exist where that would help code generation. – supercat Oct 01 '16 at 23:35
17

Does this C99 code produce undefined behavior?

No. It will not produce undefined behavior. a[0] is modified only once between two sequence points (first sequence point is at the end of initializer int a[3] = {0, 0, 0}; and second is after the full expression a[a[0]] = 1).

It does not mention reading an object to determine the object itself to be modified. Thus this statement might produce undefined behavior.

An object can be read more than once to modify itself and its a perfectly defined behavior. Look at this example

int x = 10;
x = x*x + 2*x + x%5;   

Second statement of the quote says:

Furthermore, the prior value shall be read only to determine the value to be stored.

All the x in the above expression is read to determine the value of object x itself.


NOTE: Note that there are two parts of the quote mentioned in the question. First part says: Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression., and
therefore the expression like

i = i++;

comes under UB (Two modifications between previous and next sequence points).

Second part says: Furthermore, the prior value shall be read only to determine the value to be stored., and therefore the expressions like

a[i++] = i;
j = (i = 2) + i;  

invoke UB. In both expressions i is modified only once between previous and next sequence points, but the reading of the rightmost i do not determine the value to be stored in i.


In C11 standard this has been changed to

6.5 Expressions:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. [...]

In expression a[a[0]] = 1, there is only one side effect to a[0] and the value computation of index a[0] is sequenced before the value computation of a[a[0]].

Community
  • 1
  • 1
haccks
  • 104,019
  • 25
  • 176
  • 264
  • 2
    This is the best answer as it is the only one which even mentions sequence points. I feel the others fail to recognize that there is a difference between "there is only one logical order that this can be evaluated" and "it is modified only once between two sequence points, and is therefore not UB". I've seen plenty of sequence point violations (which of course are UB) that appear to have only one reasonable mathematical interpretation – Evan Teran Aug 22 '15 at 05:54
  • 1
    Of course sequence point is to the point. I'm glad to see an answer that mentions sequence points. However, "is modified only once" is not sufficient. For example, `j = i + (i = 2);` is undefined (I think). The problem is when it is allowed to both modify and read the same object. – Masaki Hara Aug 22 '15 at 06:05
  • 4
    The standard says: reading the prior value **to determine the value to be stored** is OK. However, reading the prior value **to determine the object itself** is not mentioned. – Masaki Hara Aug 22 '15 at 06:12
  • @MasakiHara; The expression `j = i + (i = 2);` also invokes UB. But, to understand this you have to look at the second part of the section you have quoted in your answer: *Furthermore, the prior value shall be read only to determine the value to be stored.*, the reading of first `i` do not determine the value to be stored in `i`. `i` is modified by `i = 2`. – haccks Aug 22 '15 at 06:13
  • 2
    @haccks, yes, and that's why your example expression has defined behavior, as you mention in your answer. But the same is not true of the OP's expression. – John Bollinger Aug 22 '15 at 06:18
  • @JohnBollinger; I added some more explanation. – haccks Aug 22 '15 at 06:28
  • Is there anything wrong with this answer? Please let me know. – haccks Aug 22 '15 at 12:59
  • Does anything in the Standard mandate that a compiler may only evaluate a side-effect-free sub-expression once while evaluating an outer expression? I would think that, under the rules as written. a compiler would be allowed to evaluate the expression `a[0]` twice--once before e.g. writing the lower byte of `a[a[0]]` and then again after writing the lower half but before writing the upper half. Not that any quality implementation should do such a thing, but I don't think the Standard would forbid it (nor many other things that a conforming-but-useless implementation might do). – supercat Oct 03 '16 at 23:02
13

C99 presents an enumeration of all the sequence points in annex C. There is one at the end of

a[a[0]] = 1;

because it is a complete expression statement, but there are no sequence points inside. Although logic dictates that the subexpression a[0] must be evaluated first, and the result used to determine to which array element the value is assigned, the sequencing rules do not ensure it. When the initial value of a[0] is 0, a[0] is both read and written between two sequence points, and the read is not for the purpose of determining what value to write. Per C99 6.5/2, the behavior of evaluating the expression is therefore undefined, but in practice I don't think you need to worry about it.

C11 is better in this regard. Section 6.5, paragraph (1) says

An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof. The value computations of the operands of an operator are sequenced before the value computation of the result of the operator.

Note in particular the second sentence, which has no analogue in C99. You might think that would be sufficient, but it isn't. It applies to the value computations, but it says nothing about the sequencing of side effects relative to the value computations. Updating the value of the left operand is a side effect, so that extra sentence does not directly apply.

C11 nevertheless comes through for us on this one, as the specifications for the assignment operators provide the needed sequencing (C11 6.5.16(3)):

[...] The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.

(In contrast, C99 just says that updating the stored value of the left operand happens between the previous and next sequence points.) With sections 6.5 and 6.5.16 together, then, C11 gives a well-defined sequence: the inner [] is evaluated before the outer [], which is evaluated before the stored value is updated. This satisfies C11's version of 6.5(2), so in C11, the behavior of evaluating the expression is defined.

John Bollinger
  • 160,171
  • 8
  • 81
  • 157
  • While the C++ standard has improved over C in this area, it too suffers from often appealing to (limited) human understanding of intention (as the phrase "to determine the value to be stored") rather than to a formal model. Take `a[++i]+=2` which looks perfectly defined. However the C++ Standard says [expr.ass] that the behaviour is equivalent to that of `a[++i]=a[++i]+2` (which has clearly undefined behavior) except that `++i` is evaluated only once (which removes the source of UB). So the behavior is equivalent to UB except that there is no UB; how's that? – Marc van Leeuwen Aug 23 '15 at 17:22
  • @MarcvanLeeuwen: The C standard regards `lvalue+=2;` as being equivalent to `lvalue=lvalue+2;` *except that any side-effects from determining the lvalue are only performed once*; I would expect C++ do be similar. – supercat Aug 26 '15 at 18:16
  • @supercat: Yes, C++ has this as well. My point is that if `lvalue=lvalue+2` has undefined behavior _because of the double side effect_ then this phrase is saying the behavior of `lvalue+=2` is equivalent to undefined behavior (therefore itself undefined) except that the reason for undefined behavior is removed. For me that does not specify any definite behavior. The main point that saying x is equivalent to y except that some detail z is different is an awfully bad way of specifying x, especially if y is undefined. – Marc van Leeuwen Aug 26 '15 at 19:15
  • @MarcvanLeeuwen: I don't see what you're complaining about. If behavior of `lvalue = lvalue + 2;` would be defined but for the fact that a side-effect occurs twice, why should it not preventing the double-side effect leave the behavior defined? – supercat Aug 26 '15 at 23:12
  • @supoercat Because undefined behavior means not defined at all. It is not as if there is a perfectly defined behavior underneath that we can recover if we lift the ban of UB; so "would be defined but for the fact that" makes no sense, formally. Humans can guess what the intention was and try to make sense of what the execution of the statement would be if one just try to remove the double side effect from the description (but which of the two?), but formally it means nothing. That is why I said "suffers from often appealing to human understanding of intention" in my first comment. – Marc van Leeuwen Aug 27 '15 at 05:04
  • As I said in [an other question](http://stackoverflow.com/a/24218798/1436796) where interpretation of this sloppy formulation actually became a real difficulty, it is like saying the following: "of the two twins, the first one died at childbirth; the youth of the second twin was exactly like that of the first, except that he did not die at childbirth". – Marc van Leeuwen Aug 27 '15 at 05:08
  • @MarcvanLeeuwen: I don't quite see the problem. Using an identity function f, given `i+=f(++i);`, the side-effects from `i++` must be fully processed before `f` is called and the entire read-modify-write of the left-hand side must occur after `f` returns. If `i` were a global variable, given `someFunction(i+=f(++i), g())` the call to `g` could occur before `i` was incremented, between the increment and the `+=` operation, or after the `+=` operation, but could not occur between the time the `+=` operator read the value and the time it wrote it back. – supercat Aug 27 '15 at 20:27
5

The value is well defined, unless a[0] contains a value that is not a valid array index (i.e. in your code is not negative and does not exceed 3). You could change the code to the more readable and equivalent

 index = a[0];
 a[index] = 1;    /* still UB if index < 0 || index >= 3 */

In the expression a[a[0]] = 1 it is necessary to evaluate a[0] first. If a[0] happens to be zero, then a[0] will be modified. But there is no way for a compiler (short of not complying with the standard) to change order of evaluations and modify a[0] before attempting to read its value.

Peter
  • 35,646
  • 4
  • 32
  • 74
  • I agree that the code can't be interpreted in other way normally. However, I can't find evidence in standard. `index = a[0]; a[index] = 1;` is no doubt valid, but I'm not sure if `a[a[0]] = 1` is equivalent to `index = a[0]; a[index] = 1;` . – Masaki Hara Aug 22 '15 at 05:39
  • 3
    It is. Given any valid expression of the the form `a[b]`, it is necessary to evaluate both the expression `a` and the expression `b` before `a[b]` can be evaluated. That logic is recursive. – Peter Aug 22 '15 at 05:44
  • 4
    "There is no other way to evaluate" does not imply that the code isn't undefined. What is undefined is individually stated in the standard. The word "shall" in the quotation (see above question) means that the behavior is undefined if the constraint is undefined. My question is why the code can be valid according to the standard nonetheless. – Masaki Hara Aug 22 '15 at 05:59
  • Now you're misquoting me, and then rebutting the misquote. Fill your boots. – Peter Aug 22 '15 at 13:41
  • 1
    @Peter: from reading the other answers, I think there's a pretty compelling argument that C99 isn't worded strongly enough, and this case might technically be undefined behaviour. Unless a compiler is intentionally malicious, though, there's only behaviour that makes any sense (evaluate the index before using it). This is why in practice this is not something to worry about, which other answers have also said. IIRC, "undefined behaviour" means literally anything is allowed to happen, which might allow a malicious compiler to violate the other rules that require the obvious ordering. – Peter Cordes Aug 22 '15 at 23:16
  • While I agree that C99 wasn't worded strongly enough, I see nothing in the previous posts that justifies a claim that this particular expression gives undefined behaviour. At most, they speculate about a set of circumstances that, if they happen, would result in undefined behaviour from this expression. They describe no manner in which that set of circumstances can actually occur within constraints of what is actually defined in the standard. In particular, they don't some circumvent the need of operator `[]` to evaluate both its operands before its result is computed. – Peter Aug 22 '15 at 23:47
  • 2
    @Peter, having undefined behavior is in this case a characteristic of the code, not in any way a function of the circumstances under which it is executed. That in practice you can expect compilers to produce code that does the expected thing is irrelevant. A conforming compiler *could* produce code that does literally anything, such as printing "shame on you!" to `stderr`, as the behavior of evaluating the expression. It would not for that reason fail to conform, though it would probably be unpopular. – John Bollinger Aug 26 '15 at 20:22
  • @JohnBollinger: In this case, behavior would be completely defined if a[0] holds any value other than zero. The most important question question would be whether an optimizer, given `if (a[0]) foo(); a[a[0]]=1;` would be entitled to make the call to `foo` unconditional on the basis that if `a[0]` was equal to zero it would be allowed to do anything it likes. – supercat Oct 02 '16 at 17:42
  • @supercat, if `a[0]` had a value other than `0` then its stored value would not be both read and modified by the expression in question. But the array designated by `a` is also an object, and *its* stored value would be both read and modified. In C99, that still produces undefined behavior according to the standard. It has well-defined behavior in C2011, however, because there the corresponding rule applies only to *scalar* objects, and also because the C2011 sequencing rules do establish an ordering. – John Bollinger Oct 02 '16 at 21:19
  • @JohnBollinger: The C Standard has always been sloppy with its use of the terms "object" and "access", in part because the authors of C89 did not think it necessary to specify every defined behavior that would "naturally" be supported by a compiler which met the spec, and because there would be no way of defining such terms that would not be incompatible with either significant parts of the code base, or with useful optimizations that some compilers were performing. The proper solution would be to recognize the existence of different compilation "modes"--any "fragmentation" of the language... – supercat Oct 03 '16 at 13:51
  • ...resulting from such recognition would pale in comparison with the fragmentation that has occurred because of the Committee's refusal to acknowledge the need for a split. I don't think I've previously seen anyone seriously suggest, however, that pointers to disjoint parts of the same array should be considered part of the same "object" for purposes of the rule you cite, though elements within an allocation should be considered part of the same "object" for purposes of memcpy/memmove. – supercat Oct 03 '16 at 13:52
  • @supercat, I'm just reading what the standard actually says. As I observed, however, that particular provision has been sharpened in the latest version of the standard to apply specifically to scalar objects. I'm inclined to think that the purpose of doing so is precisely to deal with the problem I described. And indeed it *is* a problem, for you're quite right that no one expects accesses to different scalar objects to run afoul of that provision simply as a result of them belonging to the same aggregate. But the standard says what it says. – John Bollinger Oct 03 '16 at 14:06
  • @JohnBollinger: The Standard also says that if an implementation can process one possibly-contrived program which exercises the implementation limits given in the Standard, the Standard imposes no other requirements with regard to what programs can be run without running afoul of implementation limits (including, but not limited to, stack space), nor with regard to what may occur if such limits are exceeded. The Standard was never intended to fully describe all the characteristics of a useful implementation on any particular platform. If an implementation running on a 48K CP/M machine... – supercat Oct 03 '16 at 14:24
  • ...dies when attempting 4-deep recursion of a function with 1024 bytes of automatic variables, that should not be considered a mark of low quality, but if an implementation running on a modern platform with 16 gigs were to do likewise, it would be. Perhaps the Standard was intended to give flexibility to platforms whose memory can only be read and written in large chunks (e.g. a machine might have a slow "fetch row" instruction which copies 32 bytes from magnetic core to a fast static-RAM buffer) but for machines with byte-writable memory, I doubt they intended that quality implementations... – supercat Oct 03 '16 at 14:29
  • ...shouldn't make such granular access freely available to programmers. – supercat Oct 03 '16 at 14:30
  • @supercat, sure, and exhibiting undefined behavior does not mean that the program must crash or behave unexpectedly. That the code presented exhibits undefined behavior when interpreted according to C99 is a language-lawyer issue, not necessarily relevant to what you should expect from a quality implementation. In fact, the relevant changes in C2011 indicate that undefined behavior in this area was *not* specifically intended or desired. But this is a language-lawyer question, so language-lawyering is appropriate. – John Bollinger Oct 03 '16 at 15:33
1

A side effect includes modification of an object1.

The C standard says that behavior is undefined if a side effect on object is unsequenced with a side effect on the same object or a value computation using the value of the same object2.

The object a[0] in this expression is modified (side effect) and it's value (value computation) is used to determine the index. It would seem this expression yields undefined behavior:

a[a[0]] = 1

However the text in assignment operators in the standard, explains that the value computation of both left and right operands of the operator =, is sequenced before the left operand is modified3.

The behavior is thus defined, as the first rule1 isn't violated, because the modification (side effect) is sequenced after the value computation of the same object.


1 (Quoted from ISO/IEC 9899:201x 5.1.2.3 Program Exectution 2):
Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.

2 (Quoted from ISO/IEC 9899:201x 6.5 Expressions 2):
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

3 (Quoted from ISO/IEC 9899:201x 6.5.16 Assignment operators 3):
The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.

2501
  • 25,460
  • 4
  • 47
  • 87