I'm making my own C compiler to try to learn as many details as possible about C. I'm now trying to understand exactly how volatile objects work.

What confuses me is that every read access in the code must strictly be executed (C11, 6.7.3p7):

An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously. What constitutes an access to an object that has volatile-qualified type is implementation-defined.

Example: in `a = volatile_var - volatile_var;`, the volatile variable must be read twice, so the compiler can't optimise it to `a = 0;`.

At the same time, the order of evaluation between sequence points is unspecified (C11, 6.5p3):

The grouping of operators and operands is indicated by the syntax. Except as specified later, side effects and value computations of subexpressions are unsequenced.

Example: in `b = (c + d) - (e + f)`, the order in which the additions are evaluated is unspecified, as they are unsequenced.
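
To make the ordering visible, here is a minimal sketch in which each operand is wrapped in a function call that records when it was evaluated. The helpers `traced`, `traced_difference` and `last_trace` are hypothetical names introduced only for this illustration. Function calls are indeterminately sequenced rather than unsequenced, so this version has no undefined behaviour; only the recorded order may differ between compilers.

```c
#include <stddef.h>

/* Hypothetical trace buffer: records the order in which operands are evaluated. */
static char trace[8];
static size_t trace_len;

/* Returns `value` and appends `tag` to the trace. */
static int traced(char tag, int value)
{
    if (trace_len < sizeof trace - 1)
        trace[trace_len++] = tag;
    trace[trace_len] = '\0';
    return value;
}

/* Computes (c + d) - (e + f). The result is always the same, but the
   order of the four calls, and hence the trace, is unspecified. */
int traced_difference(int c, int d, int e, int f)
{
    trace_len = 0;
    trace[0] = '\0';
    return (traced('c', c) + traced('d', d)) - (traced('e', e) + traced('f', f));
}

/* Returns the trace left by the last call to traced_difference. */
const char *last_trace(void)
{
    return trace;
}
```

Calling `traced_difference(10, 5, 3, 1)` always yields 11, while `last_trace()` may hold the four tags in any order the compiler chose.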

But when unsequenced evaluations produce side effects on the same object (as a volatile access does, for instance), the behaviour is undefined (C11, 6.5p2):

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.

Does this mean that expressions like `x = volatile_var - (volatile_var + volatile_var)` are undefined? Should my compiler emit a warning if this occurs?

I've tried to see what Clang and GCC do. Neither throws an error or a warning. The generated assembly shows that the variables are NOT read in the execution order, but left to right instead, as shown in the RISC-V assembly below:

const int volatile thingy = 0;
int main()
{
    int new_thing = thingy - (thingy + thingy);
    return new_thing;
}

main:
        lui     a4,%hi(thingy)
        lw      a0,%lo(thingy)(a4)
        lw      a5,%lo(thingy)(a4)
        lw      a4,%lo(thingy)(a4)
        add     a5,a5,a4
        sub     a0,a0,a5
        ret

Edit: I am not asking "Why do compilers accept it", I am asking "Is it undefined behavior if we strictly follow the C11 standard". The standard seems to state that it is undefined behaviour, but I need a more precise reading of it to interpret that correctly.

Steve Summit
Elzaidir
  • I think that intention of "a side effect on a scalar object" was changing the value of this object. So probably `int x = thingy + (thingy=42);` would be UB while `int x=thingy - (thingy + thingy)` would not. – tstanisl Jan 26 '23 at 14:40
  • `Should it be accepted` It's undefined behavior. You can do anything. Format them hard drives. But a warning would be nicer. – KamilCuk Jan 26 '23 at 14:43
  • @KamilCuk I'll make my compiler spawn dragons then, with a little warning before – Elzaidir Jan 26 '23 at 14:44
  • @KamilCuk I do not think you can do that at compile time, but you can make an executable doing that :-). - Seriously: a compiler is not required to detect undefined behavior constructs, so it is up to the compiler creator to determine if the compiler should detect this construct and throw a warning or even an error. Btw., writing code with undefined behavior is not illegal in any country I have heard of and the C standard also permits it (but does not define the resulting behavior). – nielsen Jan 26 '23 at 14:52
  • @Elzaidir To rain further on your compiler-making parade, C23 changes the definition of a side effect slightly, as per [DR 476](https://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm#dr_476). C23 will say "An access to an object through the use of an lvalue of volatile-qualified type is a _volatile access_. A volatile access to an object, modifying an object, modifying a file, or calling a function that does any of those operations are all _side effects_" This is a very sound change though, that patches up all manner of language lawyer loopholes. – Lundin Jan 26 '23 at 15:13
  • Object code has nothing to do with the semantics of the language including sequence points & volatile access. "must strictly be executed" needs to be understood in terms of executions of the abstract machine. "the volatile variable must be read twice" What constitutes a volatile access is implementation-defined. So you need to say what (range of) definition you are considering in asking. "read in the execution order" The abstract machine execution IS the reading. What do YOU mean by "execution order"? PS Please don't add "EDIT"s, just edit to the best presentation possible at edit time. – philipxy Jan 28 '23 at 04:55
  • Meta-language wise, the issues with `volatile` are why C++ defined its own extensive memory model. Expecting `volatile` to be useful without your compiler adding extra meaning to it is unlikely. However, compilers *do* add extra meaning to it, so it is useful. – Yakk - Adam Nevraumont Jan 29 '23 at 14:23
  • @Lundin: JF Bastien is the C++ lead for Apple's clang front-end, but yeah, probably what he says is crap. He should do more research on Stack Overflow before he speaks at conferences. I removed my comment. – Thomas Weller Jan 30 '23 at 07:34
  • @ThomasWeller This post is about C. Also C++ programmers, including the C++ committee, seem hell-bent on making that language unsuitable for hardware-related programming. So they are unfortunately not a credible source in a discussion about hardware-related C programming. – Lundin Jan 30 '23 at 07:40

6 Answers

Reading the standard (ISO 9899:2018) literally, it is undefined behavior.

C17 5.1.2.3/2 - definition of side effects:

Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects

C17 6.5/2 - sequencing of operands:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.

Thus when reading the standard literally, volatile_var - volatile_var is definitely undefined behavior. Twice in a row UB actually, since both of the quoted sentences apply.


Please also note that this text changed quite a bit in C11. Previously C99 said, 6.5/2:

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

That is, the behaviour was previously unspecified in C99 (unspecified order of evaluation) but was made undefined by the changes in C11.


That being said, other than re-ordering the evaluation as it pleases, a compiler doesn't really have any reason to do wild and crazy things with this expression since there isn't much that can be optimized, given volatile.

As a quality of implementation, mainstream compilers seem to maintain the previous "merely unspecified" behavior from C99.
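
Code that should not depend on this can sidestep the question by giving each volatile read its own full expression, so that the reads are sequenced by the end of each declaration. A minimal sketch reusing `thingy` from the question; `sequenced_difference` is a hypothetical name, and what counts as a read access still remains implementation-defined:

```c
/* Same declaration as in the question. */
const volatile int thingy = 0;

/* Reads thingy three times. Each read is its own full expression, so the
   reads are sequenced: first, then second, then third. */
int sequenced_difference(void)
{
    int first = thingy;
    int second = thingy;
    int third = thingy;
    return first - (second + third);
}
```

The arithmetic on the three local copies involves no volatile objects, so 6.5/2 has nothing to apply to.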

Lundin
  • That's interesting, how it is changing the semantics. For example, `volatile_var * 2` is not UB, while `volatile_var + volatile_var` is. – Eugene Sh. Jan 26 '23 at 15:53
  • @EugeneSh. There are probably more complex expressions where there might be an issue, but it would be too complicated to define the behavior so that these edge cases aren't included in the UB. – Barmar Jan 27 '23 at 16:15
  • Except that: What constitutes a volatile access is implementation-defined. This is chronically ignored when discussing program text in Q&A re volatile. – philipxy Jan 28 '23 at 05:04
  • @philipxy It's chronically ignored since it's far less interesting - conforming compilers _must_ define and document the behaviour. – pipe Jan 29 '23 at 00:09
  • Yes, that's what implementation-defined means. My point is, this answer is wrong when it says a particular piece of source has UB without giving assumed ID behaviour. And it fails to disabuse readers of their misconceptions about the semantics. – philipxy Jan 29 '23 at 02:13
  • Yes, accessing a `volatile` object is a side effect, which 5.1.2.3/2 goes on to explain is a change in the state of the execution environment. But for 6.5/2 to apply, it has to be a side effect *on that object* (or on some other scalar object), and the spec does not say that. Moreover, per 6.7.3/8, it is implementation-defined what constitutes an access to a volatile object in the first place. So *perhaps* `volatile_var` - `volatile_var` has UB, but it's much too strong to say that it *definitely* has UB. – John Bollinger Mar 16 '23 at 01:35
  • @JohnBollinger Well 6.5/2 would have to be about the relevant object used in the expression, as in one of the operands to one operator in the expression. As for 5.1.2.3, it has a DR which was recognized by the Committee and implemented in C23. The text was changed to "An access to an object through the use of an lvalue of volatile-qualified type is a _volatile access_", removing the requirement that a certain object has to be accessed. I never even considered that problem, but before C23 things like `*(volatile int*)address` didn't actually count as a side effect. – Lundin Mar 16 '23 at 07:50
  • That doesn't really help, @Lundin, for even the latest C23 draft still makes it implementation-defined what constitutes an "access" to an object having `volatile`-qualified type (C23 6.7.3/8). And C23's version of 5.1.2.3 still does not specify that a volatile access to an object is necessarily a side effect *on that object*, as opposed to on the execution environment generally. I don't think that's an issue for writes, but the case is less clear for reads. – John Bollinger Mar 16 '23 at 13:42

Per C11, this is undefined behavior.

Per 5.1.2.3 Program execution, paragraph 2 (bolding mine):

Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects ...

And 6.5 Expressions, paragraph 2 (again, bolding mine):

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

Note that, as this is your compiler, you are free to define the behavior should you wish.

Lundin
Andrew Henle
  • You infer it's UB, but it doesn't say so... It's up to the subjective interpretation - as with everything interesting in either C, C++, or C/C++. – curiousguy Jan 27 '23 at 22:29
  • @curiousguy "it doesn't say so" = "UB" – philipxy Jan 28 '23 at 05:01
  • @curiousguy It **does** say so: "If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined." – Andrew Henle Jan 29 '23 at 11:27
  • (cont) Note though "I'm making my own C compiler..." The OP is free to define the behavior as desired. Microsoft does just that with `volatile`: https://learn.microsoft.com/en-us/cpp/cpp/volatile-cpp?view=msvc-170 – Andrew Henle Jan 29 '23 at 11:35
  • Who says that a read of a volatile object is a side effect *on the variable read*? Who says that it is an access to that object at all? (See C17 6.7.3/8) – John Bollinger Mar 16 '23 at 01:42
  • @JohnBollinger What do you mean by a "side effect on an object"? – Andrew Henle Mar 16 '23 at 09:51
  • I mean exactly what the language spec means by that term in the section you quoted from paragraph 6.5/2. Whatever that is. I usually interpret it as meaning a side effect that changes the value of the object, but the spec throws that out the window for `volatile` objects by (1) making the meaning even of "access" implementation defined for those, and (2) specifying that *any* access of a volatile object, including reads if the implementation considers those to be accesses, are side effects (but on not saying *on what*, other than the execution context in general). – John Bollinger Mar 16 '23 at 13:24
  • @JohnBollinger *I mean exactly what the language spec means by that term in the section you quoted from paragraph 6.5/2. Whatever that is.* That phrase does not limit the extent or effect of "side effects", but I haven't been able to discern a clear limit to the definition of what a "side effect" is and is not from the standard. [This question](https://stackoverflow.com/questions/62148339/what-is-side-effect-in-c) doesn't really help much. I suspect the authors of the standard had in mind platform/hardware-specific effects such as accessing a `volatile` value changing the next value read. – Andrew Henle Mar 16 '23 at 20:18
  • I'm not sure I follow, @AndrewHenle. 6.5/2 absolutely limits which side effects are to be considered for its purposes: those that are *on* the same scalar object as each other, or *on* a scalar object whose value is also used in a relevant value computation. Even if we ignore 6.7.3/8, the question remains whether a read of a volatile object -- which definitely is a side effect -- is a side effect *on the object read*. Compare file modifications, which also are side effects, but not on any particular C object. – John Bollinger Mar 16 '23 at 20:32
  • @JohnBollinger *6.5/2 absolutely limits which side effects...* "1, 2, or 3 are integers" does not preclude 5 from being an integer. It puts no limits on integers at all. It just states a few examples. "Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects" is syntactically identical to "1, 2, or 3 are integers". IMO "side effect" is woefully underdefined, perhaps deliberately so. – Andrew Henle Mar 17 '23 at 10:14
  • You seem to be arguing that just because the spec doesn't say that a read of a volatile object is a side effect on that object doesn't mean it isn't one. I accept that, but I don't see your point. To conclude based on 6.5/2 that evaluating an expression such as the ones in the question produces UB, you need side effects on the volatile object. The best you can get from the spec is *maybe* UB, but this answer make the unqualified statement " this is undefined behavior". – John Bollinger Mar 17 '23 at 12:41

As other answers have pointed out, accessing a volatile-qualified variable is a side effect, and side effects are interesting, and having multiple side effects between sequence points is especially interesting, and having multiple side effects that affect the same object between sequence points is undefined.

As an example of how/why it's undefined, consider this (wrong) code for reading a two-byte big-endian value from an input stream ifs:

uint16_t val = (getc(ifs) << 8) | getc(ifs);     /* WRONG */

This code imagines (in order to implement big-endianness, that is) that the two getc calls happen in left-to-right order, but of course that's not at all guaranteed, which is why this code is wrong.

Now, one of the things the volatile qualifier is for is input registers. So if you've got a volatile variable

volatile uint8_t inputreg;

and if every time you read it you get the next byte coming in on some device — that is, if merely accessing the variable inputreg is like calling getc() on a stream — then you might write this code:

uint16_t val = (inputreg << 8) | inputreg;       /* ALSO WRONG */

and it's just about exactly as wrong as the getc() code above.
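
One way to repair both snippets is to split the two reads into separate statements, which puts a sequence point between them. A minimal sketch; the function names `read_be16` and `read_reg_be16` are hypothetical, and the register is passed as a pointer here only so the example is self-contained:

```c
#include <stdint.h>
#include <stdio.h>

/* Reads a two-byte big-endian value from `ifs`. Each getc call is its own
   full expression, so the high byte is always read before the low byte.
   Returns 0 on success and -1 if the stream ends early. */
int read_be16(FILE *ifs, uint16_t *out)
{
    int hi = getc(ifs);
    if (hi == EOF)
        return -1;
    int lo = getc(ifs);
    if (lo == EOF)
        return -1;
    *out = (uint16_t)(((unsigned)hi << 8) | (unsigned)lo);
    return 0;
}

/* The same pattern for an input register; `reg` stands in for inputreg. */
uint16_t read_reg_be16(volatile uint8_t *reg)
{
    uint8_t hi = *reg;   /* first read */
    uint8_t lo = *reg;   /* second read, sequenced after the first */
    return (uint16_t)(((unsigned)hi << 8) | lo);
}
```

With the reads in separate statements, the byte order no longer depends on how the compiler chooses to evaluate the operands of `|`.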

Steve Summit
  • There's no need for it to be undefined *behaviour*; an unspecified order of evaluation would suffice to give compilers the same freedom when compiling this expression to asm. (Which most programs wouldn't want to take advantage of; it's still at best inefficient to write code like this if reads actually do have some kind of side effect, or even just if they stop the compiler from loading once into a register if you do expect each read to get the same value). Making it UB doesn't make it easier to debug, potentially the opposite as optimizing away other code in the same block can be confusing. – Peter Cordes Jan 28 '23 at 15:52
  • IMO the easiest-to-debug behaviour would be to warn about unsequenced volatile accesses, and to pick some order, e.g. left to right or whatever's convenient for the parse tree. Or pick an order that leads to most efficient asm, e.g. on Sandybridge-family doesn't have partial register stalls for low-8 regs, `movzx eax, byte [inputreg]` / `shl eax, 8` / `mov al, [inputreg]`. On AMD or Silvermont where high-8 regs aren't renamed separately either, `movzx eax, byte [inputreg]` / `mov ah, [inputreg]`. (GCC misses optimizations allowed by choice of eval order of unsequenced things.) – Peter Cordes Jan 28 '23 at 16:01
  • Making it UB makes more sense when you include side-effects that modify an object. But it makes sense to say nobody should ever write code like this, so we'll leave it fully undefined. – Peter Cordes Jan 28 '23 at 16:04
  • I'd assume that `(getc(ifs) << 8) || getc(ifs)` would not be undefined because of short-circuit evaluation? Not that it would make sense to combine bits with a logical operator. So does it mean `a | b` the evaluation of `a` and `b` is not sequenced BUT `a || b` is sequenced? – Marco Jan 29 '23 at 14:01

The Standard has no terminology more specific than "Undefined Behavior" to describe actions which should be unambiguously defined on some implementations, or even the vast majority of them, but may behave unpredictably on others, based upon Implementation-Defined criteria. If anything, the authors of the Standard go out of their way to avoid saying anything about such behaviors.

The term is also used as a catch-all for situations where a potentially useful optimization might observably affect program behavior in some cases, to ensure that such optimizations will not affect program behavior in any defined situations.

The Standard specifies that the semantics of volatile-qualified accesses are "Implementation Defined", and there are platforms where certain kinds of optimizations involving volatile-qualified accesses might be observable if more than one such access occurs between sequence points. As a simple example, some platforms have read-modify-write operations whose semantics may be observably distinct from doing discrete read, modify, and write operations. If a programmer were to write:

void x(int volatile *dest, int volatile *src)
{
  *dest = *src | 1;
}

and the two pointers were equal, the behavior of such a function might depend upon whether a compiler recognized that the pointers were equal and replaced discrete read and write operations with a combined read-modify-write.
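
A programmer who wants the read and the write to be unambiguously separate at the source level can put them in separate full expressions. This is only a sketch, with `x_split` as a hypothetical name; whether a particular compiler then emits discrete load and store instructions is still governed by its implementation-defined notion of a volatile access.

```c
/* Variant of x() with the read and the write in separate full
   expressions: one read of *src, then one write to *dest. */
void x_split(int volatile *dest, int volatile *src)
{
    int value = *src;    /* the only read */
    *dest = value | 1;   /* the only write, sequenced after the read */
}
```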

To be sure, such distinctions would be unlikely to matter in most cases, and would be especially unlikely to matter in cases where an object is read twice. Nonetheless, the Standard makes no attempt to distinguish situations where such optimizations would actually affect program behavior, much less those where they would affect program behavior in any way that actually mattered, from those where it would be impossible to detect the effects of such optimization. The notion that the phrase "non-portable or erroneous" excludes constructs which would be non-portable but correct on the target platform would lead to an interesting irony that compiler optimizations such as read-modify-write merging would be completely useless on any "correct" programs.

supercat

No diagnostic is required for programs with Undefined Behaviour, except where specifically mentioned. So it's not wrong to accept this code.

In general, it's not possible to know whether the same volatile storage is being accessed multiple times between sequence points (consider a function taking two volatile int* parameters, without restrict, as the simplest example where analysis is impossible).
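
A minimal sketch of such a function (both function names are hypothetical). Looking only at `difference_unsequenced`, a compiler cannot tell whether `a` and `b` will ever point at the same object; the second variant orders the reads explicitly and so does not depend on the answer.

```c
/* Two volatile reads in one expression. They touch the same object only
   when the caller passes a == b, which cannot be seen from here. */
int difference_unsequenced(volatile int *a, volatile int *b)
{
    return *a - *b;
}

/* The same computation with the reads in separate full expressions, so
   their order is fixed even when a == b. */
int difference_sequenced(volatile int *a, volatile int *b)
{
    int first = *a;
    int second = *b;
    return first - second;
}
```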

That said, when you are able to detect a problematic situation, users might find it helpful, so I encourage you to work on getting a diagnostic out.
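
As a rough idea of what such a diagnostic could look like, here is a sketch over a heavily simplified, hypothetical expression tree (none of these types or names come from a real compiler). It compares variables by name only, treats every binary node as an operator without a sequence point, and ignores pointers, so it is a starting point rather than a complete check.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical expression tree: a node is either a variable reference or
   a binary operator without a sequence point (such as + or -). */
enum node_kind { NODE_VAR, NODE_BINARY };

struct node {
    enum node_kind kind;
    const char *name;          /* NODE_VAR: variable name */
    int is_volatile;           /* NODE_VAR: nonzero if volatile-qualified */
    const struct node *lhs;    /* NODE_BINARY: left operand */
    const struct node *rhs;    /* NODE_BINARY: right operand */
};

/* Counts the reads of the volatile variable `name` in the tree at `n`. */
static int count_volatile_reads(const struct node *n, const char *name)
{
    if (n == NULL)
        return 0;
    if (n->kind == NODE_VAR)
        return n->is_volatile && strcmp(n->name, name) == 0;
    return count_volatile_reads(n->lhs, name) + count_volatile_reads(n->rhs, name);
}

/* Returns nonzero if any volatile variable read in `a` is also read in `b`. */
static int shares_volatile_read(const struct node *a, const struct node *b)
{
    if (a == NULL)
        return 0;
    if (a->kind == NODE_VAR)
        return a->is_volatile && count_volatile_reads(b, a->name) > 0;
    return shares_volatile_read(a->lhs, b) || shares_volatile_read(a->rhs, b);
}

/* Returns nonzero if some binary operator in the tree reads the same
   volatile variable in both operands, i.e. a candidate for a warning. */
int has_unsequenced_volatile_reads(const struct node *n)
{
    if (n == NULL || n->kind == NODE_VAR)
        return 0;
    return shares_volatile_read(n->lhs, n->rhs)
        || has_unsequenced_volatile_reads(n->lhs)
        || has_unsequenced_volatile_reads(n->rhs);
}
```

A real implementation would also have to stop at operators that introduce sequence points (`&&`, `||`, `?:`, the comma operator) and at function calls.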

Toby Speight

IMO it is legal but very bad.

    int new_thing = thingy - (thingy + thingy);

Multiple use of volatile variables in one expression is allowed and no warning is needed. But from the programmer's point of view, it is a very bad line of code.

Does this mean that expressions like x = volatile_var - (volatile_var + volatile_var) are undefined? Should my compiler throw an error if this occurs?

No, as the C standard does not say anything about how those reads have to be ordered. It is left to the implementation. All the implementations I know of do it whichever way is easiest for them, as in this example: https://godbolt.org/z/99498141d

0___________
  • Why is it legal? There are two side effects unsequenced in one expression, that's UB. `how those reads have to be ordered` Exactly. And `C11, 6.5p2` – KamilCuk Jan 26 '23 at 14:28
  • No - it is left to the implementation and it is not Undefined Behaviour which has a very specific meaning in C. – 0___________ Jan 26 '23 at 14:29
  • `Except as specified later, side effects and value computations of subexpressions are unsequenced.` - it's unsequenced. It's not "implementation-defined sequenced". – KamilCuk Jan 26 '23 at 14:32
  • @0___________ Not true. [**5.1.2.3 Program execution**, p2](https://port70.net/~nsz/c/c11/n1570.html): "Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects" and [**6.5 Expressions**, p2](https://port70.net/~nsz/c/c11/n1570.html#6.5p2): "If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, **the behavior is undefined**." – Andrew Henle Jan 26 '23 at 14:32
  • @KamilCuk: It is legal because the C standard does not say a program cannot do it. The C standard says the behavior is undefined. Programs are **allowed** to do all sorts of things that have behavior not defined by the C standard. For example, they can link to modules written in other languages, which is not defined by the C standard. They can call operating system routines, which is not defined by the C standard. They can use compiler extensions, which are not defined by the C standard. All of these things are legal; they are not disallowed by the C standard. – Eric Postpischil Jan 26 '23 at 14:48
  • @AndrewHenle: True. The fact the behavior is undefined does not mean a program cannot do it or that a compiler cannot define it. It only means the C standard is **silent** about what happens. The C standard does not impose any requirement that a program cannot do things that have undefined behavior nor that a C implementation must not define them—those would be requirements, but the C standard defines “undefined behavior” to mean that it, the standard, does not impose requirements. – Eric Postpischil Jan 26 '23 at 14:50
  • @AndrewHenle: When writing portable code, people need to avoid undefined behavior, because that is not portable. But some people take that further and think there is a principle that you must always avoid undefined behavior. That morphs into thinking that undefined behavior is not allowed. That is incorrect. Nothing in the C standard says that. The C standard is deliberately written to allow extensions, including extensions that define behavior the C standard does not. – Eric Postpischil Jan 26 '23 at 14:51
  • Indeed, I have formulated my question in a way that may imply that undefined behaviours are illegal, which they are not. I should have asked instead whether my compiler should warn about undefined behaviour or not – Elzaidir Jan 26 '23 at 15:01