30

To illustrate the question I'm going to use C, but the same macro can also be used in C++ (with or without struct), where it raises the same question.

I came up with this macro

#define STR_MEMBER(S,X) (((struct S*)NULL)->X, #X)

Its purpose is to produce a string (const char*) naming an existing member of a struct, so that compilation fails if the member doesn't exist. A minimal usage example:

#include <stdio.h>

struct a
{
    int value;
};

int main(void)
{
    printf("a.%s member really exists\n", STR_MEMBER(a, value));
    return 0;
}

If value weren't a member of struct a, the code wouldn't compile, and this is what I wanted.

The comma operator should evaluate the left operand and then discard the result of the expression (if there is one), so my understanding is that this operator is normally used when evaluating the left operand has side effects.

In this case, however, there are no (intended) side effects, and of course it works iff the compiler doesn't actually produce the code that evaluates the expression, for otherwise it would access a struct located at NULL and a segmentation fault would occur.

Gcc/g++ 6.3 and 4.9.2 never produced that dangerous code, even with -O0, as if they were always able to “see” that the evaluation has no side effects and so can be skipped.

Adding volatile in the macro (e.g. because accessing that memory address is the desired side effect) was so far the only way to trigger the segmentation fault.

So the question: is there anything in the C and C++ language standards which guarantees that compilers will always avoid actual evaluation of the left operand of the comma operator when the compiler can be sure that the evaluation has no side effects?

Notes and fixing

I am not asking for a judgment about the macro as it is, or about whether to use it or how to improve it. For the purpose of this question, the macro is bad if and only if it invokes undefined behaviour, i.e., if and only if it is risky because compilers are allowed to generate the “evaluation code” even when it has no side effects.

I already have two obvious fixes in mind: “reifying” the struct and using offsetof. The former needs an accessible memory area as big as the biggest struct we use as first argument of STR_MEMBER (e.g. maybe a static union could do…). The latter should work flawlessly: it gives an offset we aren't interested in and avoids the access problem. Here I'm assuming gcc, because it's the compiler I use (hence the tag), and that its offsetof built-in behaves.

With the offsetof fix the macro becomes

#define STR_MEMBER(S,X) (offsetof(struct S,X), #X)

Writing volatile struct S instead of struct S doesn't cause the segfault.

Suggestions about other possible “fixes” are welcome, too.

Added note

Actually, the real use case was in C++, in a struct with static storage. This seems to be fine in C++, but as soon as I tried C with code closer to the original instead of the one boiled down for this question, I realized that C isn't happy at all with that:

error: initializer element is not constant

C wants the struct to be initializable at compile time, whereas C++ is fine with it.

ShinTakezou
  • 2
    Since you have to ask this question, it's probably a good idea to just not rely on it whether or not the expression is guaranteed not to execute. Future readers of your code / your coworkers / future you (while debugging) are likely to not know whether this is valid. – Justin Sep 20 '17 at 21:00
  • *everything* is defined in terms of the "as-if" rule, applied to the abstract machine defined in the standard – o11c Sep 20 '17 at 21:00
  • Do note that accessing a member of a null pointer is undefined behavior. That allows the compiler to do whatever it wants. – NathanOliver Sep 20 '17 at 21:02
  • I think the whole question can be rephrased as "Is `((struct S*)NULL)->value;` line UB?" The answer is Yes, I believe... – Eugene Sh. Sep 20 '17 at 21:02
  • 4
    Why not use `sizeof(((struct S *)0)->X)`; you know `sizeof()` doesn't evaluate its operand, but it would fail if `X` is not a member of `struct S`. – Jonathan Leffler Sep 20 '17 at 21:03
  • And dereferencing nullptr **is** UB. – Jarod42 Sep 20 '17 at 21:03
  • 2
    In C++, you might write traits to know if `A::value` exists, see [`std::experimental::is_detected`](http://en.cppreference.com/w/cpp/experimental/is_detected). – Jarod42 Sep 20 '17 at 21:06
  • @VTT: would fail with overload methods. – Jarod42 Sep 20 '17 at 21:08
  • 1
    @VTT this question is tagged for C as well as C++. – M.M Sep 20 '17 at 21:08
  • @Jarod42 and others, the trick is (was?) common and based on the fact that *there must not happen any dereferencing* at all. It seems like it is there, but it isn't. Even in this case of mine: if the expression isn't actually ever evaluated, the UB doesn't apply — but is the core of the question, if the evaluation happens or not. – ShinTakezou Sep 20 '17 at 21:11
  • @JonathanLeffler definitely the third fix… though the `offsetof` avoids the show `(struct S *)NULL` which puzzles many. – ShinTakezou Sep 20 '17 at 21:15
  • You even say in your question "The comma operator should evaluate the left operand" but then go on to ask a question "is it guaranteed NOT to", you contradict what you already appeared to know – M.M Sep 20 '17 at 21:15
  • It *can* happen, and you have proven it by using `volatile`. The point is that the compiler might "think" this is some special memory, reading from which is triggering some unknown action (like some hardware register read is sometimes having some side effects), and has to be performed. – Eugene Sh. Sep 20 '17 at 21:16
  • @M.M apparently. If you think it through, you see that the standard could state somewhere something like "the evaluation must be skipped in the following cases where the compiler can assure there aren't side effects: … follow the list…". I haven't dug into them too much, as you can imagine by the question, but even in those few lines read I've found sometimes surprises. – ShinTakezou Sep 20 '17 at 21:19
  • 4
    The program has undefined behaviour. One of the legitimate manifestations of undefined behaviour is not crashing. There is nothing to discuss, really. – n. m. could be an AI Sep 20 '17 at 21:37
  • I'm sorry, the other part of the question, namely, how to make the compilation fail if the requested struct member doesn't exist, is actually well defined and does have an answer. You can use `sizeof` as others suggested, or a not-taken branch of a conditional operator, e.g. `(0?(void)((type*)0)->member:(void)0)`. The expression in the left branch *is* guaranteed not to evaluate. – n. m. could be an AI Sep 20 '17 at 22:17
  • @n.m. The question is also about my ignorance about what standards have to say. There was a chance that the left operand of the comma operator could have been unevaluated (hopefully using the right word…) under specific conditions as per standard, so that UB couldn't be triggered. It happened to be not so. – ShinTakezou Sep 20 '17 at 22:30
  • "In the comma operator, is the left operand guaranteed not to be actually executed if it hasn't side effects?" – You don't even have to know anything about C++ (except for the fact that it is Turing-complete) to answer this: figuring out whether the left operand has side-effects is equivalent to solving the Halting Problem. Obviously, the standard cannot force compiler writers to solve the Halting Problem, therefore, such a clause cannot possibly exist in the standard. – Jörg W Mittag Sep 21 '17 at 10:29
  • Philosoraptor questions: how can you tell if a line of code is executed or not, if it has no side effects? And: in your example, the left-hand operand may throw a segfault; isn't that a side effect? – Federico Poloni Sep 21 '17 at 11:40
  • In C you'd rather write `#define STR_MEMBER(S, X) ((struct S){.X = 0}, #X)` which is 100% safe. The proper solution is of course, not to invent such horrible macros in the first place, but rather take action based on type. C has `_Generic` and C++ has templates. I don't think this macro fills any purpose in either language, smells like an "XY problem". – Lundin Sep 21 '17 at 11:42
  • 1
    Instead of using pointers, you can use in-place values in both languages as long as you abstract the in-place part to a helper macro: `#define STR_MEMBER(S,X) (sizeof(VALUE(S).X), #X)` with `VALUE(S)` defined as either `std::declval<S>()` or `(struct S){0}` depending on the state of `__cplusplus`. – Alex Celeste Sep 21 '17 at 12:49
  • @JörgWMittag interesting: it means gcc solved it! — or, that there is at least one case for which compilers can “see” that the only effect of an expression is to read a value… that then is discarded (because it's the left operand of the comma op), hence it can be optimized simply removing it. In this and other cases a standard could mandate a no-op, it is a matter of deciding to write so. – ShinTakezou Sep 21 '17 at 18:29
  • @FedericoPoloni nonsense. The compiler produces code and before that, an intermediate representation which can be “analyzed” to determine many things, among these if a certain “piece” would have no effects if the code would have been generated. Of course, all the physical changes into the cpu, that occurs to execute any piece of code, could be seen as a side effects of that code. But it's not what I mean usually, and hopefully I am not alone. – ShinTakezou Sep 21 '17 at 18:32
  • @Lundin I wrote a sentence to avoid comments like "invent such horrible macros". Can you solve the problem with `_Generic` and templates? I'll see later your answer where, after the ritual "it is UB", there's an explanation of how you'd use `_Generic`(C11 anyway) and templates to achieve what I wanted. The macro fills the purpose explained in the question. I don't know how "XY problems" smell like. – ShinTakezou Sep 21 '17 at 18:37
  • @ShinTakezou My point is, there should be no situation where you need to find out what members a struct have in run-time, since members are determined at compile time. The need for such suggests a muddy design to begin with, hence "XY problem" - what you think you need is not necessarily the best solution. – Lundin Sep 22 '17 at 06:43
  • With _Generic you wouldn't write a macro to see if a type exists, but perhaps to access it in a type safe manner. Given a proper typedef'd struct `typedef struct { int value; } a_t;` you could for example write something like `#define get_value(name) _Generic((name), a_t: (name).value)` and call it like `a_t a; int something = get_value(a)` – Lundin Sep 22 '17 at 06:45
  • Compound literals inside the macro is otherwise the best way to solve the problem in the question. Some answers to [How to create type safe enums?](https://stackoverflow.com/questions/43043246/how-to-create-type-safe-enums) use very similar techniques. – Lundin Sep 22 '17 at 06:46
  • @Lundin "find out what members struct have in run-time". Nope. My intention was some sort-of metaprogramming where everything must be done **at compile time**. Moreover, even the wrong macro works because at compile time gcc "optimizes" it and there exists no code which actually can execute the access at runtime. (By "execute the access" I mean a piece of assembly code which reads from a memory address 0 plus the offset of the member. If I had seen such a code, using `-S`, this question likely wouldn't exist.) – ShinTakezou Sep 23 '17 at 07:51
  • @Lundin `_Generic`… I don't need type safety, but a string (known at compile time) which contains letters which are the symbol of a struct member. My initial thought indeed was to write a minimal parser for `struct` able to generate another `struct` combining other infos, then all put in `.h`—all that to be sure the strings contain no typo… But then I wondered if it could be done by the preprocessor/compiler at compile time. The real usage wasn't like the given example but more like `struct info xxx_info[] = {{STR_MEMBER(xxx,yyy), /*…*/},/*…*/};` All data are known at compile time. – ShinTakezou Sep 23 '17 at 08:02

6 Answers

17

Is there anything in the C and C++ languages standard which guarantees that compilers will always avoid actual evaluation of the left operand of the comma operator ?

It's the opposite. The standard guarantees that the left operand IS evaluated (really it does, there aren't any exceptions). The result is discarded.


Note: for lvalue expressions, "evaluate" does not mean "access the stored value". Instead, it means to work out where the designated memory location is. The other code encompassing the lvalue expression may or may not then go on to access the memory location. The process of reading from the memory location is known as "lvalue conversion" in C, or "lvalue to rvalue conversion" in C++.

In C++ a discarded-value expression (such as the left operand of the comma operator) only has lvalue to rvalue conversion performed on it if it is volatile and also meets some other criteria (see C++14 [expr]/11 for detail). In C lvalue conversion does occur for expressions whose result is not used (C11 6.3.2.1/2).

In your example, it is moot whether or not lvalue conversion happens. In both languages X->Y, where X is a pointer, is defined as (*X).Y; in C the act of applying * to a null pointer already causes undefined behaviour (C11 6.5.3.2/4), and in C++ the . operator is only defined for the case when the left operand actually designates an object (C++14 [expr.ref]/4.2).

M.M
  • If there isn't anything later stating that nothing is actually evaluated under listed conditions… You are implying that there isn't text stating this, I suppose you've checked, but maybe it would be clearer to specify that there aren't such exceptions. – ShinTakezou Sep 20 '17 at 21:23
  • @ShinTakezou No there is nothing like that. You can read the definition of the comma operator and see that it does not say "the left operand is sometimes not evaluated" or whatever – M.M Sep 20 '17 at 21:28
  • A "definition" could be longer than few lines, it can be split in several paragraphs covering several cases. I suppose you are stating that it's not the case for the comma operator. – ShinTakezou Sep 20 '17 at 21:41
  • Maybe, and it would have taken you less time to add to your answer that there aren't exceptions stated anywhere in the standards. (As per suggestion given in my first comment) – ShinTakezou Sep 20 '17 at 21:43
  • 1
    @ShinTakezou I think "The standard guarantees that the left operand IS evaluated" already clearly implies there are no exceptions. – M.M Sep 20 '17 at 21:46
  • 1
    Note that OP is conflating "evaluated" and "access the value". In C++ in particular, the lvalue-to-rvalue conversion is not applied to a discarded glvalue expression that does not have volatile-qualified type. That said, the UB in this case comes from the `->`, so whether a subsequent attempt to access the stored value is made is irrelevant. – T.C. Sep 20 '17 at 21:52
  • @T.C. Yeah , funnily enough I considered adding that to my answer originally but decided not to complicate things (probably the wrong decision). Thanks for clearly stating the matter – M.M Sep 20 '17 at 21:56
  • @T.C. maybe the C++ case will be slightly more complicated if the "empty lvalue" proposal goes through ; In C++14 I believe we're "saved" by the provision that an lvalue must actually designate storage (since using `*` on a null pointer is not explicitly UB) – M.M Sep 20 '17 at 22:22
  • @T.C. my fault, I used terms without a specific jargon in mind, maybe mixing merely syntax parsing / compile time checks with evaluating = generating the code which will run and cause problems. – ShinTakezou Sep 20 '17 at 22:22
  • @M.M Class member access on something that isn't an object of the right type is currently UB by omission; "empty lvalues" won't change that except perhaps make it explicitly UB. – T.C. Sep 20 '17 at 22:25
12

The comma operator (the C documentation says something very similar) has no such guarantee.

In a comma expression E1, E2, the expression E1 is evaluated, its result is discarded ..., and its side effects are completed before evaluation of the expression E2 begins

irrelevant information omitted

To put it simply, E1 will be evaluated, although the compiler might optimize it away by the as-if rule if it is able to determine that there are no side-effects.

Justin
3

Gcc/g++ 6.3 and 4.9.2 never produced that dangerous code, even with -O0, as if they were always able to “see” that the evaluation hasn't side effects and so it can be skipped.

clang will produce code which raises an error if you pass it the -fsanitize=undefined option. Which should answer your question: at least one major implementation's developers clearly consider the code as having undefined behaviour. And they are correct.

Suggestions about other possible “fixes” are welcome, too.

I would look for something which is guaranteed not to evaluate the expression. Your suggestion of offsetof does the job, but may occasionally cause code to be rejected that would otherwise be accepted, such as when X is a.b. If you want that to be accepted, my thought would be to use sizeof to force an expression to remain unevaluated.

  • I think I will go for `sizeof`. Unfortunately when I did my empirical checks I hadn't clang at hand. `-fsanitize=undefined` is accepted by gcc 6.3 too, but it seems everything is fine… clang 3.0-6.2 accepts it too, but the same result, except for warnings `expression result unused`. Indeed I'm testing a different code where the macro is used only to populate a struct. – ShinTakezou Sep 20 '17 at 21:36
3

You ask,

is there anything in the C and C++ languages standard which guarantees that compilers will always avoid actual evaluation of the left operand of the comma operator when the compiler can be sure that the evaluation hasn't side effects?

As others have remarked, the answer is "no". On the contrary, the standards both unconditionally state that the left-hand operand of the comma operator is evaluated, and that the result is discarded.

This is of course a description of the execution model of an abstract machine; implementations are permitted to work differently, so long as the observable behavior is the same as the abstract machine behavior would produce. If indeed evaluation of the left-hand expression produces no side effects, then that would permit skipping it altogether, but there is nothing in either standard that provides for requiring that it be skipped.

As for fixing it, you have various options, some of which apply only to one or the other of the two languages you have named. I tend to like your offsetof() alternative, but others have noted that in C++, there are types to which offsetof cannot be applied. In C, on the other hand, the standard specifically describes its application to structure types, but says nothing about union types. Its behavior on union types, though very likely to be consistent and natural, is technically undefined.

In C only, you could use a compound literal to avoid the undefined behavior in your approach:

#define HAS_MEMBER(T,X) (((T){0}).X, #X)

That works equally well on structure and union types (though you need to provide a full type name for this version, not just a tag). Its behavior is well defined when the given type does have such a member. The expansion violates a language constraint -- thus requiring a diagnostic to be emitted -- when the type does not have such a member, including when it is neither a structure type nor a union type.

You might also use sizeof, as @alain suggested, because although the sizeof expression will be evaluated, its operand will not be evaluated (except, in C, when its operand has variably-modified type, which will not apply to your use). I think this variation will work in both C and C++ without introducing any undefined behavior:

#define HAS_MEMBER(T,X) (sizeof(((T *)NULL)->X), #X)

I have again written it so that it works for both structs and unions.

John Bollinger
2

The left operand of the comma operator is a discarded-value expression

5 Expressions
11 In some contexts, an expression only appears for its side effects. Such an expression is called a discarded-value expression. The expression is evaluated and its value is discarded. [...]

There are also unevaluated operands which, as the name implies, are not evaluated.

8 In some contexts, unevaluated operands appear (5.2.8, 5.3.3, 5.3.7, 7.1.6.2). An unevaluated operand is not evaluated. An unevaluated operand is considered a full-expression. [...]

Using a discarded-value expression in your use case is undefined behavior, but using an unevaluated operand is not.

Using sizeof for example would not cause UB because it takes an unevaluated operand.

#define STR_MEMBER(S,X) (sizeof(S::X), #X)

sizeof is preferable to offsetof, because offsetof can't be used for static members and classes that are not standard-layout:

18 Language support library
4 The macro offsetof(type, member-designator) accepts a restricted set of type arguments in this International Standard. If type is not a standard-layout class (Clause 9), the results are undefined. [...] The result of applying the offsetof macro to a field that is a static data member or a function member is undefined. [...]

alain
  • `offsetof` should appear somewhere at least in the C standard, if I remember well. Is it discarded or unevaluated like `sizeof`? – ShinTakezou Sep 20 '17 at 21:21
  • 1
    I looked at the C++ draft N4296, `offsetof` is a macro. I didn't find much more about it at first glance. But `sizeof` is described as having an unevaluated operand. – alain Sep 20 '17 at 21:24
  • gcc defines it as a built-in, I've tried using it with `volatile`, it seems fine (no segfault), but I can't find an assurance about this behaviour. – ShinTakezou Sep 20 '17 at 21:26
  • 1
    @ShinTakezou Also, the first operand of `offsetof` must be a standard-layout class, else the behavior is undefined. `sizeof` doesn't have this limitation. (In C++) – alain Sep 20 '17 at 21:27
  • good point for `sizeof` against `offsetof`. The struct was a POD but you never know, maybe it will change! – ShinTakezou Sep 20 '17 at 21:28
2

The language doesn't need to say anything about "actual execution" because of the as-if rule. After all, with no side effects how could you tell whether the expression is evaluated? (Looking at the assembly or setting breakpoints doesn't count; that's not part of execution of the program, which is all the language describes.)

On the other hand, dereferencing a null pointer is undefined behavior, so the language says nothing at all about what happens. You can't expect as-if to save you: as-if is a relaxation of otherwise-plausible restrictions on the implementation, and undefined behavior is a relaxation of all restrictions on the implementation. There is therefore no "conflict" between "this doesn't have side effects, so we can ignore it" and "this is undefined behavior, so nasal demons"; they're on the same side!

Davis Herring
  • I would rather say that the first paragraph would be a reason to suggest adding a mandatory “elimination” of the no side effects - discarded value case. / About the 2nd par, it isn't clear how they are on the same side… Anyway, few days ago I was joking with a friend about the fact that containers and similar technologies change the “it works on my machine” to “I can guarantee it works on all these machines (like mine)”. It could push an interesting paradigm shift and remove that annoying cliche on “nasal demons” and the already mentioned nice one “it works on my machine”! – ShinTakezou Sep 21 '17 at 18:52
  • You can't make eliminating _all_ side-effect-free evaluations mandatory, since that's undecidable. Exactly how hard, then, should the implementation be required to try to prove something eliminable? / Those considerations are on the same side because they both allow the implementation to do things that are not the result of a simpleminded reading of the source. (And "it works on my machine" really just means "I haven't found the case that fails yet". Containers or no, it's no substitute for correctness.) – Davis Herring Sep 21 '17 at 19:29
  • The compiler has something like `(discard (read 0 16))`, in an on-the-fly invented representation of a piece of the result of parsing (plus whatever) `(((p*)0)->x, "x")`. The special rule for the comma op could consider a finite number of well defined cases, e.g. literals, const expressions and “read only” operations (w/o `volatile`). These few cases can be handled. Anyway, no point in discussing it further here./ E.g. consider this very case: gcc vN.N…doesn't emit code to access `0->X`. Hence it works on my machine with that compiler, thus…The case that fails doesn't exist at all here. – ShinTakezou Sep 21 '17 at 19:50