2

offsetof is defined like this in stddef.h:

#define offsetof(type, member) ((size_t)&((type *)0)->member)

Does this invoke undefined behavior due to the dereference of a NULL pointer? If not, why?

S.S. Anne
  • 3
    there is no dereference of a NULL pointer – user3700562 Aug 03 '19 at 21:15
  • @user3700562 `((type *)0)->member`? `NULL` is a symbolic constant equal to `0`. – S.S. Anne Aug 03 '19 at 21:16
  • 5
    It has `&` in front of it. That's important. – KamilCuk Aug 03 '19 at 21:18
  • 2
    The contents of `<stddef.h>` are part of a C implementation, not part of a program. Asking whether it is undefined is like asking whether some assembly language that happens to be part of the source of a compiler has behavior not defined by the C standard: of course it has undefined behavior, because it is not covered by the standard. Normally `<stddef.h>` is designed in conjunction with a compiler. Unless you are trying to implement your own `<stddef.h>` using a compiler that you do not control and can only rely on for what the standard specifies, the question is misplaced. – Eric Postpischil Aug 03 '19 at 21:28
  • @JL2210: My comment stands. – Eric Postpischil Aug 03 '19 at 21:30
  • @EricPostpischil Please see the latest update. This should address your concerns. – S.S. Anne Aug 03 '19 at 21:32
  • 3
    Related: http://c-faq.com/struct/offsetof.html – melpomene Aug 03 '19 at 21:35
  • @user3386109 That's asking why it works. I'm asking if it's undefined behavior (and if not, why). – S.S. Anne Aug 03 '19 at 21:46
  • @JL2210 Read the code in the question carefully. It shows you how to do the job right. And the answers also address your question. – user3386109 Aug 03 '19 at 21:48
  • @user3386109 A problem with that linked question is that [In ANSI C, `offsetof` is defined as below.](https://stackoverflow.com/q/713963/2410359) is incorrect. [@Keith Thompson](https://stackoverflow.com/questions/713963/why-does-this-implementation-of-offsetof-work#comment37385079_713963) details why. – chux - Reinstate Monica Aug 03 '19 at 21:58
  • I was just trying to help the OP find a solution that actually has a chance to work, but since the OP seems not interested in getting it right, and since chux seems to think there's no way to get it right, then I guess we're done here. – user3386109 Aug 03 '19 at 22:09
  • @user3386109 I just want to know if the way I've been doing it is right. If not, I can ask another question. Thank you for wanting to help. – S.S. Anne Aug 03 '19 at 22:12
  • @user3386109 I do want to do the job right, I just want to do it correctly and with portability if possible. Take zwol's answer, for example. – S.S. Anne Aug 03 '19 at 22:36
  • 1
    @JL2210 The `#define offsetof(type, f) ((size_t) \ ((char *)&((type *)0)->f - (char *)(type *)0))` version (with the subtraction, lest, god forbid, null pointer constants aren't all bits zero) should be *very* portable. You'd need an extra in-your-face smart compiler to mess it up. – Petr Skocik Aug 03 '19 at 22:47
  • @PSkocik Would it be more portable if I used `NULL` instead of `0`? – S.S. Anne Aug 03 '19 at 23:06
  • @JL2210 No. See https://stackoverflow.com/a/55520832/1084774 – Petr Skocik Aug 03 '19 at 23:08
  • @PSkocik It really wouldn't take much of an "extra in-your-face smart compiler"; all it would need to notice is that `((T*)0)->f` has undefined behavior. "Control flow paths that dereference a null pointer are impossible and can be deleted" is standard in current-generation compilers. – zwol Aug 04 '19 at 13:28

2 Answers

6

In normal C code, the behavior of `((size_t)&((type *)0)->member)` is not specified by the C standard:

  • First, per C 2018 6.5.2.3 4, about `->`, `((type *)0)->member` designates the lvalue of the member `member` of the structure to which `(type *)0` points. But `((type *)0)` does not point to a structure, and therefore there is no member this can be the lvalue of.
  • Supposing it does give an lvalue for some hypothetical structure, there is no guarantee that taking its address and converting it to `size_t` yields the offset of the member, both because we do not know that `(type *)0` yields an address that is actually represented with zero in the implementation’s addressing scheme and because the conversion of a pointer to an integer specified by C 2018 6.3.2.3 6 only tells us the result is implementation-defined, not that it yields the address in any otherwise meaningful form.

Were this code in a standard header, such as `<stddef.h>`, it would be under the control of the C implementation and not the C standard, and so questions about whether it is undefined according to the C standard would not apply. The C standard only says how the standard headers behave when included; an implementation may use any means it chooses to achieve the required effects, whether that is simply defining the behavior of source code that is not fully defined by the C standard or putting source code in an entirely different language in the headers. (In fact, the file stddef.h could be entirely empty or not exist at all, and the compiler could supply its required declarations when it sees `#include <stddef.h>` without reading any actual file from disk.)
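For ordinary program code, both objections in the list above can be sidestepped by measuring the offset on a real object instead of a null pointer. The following is only a minimal sketch: `struct sample` and `value_offset_from_object` are made-up names for illustration, and the result is expected to agree with the implementation's own `offsetof`.

```c
#include <assert.h>
#include <stddef.h>  /* the implementation's own offsetof */

/* Hypothetical example type, not taken from the question. */
struct sample {
    char   tag;
    int    value;
    double weight;
};

/* Offset of `value` measured on a real object. Both pointers point
   into the same object, so the char * subtraction is well defined
   and no pointer-to-integer conversion is involved. */
static size_t value_offset_from_object(void)
{
    struct sample s = {0};
    return (size_t)((char *)&s.value - (char *)&s);
}
```

Unlike `offsetof`, this is not an integer constant expression, so it cannot be used in places such as array bounds or `case` labels; in program code the standard macro remains the simpler choice.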

Eric Postpischil
  • When in `<stddef.h>`, "it is under the control of the C implementation" so no UB - yes. – chux - Reinstate Monica Aug 03 '19 at 21:50
  • Nitpick: No pointer-to-integer conversion occurs in this code fragment. `(type *)0` produces the null pointer of type `type *`. Applying `->` to a null pointer is a dereference and it has undefined behavior. The special cases in 6.5.3.2 for `&*ptr` and `&ptr[i]` do not apply, so the presence of `&` is irrelevant. – zwol Aug 03 '19 at 22:20
  • @zwol Interestingly http://port70.net/~nsz/c/c11/n1570.html#6.6p9 mentions that -> with & may be used in the creation of an **address constant** provided no object is accessed. That would imply -> in this context is perhaps not really a dereference (=object access?). – Petr Skocik Aug 03 '19 at 22:27
  • @PSkocik No, if the committee had meant `->` not to be a dereference when it's underneath `&` (in the expression tree), they would have said so in 6.5.3.2, as they did for `&*ptr` and `&ptr[i]`. (There is a colorable argument that this is an oversight and someone should file a DR, though.) – zwol Aug 03 '19 at 22:29
  • 2
    @zwol Given that `member` is of type `int`, and located at offset `x` in the structure, Is not `&ptr->member` equivalent to `&*(int *)((char *)ptr + x)`? – user3386109 Aug 03 '19 at 22:33
  • @user3386109 Provided `ptr` refers to a real instance of the type, yes, I believe so, but I'm not prepared to quote chapter and verse to prove it either way. – zwol Aug 04 '19 at 13:30
  • @zwol: The authors of the Standard generally tried to avoid wasting ink describing all of the transitive implications of its definitions, in cases where they had no reason to expect that they wouldn't be treated transitively. Ironically, the Standard specifies that `x[y]` is equivalent to `(*(x+(y)))` even when the left operand is an array, despite the fact that it creates problems in contexts where the left operand isn't an lvalue and the expression as a whole shouldn't be either. Note that `(x)->y` is equivalent to `(*(x)).y` but the equivalence may not hold without the parentheses. – supercat Aug 04 '19 at 18:59
  • @supercat You are mistaken. The standard is interpreted utterly literally; if it doesn't say X, then the absence means not-X, even if it would make more sense in context for X to have been intended. As usual I am not interested in discussing whether this is the "correct" way to understand standardese; I am only stating facts. – zwol Aug 05 '19 at 12:03
  • @zwol: The way the parts of the Standard are written would render parts of the language nonsensical. The authors of clang and gcc generally make presumptions that in such cases the authors of the Standard meant for those parts to be *almost* totally useless (just useful enough to justify their existence) rather than totally useless, but I see no reason to believe such an interpretation is remotely consistent with the authors' intentions. – supercat Aug 05 '19 at 15:11
  • @supercat Whether or not any particular interpretation is consistent with the authors' intentions falls into the class of things I am not interested in discussing with you. – zwol Aug 05 '19 at 16:19
-3

Leaving aside all the other reasons why it might not be a correct implementation of offsetof,

#define offsetof(type, member) ((size_t)&((type *)0)->member)

is not appropriate even as part of the implementation because everything in stddef.h must work correctly in both C and C++, and in C++, the above construct definitely will misbehave in the presence of overloaded operator->. This is why GCC's stddef.h switched to using a special intrinsic function called __builtin_offsetof upwards of fifteen years ago.

Yes, I am saying that if you saw this in some stddef.h, that stddef.h is buggy.
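A header written along these lines might look roughly like the sketch below. It assumes a GCC-compatible compiler that provides `__builtin_offsetof`; `my_offsetof`, `struct packet`, and `packet_length_offset` are hypothetical names chosen so the standard macro is not redefined, and the `#error` stands in for the null-pointer fallback.

```c
#include <assert.h>
#include <stddef.h>

/* my_offsetof is a hypothetical name for illustration only. */
#if defined(__GNUC__) && __GNUC__ >= 4
#  define my_offsetof(type, member) __builtin_offsetof(type, member)
#else
   /* Refuse to build instead of falling back to ((type *)0)->member. */
#  error "no offsetof intrinsic available for this compiler"
#endif

/* Hypothetical example type. */
struct packet {
    char kind;
    long length;
};

/* Returns the offset of `length` using the intrinsic-backed macro. */
static size_t packet_length_offset(void)
{
    return my_offsetof(struct packet, length);
}
```

On such a compiler, `packet_length_offset()` should return the same value as `offsetof(struct packet, length)` from `<stddef.h>`.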

zwol
  • Well, to be specific, I saw this: `#if __GNUC__ > 3 #define offsetof(type, member) __builtin_offsetof(type, member) #else #define offsetof(type, member) ((size_t)&((type *)0)->member) #endif` (quoted wrong version) – S.S. Anne Aug 03 '19 at 22:21
  • Still buggy. That construct is a little different but it has all the problems of the one you originally quoted. This simply can't be done in C++ without a dedicated compiler intrinsic. Whatever implementation this is should have an `#error` in its fallback case. – zwol Aug 03 '19 at 22:23
  • 2
    There is no compulsion that the C++ compiler be able to use a header provided by a C compiler; the only requirement is that the `<stddef.h>` provided by the C++ compiler must be acceptable to the C++ compiler (and the `<stddef.h>` provided by the C compiler must be acceptable to the C compiler, of course). – Jonathan Leffler Aug 03 '19 at 22:40
  • 4
    Since when did a language-lawyer question defer to reality or C++, @JL2210? Especially a 'C language only' language-lawyer question. – Jonathan Leffler Aug 03 '19 at 23:16
  • 2
    @JL2210: Even were it true that, in practice, a paired C implementation and C++ implementation used a common file named stddef.h to implement `<stddef.h>`, surely there are C implementations that are not paired with C++ implementations, and hence the assertion in this answer that code in stddef.h must work correctly in both C and C++ is false. – Eric Postpischil Aug 04 '19 at 00:08
  • @JonathanLeffler The question is tagged language-lawyer, but JL2210 appeared to be wanting _practical_ reasons why this is not a good definition of `offsetof`, rather than reasons based on the fine print of the standard, so I gave one. I would add some hedging to the text of my answer if this hadn't already been closed as a duplicate. I may instead write a polemic about how the accepted answer to the duplicate is fractally wrong. ;-) – zwol Aug 04 '19 at 13:25
  • @zwol How's this? : `# define offsetof(type, member) ({ \ type __tmp_offsetof__; \ (char *)&__tmp_offsetof__.member - (char *)&__tmp_offsetof__; \ })` – S.S. Anne Aug 04 '19 at 17:28
  • @JL2210 I don't see anything obviously wrong with that (in C - again, in C++, overloaded operators break it), but if you're going to use `({ ... })` you might as well use `__builtin_offsetof`. – zwol Aug 05 '19 at 12:04