
This is a nitpicky-details question with three parts. The context is that I wish to persuade some folks that it is safe to use <stddef.h>'s definition of offsetof unconditionally rather than (under some circumstances) rolling their own. The program in question is written entirely in plain old C, so please ignore C++ entirely when answering.

Part 1: When used in the same manner as the standard offsetof, does the expansion of this macro provoke undefined behavior per C89, why or why not, and is it different in C99?

#define offset_of(tp, member) (((char*) &((tp*)0)->member) - (char*)0)

Note: All implementations of interest to the people whose program this is supersede the standard's rule that pointers may only be subtracted from each other when they point into the same array, by defining all pointers, regardless of type or value, to point into a single global address space. Therefore, please do not rely on that rule when arguing that this macro's expansion provokes undefined behavior.

Part 2: To the best of your knowledge, has there ever been a released, production C implementation that, when fed the expansion of the above macro, would (under some circumstances) behave differently than it would have if its offsetof macro had been used instead?

Part 3: To the best of your knowledge, what is the most recently released production C implementation that either did not provide stddef.h or did not provide a working definition of offsetof in that header? Did that implementation claim conformance with any version of the C standard?

For parts 2 and 3, please answer only if you can name a specific implementation and give the date it was released. Answers that state general characteristics of implementations that may qualify are not useful to me.

zwol
  • It is not undefined behavior when the compiler vendor provided you with the header file. They know what defined undefined behavior looks like on their product. – Hans Passant Jul 14 '11 at 21:53
  • There is no such thing as "any other reason than..." when talking about undefined behavior. Either you care about the spec or you don't. Even if every C compiler since the beginning of time works fine for some undefined construct, tomorrow's more heavily-optimizing compilers might not. (Because the compiler may _assume_ you do not invoke undefined behavior and make inferences from there. Smarter compilers = more inferences) – Nemo Jul 14 '11 at 22:33
  • I do know how this works, and I still insist on the "any other reason than..." part, because *that* rule is nigh-universally superseded by implementations defining that all pointers, regardless of what language objects they point to, and including the null pointer, are comparable within a global flat address space. – zwol Jul 14 '11 at 22:40
  • You are 100% sure that no system in the future will use a segmented memory architecture ever again? And you are 100% sure that no compiler writer will find a way to use the assumption "pointers are only comparable within an object" for anything whatsoever, ever? Your crystal ball must be excellent... Regardless, as R. points out, your macro also invokes undefined behavior by applying `->` to a null pointer. (Also, you generally seem more concerned about compilers in the past than in the future, when there are infinitely many more of the latter.) – Nemo Jul 14 '11 at 23:15
  • @Nemo: Yes, to the people I am talking to, ensuring that the program continues to be buildable in all the environments where it has been buildable in the past is enormously more important than preventing a problem that *might* occur in some hypothetical environment in the future. – zwol Jul 15 '11 at 02:07
  • @Zack, I understand that you need to convince people with perverse priorities, but please do understand that they have their priorities backwards. Any hypothetical past system with such major conformance issues has equal amounts of bitrot in the area of **security** and is therefore **unusable in any real-world deployment**. Would these same people care if you broke support for running your database server on Win95? Probably not because they would realize it is **not a viable target** for reasons of fatal unpatched security flaws. – R.. GitHub STOP HELPING ICE Jul 15 '11 at 02:40
  • @Nemo: I'm quite confident nobody will ever use segmented memory again, but the real issue is that people running high-security systems and willing to sacrifice 90% or even 99% performance for intense security may be willing to use C implementations with radically different pointer models where it's impossible, by performing arithmetic on a pointer to one object, to accidentally obtain a pointer into a different object. Such implementations would not admit the hackish macro definition for `offsetof`. – R.. GitHub STOP HELPING ICE Jul 15 '11 at 02:42
  • @R: Yeah, I was actually thinking of "safe" implementations myself; not necessarily for security, but simply for error checking. @Zack: Call it hypothetical, but in my experience 90% of the work on any piece of code happens after it ships for the first time. Standards help mitigate that. Your experience may differ. – Nemo Jul 15 '11 at 03:28

4 Answers


There is no way to write a portable offsetof macro. You must use the one provided by stddef.h.

Regarding your specific questions:

  1. The macro invokes undefined behavior. You cannot subtract pointers except when they point into the same array.
  2. The big difference in practical behavior is that the macro is not an integer constant expression, so it can't safely be used for static initializers, bit-field widths, etc. Also, strict bounds-checking C implementations might break it completely.
  3. There has never been any C standard that lacked stddef.h and offsetof. Pre-ANSI compilers might lack it, but they have much more fundamental problems that make them unusable for modern code (e.g. lack of void * and const).

Moreover, even if some theoretical compiler did lack stddef.h, you could just provide a drop-in replacement, just like the way people drop in stdint.h for use with MSVC...

R.. GitHub STOP HELPING ICE
  • Yes, the "same array" point causes UB; my original answer was wrong. Paragraph 6.5.6 item 9 in the C1X draft. +1. – Fred Foo Jul 14 '11 at 21:59
  • I need *specific names and release dates of production C implementations* for parts 2 and 3, and *specific references to C standard text, with discussion of the difference between C89 and C99, if any* for part 1. This answer will not convince the people I am trying to convince. – zwol Jul 14 '11 at 22:01
  • BTW I suspect many implementations might define all objects as living inside one array of type `char[SIZE_MAX+1]` also known as "virtual address space". :-) – R.. GitHub STOP HELPING ICE Jul 14 '11 at 22:02
  • For exactly that reason, as I just edited into the question, please give causes of undefined behavior OTHER than the same-array rule. – zwol Jul 14 '11 at 22:04
  • Part 3: C89 4.1.5 defines `offsetof`; see http://noserose.net/e/C89/ansi.c.txt There is no earlier C standard. – R.. GitHub STOP HELPING ICE Jul 14 '11 at 22:05
  • For part 2 and UB, see C99 6.5.2.3 paragraph 4. The lefthand operand of `->` must point to an object of struct or union type; however `(tp*)0` does not point to any object. – R.. GitHub STOP HELPING ICE Jul 14 '11 at 22:09
  • And for the part about not being a constant expression, see C99 6.6 especially paragraph 6. C99+TC3 can be found in HTML form at http://port70.net/~nsz/c/c99/n1256.html – R.. GitHub STOP HELPING ICE Jul 14 '11 at 22:11
  • @R: Great answer. I suggest updating it with the reference about `->` on NULL being undefined behavior. – Nemo Jul 14 '11 at 22:34
  • You are not providing a helpful answer to parts 2 or 3 by talking about the standard. I asked specifically about IMPLEMENTATIONS in those parts. – zwol Jul 14 '11 at 22:41
  • Proving the non-existence of non-conformant pseudo-implementations is very difficult, and I don't know how you'd want to document the non-existence. The next best thing, IMO, is proving that any such candidate implementation would be nonconformant and thus not actually C but rather a "C wannabe". – R.. GitHub STOP HELPING ICE Jul 15 '11 at 02:36
  • I am trying to address a specific, widespread, probably totally incorrect belief that `offsetof` was not present in the `stddef.h` headers provided by otherwise C89-conformant compilers. You continue to miss the point of the question so thoroughly that I wish you hadn't answered at all. – zwol Jul 15 '11 at 02:52
  • I get the point, but I don't see how I could go about proving it without obtaining every single vendor's historical `stddef.h` and showing them all to you... – R.. GitHub STOP HELPING ICE Jul 15 '11 at 02:56

To answer #2: yes, gcc-4* (I'm currently looking at v4.3.4, released 4 Aug 2009, but it should hold true for all gcc-4 releases to date). The following definition is used in their stddef.h:

#define offsetof(TYPE, MEMBER) __builtin_offsetof (TYPE, MEMBER)

where __builtin_offsetof is a compiler builtin like sizeof (that is, it's not implemented as a macro or run-time function). Compiling the code:

#include <stddef.h>

struct testcase {
    char array[256];
};

int main (void) {
    char buffer[offsetof(struct testcase, array[0])];
    return 0;
}

would result in an error using the expansion of the macro that you provided ("size of array ‘buffer’ is not an integral constant-expression") but would work when using the macro provided in stddef.h. Builds using gcc-3 used a macro similar to yours. I suppose that the gcc developers had many of the same concerns regarding undefined behavior, etc., that have been expressed here, and created the compiler builtin as a safer alternative to attempting to generate the equivalent operation in C code.

Additional information:

Regarding your other questions: I think R's answer and his subsequent comments do a good job of outlining the relevant sections of the standard as far as question #1 is concerned. As for your third question, I have not heard of a modern C compiler that does not have stddef.h. I certainly wouldn't consider any compiler lacking such a basic standard header as "production". Likewise, if their offsetof implementation didn't work, then the compiler still has work to do before it could be considered "production", just like if other things in stddef.h (like NULL) didn't work. A C compiler released prior to C's standardization might not have these things, but the ANSI C standard is over 20 years old so it's extremely unlikely that you'll encounter one of these.

The whole premise of this problem raises a question: If these people are convinced that they can't trust the version of offsetof that the compiler provides, then what can they trust? Do they trust that NULL is defined correctly? Do they trust that long int is no smaller than a regular int? Do they trust that memcpy works like it's supposed to? Do they roll their own versions of the rest of the C standard library functionality? One of the big reasons for having language standards is so that you can trust the compiler to do these things correctly. It seems silly to trust the compiler for everything else except offsetof.

Update: (in response to your comments)

I think my co-workers behave like yours do :-) Some of our older code still has custom macros defining NULL, VOID, and other things like that since "different compilers may implement them differently" (sigh). Some of this code was written back before C was standardized, and many older developers are still in that mindset even though the C standard clearly says otherwise.

Here's one thing you can do to both prove them wrong and make everyone happy at the same time:

#include <stddef.h>

#ifndef offsetof
  #define offsetof(tp, member) (((char*) &((tp*)0)->member) - (char*)0)
#endif

In reality, they'll be using the version provided in stddef.h. The custom version will always be there, however, in case you run into a hypothetical compiler that doesn't define it.

Based on similar conversations that I've had over the years, I think the belief that offsetof isn't part of standard C comes from two places. First, it's a rarely used feature. Developers don't see it very often, so they forget that it even exists. Second, offsetof is not mentioned at all in Kernighan and Ritchie's seminal book "The C Programming Language" (even the most recent edition). The first edition of the book was the unofficial standard before C was standardized, and I often hear people mistakenly referring to that book as THE standard for the language. It's much easier to read than the official standard, so I don't know if I blame them for making it their first point of reference. Regardless of what they believe, however, the standard is clear that offsetof is part of ANSI C (see R's answer for a link).


Here's another way of looking at question #1. The ANSI C standard gives the following definition in section 4.1.5:

     offsetof(type, member-designator)

which expands to an integral constant expression that has type size_t, the value of which is the offset in bytes, to the structure member (designated by member-designator), from the beginning of its structure (designated by type).

Using the offsetof macro does not invoke undefined behavior. In fact, the behavior is all that the standard actually defines. It's up to the compiler writer to define the offsetof macro such that its behavior follows the standard. Whether it's implemented using a macro, a compiler builtin, or something else, ensuring that it behaves as expected requires the implementor to deeply understand the inner workings of the compiler and how it will interpret the code. The implementor may define it using a macro like the idiomatic version you provided, but only because they know how their compiler will handle that non-standard code.

On the other hand, the macro expansion you provided indeed invokes undefined behavior. Since you don't know enough about the compiler to predict how it will process the code, you can't guarantee that particular implementation of offsetof will always work. Many people define their own version like that and don't run into problems, but that doesn't mean that the code is correct. Even if that's the way that a particular compiler happens to define offsetof, writing that code yourself invokes UB while using the provided offsetof macro does not.

Rolling your own macro for offsetof can't be done without invoking undefined behavior (ANSI C section A.6.2 "Undefined behavior", 27th bullet point). Using stddef.h's version of offsetof will always produce the behavior defined in the standard (assuming a standards-compliant compiler). I would advise against defining a custom version since it can cause portability problems, but if others can't be persuaded then the #ifndef offsetof snippet provided above may be an acceptable compromise.

bta
  • I was a GCC developer at the time when that change was made to their stddef.h; I didn't have anything personally to do with it, but I witnessed the discussion, and it was almost entirely to do with C++ (where overloaded operators and the like may render the "idiomatic" expansion completely wrong). There was agreement that the idiom was UB, but nobody bothered pinning down exactly why, and for compatibility's sake, the C front end recognizes the idiom and replaces it with the intrinsic! Which makes gcc4 *not* the example I need for #2, even though it avoids the idiom in its own stddef.h. – zwol Jul 15 '11 at 02:12
  • ... These people are happy to trust the compiler to implement `offsetof` *correctly* ... if it's there at all, which they *don't* believe is always the case. I've had this conversation before; there is a widespread, AFAICT completely false, belief that `offsetof` was only added to `stddef.h` in C99, therefore you can't count on its being present and it's easier to just define it yourself. The point of part #3 is to pin down exactly where that mistaken belief might be coming from. – zwol Jul 15 '11 at 02:17
  • Ah, I see what you mean now. They doubt `offsetof`'s existence, not its correctness. I've spoken to a number of people who think that same way, actually. See the update to my answer for my take on this misconception. – bta Jul 16 '11 at 00:10
  • Thanks for the revision. I'm accepting this answer, since it seems like I'm not going to get the historical view I was looking for, and you've provided a way to get out of the impasse within the project. – zwol Jul 16 '11 at 18:23

(1) The undefined behavior is already there before you do the subtraction.

  1. First of all, (tp*)0 is not what you think it is. It is a null pointer, and such a beast is not necessarily represented by an all-zero bit pattern.
  2. Then the member operator -> is not simply an offset addition. On a CPU with segmented memory this might be a more complicated operation.
  3. Taking the address with the & operator is UB if the expression does not designate a valid object.

(2) As for point 2, there are certainly still architectures out in the wild (embedded stuff) that use segmented memory. As for point 3, the observation R.. makes about integer constant expressions has another consequence: if the code is poorly optimized, the & operation might be performed at run time and signal an error.

(3) Never heard of such a thing, but this is probably not enough to convince your colleagues.

Jens Gustedt

I believe that nearly every optimizing compiler has broken that macro at multiple points in time. Your coworkers have apparently been lucky enough not to have been hit by it.

What happens is that some junior compiler engineer decides that because the zero page is never mapped on their platform of choice, any time anyone does anything with a pointer to that page, that's undefined behavior and they can safely optimize away the whole expression. At that point, everyone's homebrew offsetof macros break until enough people scream about it, and those of us who were smart enough not to roll our own go happily about our business.

I don't know of any compiler where this is the behavior in the current released version, but I think I've seen it happen at some point with every compiler I've ever worked with.

Stephen Canon