5

This question is more of an academic one, seeing as there is no valid reason to write your own offsetof macro anymore. Nevertheless, I've seen this home-grown implementation pop up here and there:

#define offsetof(s, m) ((size_t) &(((s *)0)->m))

Which is, technically speaking, dereferencing a NULL pointer (AFAICT):

C11 (ISO/IEC 9899:201x) §6.3.2.3 Pointers, paragraph 3

An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant

So the above implementation is, according to how I read the standard, the same as writing:

#define offsetof(s, m) ((size_t) &(((s *)NULL)->m))

It does make me wonder whether, by changing one tiny detail, the following definition of offsetof would be completely legal and reliable:

#define offsetof(s, m) (((size_t)&(((s *) 1)->m)) - 1)

Seeing as, instead of 0, 1 is used as a pointer, and I subtract 1 at the end, the result should be the same. I'm no longer using a NULL pointer. As far as I can tell the results are the same.

So basically: is there any reason why using 1 instead of 0 in this offsetof definition might not work? Can it still cause UB in certain cases, and if so, when and how? Am I missing anything here?

Elias Van Ootegem
  • It's not only homegrown, it's in every standard library, and it's implemented the same everywhere. As for your variant with `1` instead of `0`, consider that not all platforms handle unaligned access equally well (some not at all). Not that it matters anyway, since the compiler will be able to evaluate it at compile time. – Some programmer dude Apr 22 '15 at 08:11
  • @JoachimPileborg: Of that, I'm not entirely sure, gcc for example relies on its internal `__builtin_offsetof`. I've yet to see any compiler that can't handle the first `offsetof` implementation, but this bit: `((s *)NULL)->m` is, from an academic point of view incorrect, whereas `((s *) 1)->m` isn't (I think) – Elias Van Ootegem Apr 22 '15 at 08:14

6 Answers

3

I believe the behaviour is implementation-defined. In 6.3.2.3 of n1256:

5 An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.

user4098326
3

Both definitions are undefined behavior: in the first definition a null pointer is dereferenced, and in your second definition you are dereferencing an invalid pointer (the pointer does not point to a valid object). It is not possible in C to write a portable version of the offsetof macro.

Defect Report #44 says:

"In particular, this is why the offsetof macro exists: there was otherwise no portable means to compute such translation-time constants."

(DR#44 is for C89 but nothing has changed in the language in C99 and C11 that would allow a portable implementation.)
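
As a minimal sketch of the portable route, the standard offsetof from <stddef.h> can be used wherever an integer constant expression is needed. The struct packet, the PAYLOAD_OFFSET constant and the packet_length_offset function are hypothetical names chosen only for illustration, and the static_assert line assumes a C11 compiler:

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical struct, used only for illustration. */
struct packet {
    char   tag;
    int    length;
    double payload;
};

/* offsetof expands to an integer constant expression, so it can be
   checked at translation time. */
static_assert(offsetof(struct packet, tag) == 0,
              "the first member sits at offset 0");

/* It can also initialize an enumeration constant, which likewise
   requires an integer constant expression. */
enum { PAYLOAD_OFFSET = offsetof(struct packet, payload) };

/* Ordinary run-time use of the same macro. */
size_t packet_length_offset(void)
{
    return offsetof(struct packet, length);
}
```

The exact offsets of length and payload depend on the platform's padding rules, so the sketch does not assume any particular values for them.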

ouah
  • Ok, so regardless of what assumed address is used in a home-grown `offsetof` macro, you fall in the gray area of `->m` (accessing a member on an invalid object), but since you're only using that expression to get the memory address, most compilers can handle the expression? Is that about right? – Elias Van Ootegem Apr 22 '15 at 08:22
  • @EliasVanOotegem if a compiler provides the implementation of `offsetof` in C (not as a builtin with some magic), it is acceptable that it decides to make defined what is UB by the Standard. – ouah Apr 22 '15 at 08:26
  • @EliasVanOotegem I wouldn't assume so. Modern compilers perform various forms of static analysis and mark UB as unreachable, so the compiler can assume that the given code will never be called. It means that it can optimize out the branch or even assume the function will never be called. – Maciej Piechotka Apr 22 '15 at 08:26
  • @MaciejPiechotka: That's why I said _"most compilers"_ not _"all compilers"_, meaning that I'm not assuming they'll be able to handle the expressions in my question. I was just trying to recap the answer given by ouah – Elias Van Ootegem Apr 22 '15 at 08:29
  • @EliasVanOotegem ouah did not say that most compilers will handle it. Many compilers (including gcc and clang) use dataflow analysis. Many compilers will consider UB nodes unreachable. Even if they work in your test cases, that might be just an accident, and it will fail if the stars align just right. – Maciej Piechotka Apr 22 '15 at 08:46
2

One problem is that your created pointer does not point to an object.

6.2.4 Storage durations of objects

  2. The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address, and retains its last-stored value throughout its lifetime. If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.

and

J.2 Undefined behaviour
- The value of a pointer to an object whose lifetime has ended is used (6.2.4).

3.19.2 indeterminate value: either an unspecified value or a trap representation

When you convert 1 to a pointer, and the created pointer does not point to an object, the value of the pointer becomes indeterminate. You then use the pointer. Both of those cause undefined behavior.

The conversion of an integer to a pointer is also problematic:

6.3.2.3 Pointers

  5. An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
2501
0

The implementation of offsetof that dereferences a NULL pointer invokes undefined behavior. That implementation assumes that the hypothetical structure begins at address 0. You may assume it to be 1 instead, and yes, it will invoke UB too, not because you are dereferencing a null pointer, but because an uninitialized pointer is dereferenced.

haccks
0

Nothing in any version of the C standard would forbid a compiler from doing anything it wanted with any macro that would attempt to achieve the effect without defining a storage location to hold the indicated object. Nonetheless, a form like:

#define offsetof(s, m) ((char*)&(((s *)0)->m)-(char*)0)

would probably be pretty safe for pre-C99 compilers. Note that it generates an integer by subtracting one char* from another. That is specified to work and yield a constant value when the pointers access parts of the same valid object, and will in practice work on any compiler which doesn't notice that a null pointer isn't a valid object. By contrast, the effect of casting a pointer to an integer or vice versa will vary across platforms, and there are many platforms where (int)(((char*)&foo)+1) - (int)(char*)&foo may not yield 1.
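
For comparison, here is a rough sketch of the same char* subtraction applied to a real object, which is the case the Standard does define. The OFFSET_IN_OBJECT macro, the struct record and the count_offset_from_object function are hypothetical names introduced only for this example. Unlike offsetof, the result is not a compile-time constant:

```c
#include <stddef.h>  /* size_t */

/* Hypothetical struct, used only for illustration. */
struct record {
    char   flag;
    long   count;
    double value;
};

/* Hypothetical helper: distance in bytes from the start of an existing
   object to one of its members. Both pointers point into the same
   object, so the subtraction is well defined. */
#define OFFSET_IN_OBJECT(obj, m) \
    ((size_t)((char *)&(obj).m - (char *)&(obj)))

size_t count_offset_from_object(void)
{
    struct record r = {0};  /* real storage, so no invalid pointer is formed */
    return OFFSET_IN_OBJECT(r, count);
}
```

The value returned should match what the standard offsetof reports for the same member, since both describe the layout of the same type.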

Note also that the meaning of "Undefined Behavior" has changed recently. It used to be that Undefined Behavior meant that the specification didn't say what compilers had to do, but most compilers would generally choose (sometimes arbitrarily) behavior that was mathematically correct or would make sense on the underlying platform. For example, given int32_t foo=2147483647; foo+=(unsigned char)x; if (foo > 100) ... on a 32-bit processor, a compiler might determine that for any possible value of x the mathematically correct value assigned to foo would be in the range 2147483647 to 2147483902 (assuming an 8-bit unsigned char), and thus greater than 100 in any case. Or it might perform the operation using two's-complement arithmetic and perform the comparison on a possibly-wrapped-around value. Newer compilers, however, may do something even more interesting.

A new compiler may look at an expression like the example with foo and infer that if x is zero then foo must remain 2147483647, and that if x is non-zero the compiler is allowed to do whatever it likes. It may therefore infer that the least significant byte of x must equal zero when the statement is executed, so if the code is preceded by a test for (unsigned char)x==0, that expression would always be true. Given code like the offsetof macro, which would generate Undefined Behavior regardless of the values of any variables, a compiler would be entitled to eliminate not just any code using it, but also any preceding code which could not by any defined means cause program execution to terminate.

Note that casting a non-zero integer literal to a pointer is only Undefined Behavior if there does not exist any object whose address has been taken and cast to an integer so as to yield that same value. Thus, a compiler would not be able to recognize a variant of the pointer-difference-based offsetof macro which cast some non-zero value to a pointer as exhibiting Undefined Behavior unless it could determine that the number in question did not correspond to any pointer. On the other hand, an attempt to cast a non-zero integer to a pointer would on some systems perform a validation check to ensure that the pointer is valid; such a system may then trap if it isn't.
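
The one integer-to-pointer conversion the Standard does guarantee is the round trip of a valid void pointer through uintptr_t, where that optional type exists. A minimal sketch follows; the function name round_trip_preserves_pointer is made up for illustration:

```c
#include <stdint.h>  /* uintptr_t, an optional type in C99 and later */

/* Returns 1 if a pointer to a live object survives a round trip
   through uintptr_t, 0 otherwise. */
int round_trip_preserves_pointer(void)
{
    int object = 42;
    void *original = &object;

    /* The integer value itself is implementation-defined. */
    uintptr_t as_integer = (uintptr_t)original;

    /* Converting that same value back must compare equal to the original. */
    void *restored = (void *)as_integer;

    return restored == original;
}
```

This says nothing about an arbitrary literal such as 1: the guarantee covers only integer values that were obtained from a valid pointer in the first place.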

supercat
-2

You're not actually dereferencing the pointer; what you're doing is more akin to pointer addition, so using zero should be fine.

Zebra North
  • Adding an integer to a null pointer invokes Undefined Behavior which compilers should trap at runtime more aggressively than they generally do; while it's helpful for compilers to allow this style of offset calculation at compile time, nothing in the Standard justifies it. – supercat Aug 26 '15 at 18:21