39

In ANSI C, offsetof is defined as below.

#define offsetof(st, m) \
    ((size_t) ( (char *)&((st *)(0))->m - (char *)0 ))

Why doesn't this cause a segmentation fault, since we are dereferencing a NULL pointer? Or is this some sort of compiler hack, where the compiler sees that only the address of the member is taken and so calculates it statically without actually dereferencing anything? Also, is this code portable?

M.M
chappar
  • 2
    Is this the first question I've seen on SO complaining about code that works? :-) – paxdiablo Apr 03 '09 at 14:33
  • 5
    There was that guy with if(0){asm(nop)} where leaving it out made something fail... – RBerteig Apr 03 '09 at 20:45
  • 5
    ANSI C (actually ISO C) does not specify this definition for `offsetof`. It merely specifies how it must behave. The actual definition is up to each implementation, and can vary from one implementation to another. – Keith Thompson Jun 13 '14 at 18:40
  • Important to note that MISRA-C:2004 compliant code requires that `offsetof` is not used because it can easily lead to undefined behaviour. – DimP Dec 15 '20 at 11:17

8 Answers

40

At no point in the above code is anything dereferenced. A dereference occurs when * or -> is applied to an address value to find the referenced value. The only use of * above is in a type name, for the purpose of casting.

The -> operator is used above, but not to access the value. Instead it is used to grab the address of the value. Here is a non-macro code sample that should make it a bit clearer:

SomeType *pSomeType = GetTheValue();
int* pMember = &(pSomeType->SomeIntMember);

The second line does not actually cause a dereference (implementation dependent). It simply returns the address of SomeIntMember within the pSomeType value.

What you see is a lot of casting between arbitrary types and char pointers. The reason for char is that it is one of the only types (perhaps the only one) in the C89 standard with an explicitly defined size: sizeof(char) is 1. Because the size is one, subtracting two char pointers yields a distance in bytes, which lets the above code do the evil magic of calculating the true offset of the value.

Jason Aller
JaredPar
  • I don't have a C standard available, but I thought I remembered something in C90 about not necessarily being able to use (not only dereference) arbitrary addresses. The rationale was machines like the 8086 and IBM 370 that used segment registers, and couldn't refer to their entire address space. – David Thornley Apr 03 '09 at 13:56
  • 3
    In the C Standard, the `->` in `&(pSomeType->SomeIntMember)` does cause a dereference. Perhaps you could clarify what you meant when you claim that it doesn't. – M.M Jul 30 '17 at 08:26
  • 1
    This answer is fractally wrong: not only is it wrong overall, but I see at least one error in almost every single sentence. – zwol Aug 03 '19 at 22:27
11

Although that is a typical implementation of offsetof, it is not mandated by the standard, which just says:

The following types and macros are defined in the standard header <stddef.h> [...]

offsetof(type,member-designator)

which expands to an integer constant expression that has type size_t, the value of which is the offset in bytes, to the structure member (designated by member-designator), from the beginning of its structure (designated by type). The type and member designator shall be such that given

static type t;

then the expression &(t.member-designator) evaluates to an address constant. (If the specified member is a bit-field, the behavior is undefined.)

Read P J Plauger's "The Standard C Library" for a discussion of it and the other items in <stddef.h> which are all border-line features that could (should?) be in the language proper, and which might require special compiler support.

It's of historic interest only, but I used an early ANSI C compiler on 386/IX (see, I told you of historic interest, circa 1990) that crashed on that version of offsetof but worked when I revised it to:

#define offsetof(st, m) ((size_t)((char *)&((st *)(1024))->m - (char *)1024))

That was a compiler bug of sorts, not least because the header was distributed with the compiler and didn't work.

Jonathan Leffler
  • *"... `<stddef.h>` which are all border-line features that could (should?) be in the language proper"* - I'd say they *are* part of the language proper because even a freestanding implementation is required to always support them... – Antti Haapala -- Слава Україні Oct 17 '18 at 14:47
  • Why only `static type t` and not simply `type t`? – explogx May 20 '20 at 17:52
  • @eigenslacker — mainly I copied the standard and that’s what it says. It probably has some deep significance, maybe related to VLA (variable length array — and variable modified types) which can’t be used with `static`. It may be to do with incomplete types — ditto. – Jonathan Leffler May 20 '20 at 17:56
10

In ANSI C, offsetof is NOT defined like that. One of the reasons it's not defined like that is that some environments will indeed throw null pointer exceptions, or crash in other ways. Hence, ANSI C leaves the implementation of offsetof( ) open to compiler builders.

The code shown above is typical for compilers/environments that do not actively check for NULL pointers, but fail only when bytes are read from a NULL pointer.

MSalters
  • 1
    Just to be clear, the `offsetof()` macro has very commonly and widely been implemented as shown in the question, or even more simply without the subtraction, on the vast majority of platforms where pointers are effectively integers. Most C compilers do not actively check for NULL pointers. The expression used does **NOT** dereference *anything* --- it simply calculates the offset by using an address (which happens to be zero) with a simple arithmetic addition of the internally known offset of the member. When optimized there is not even any run-time addition performed. – Greg A. Woods Aug 11 '17 at 03:48
9

To answer the last part of the question, the code is not portable.

The result of subtracting two pointers is defined and portable only if the two pointers point to objects in the same array or point to one past the last object of the array (7.6.2 Additive Operators, H&S Fifth Edition)

sigjuice
3

Listing 1: A representative set of offsetof() macro definitions

// Keil 8051 compiler
#define offsetof(s,m) (size_t)&(((s *)0)->m)

// Microsoft x86 compiler (version 7)
#define offsetof(s,m) (size_t)(unsigned long)&(((s *)0)->m)

// Diab Coldfire compiler
#define offsetof(s,memb) ((size_t)((char *)&((s *)0)->memb-(char *)0))

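/* Test program; it uses the offsetof supplied by <stddef.h>. */
#include <stddef.h>
#include <stdio.h>

typedef struct 
{
    int     i;
    float   f;
    char    c;
} SFOO;

int main(void)
{
  printf("Offset of 'f' is %zu\n", offsetof(SFOO, f));
  return 0;
}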

The various operators within the macro are evaluated in an order such that the following steps are performed:

  1. ((s *)0) takes the integer zero and casts it as a pointer to s.
  2. ((s *)0)->m dereferences that pointer to point to structure member m.
  3. &(((s *)0)->m) computes the address of m.
  4. (size_t)&(((s *)0)->m) casts the result to an appropriate data type.

By definition, the structure itself resides at address 0. It follows that the address of the field pointed to (Step 3 above) must be the offset, in bytes, from the start of the structure.

Jonathan Leffler
Viswesn
2

It doesn't segfault because you're not dereferencing it. The pointer address is being used as a number that's subtracted from another number, not used to address memory operations.

chaos
2

It calculates the offset of the member m relative to the start address of the representation of an object of type st.

((st *)(0)) refers to a NULL pointer of type st *. &((st *)(0))->m refers to the address of member m in this object. Since the start address of this object is 0 (NULL), the address of member m is exactly the offset.

The conversion to char * and the subtraction compute the offset in bytes. Under the rules of pointer arithmetic, the difference between two pointers of type T * is the number of objects of type T that fit between the two addresses; since sizeof(char) is 1, the difference of two char * pointers is a byte count.

Sean Bright
Cătălin Pitiș
  • Sean, Why that subtraction was needed? can't we just return (char *)&((st *)(0))->m ? – chappar Apr 03 '09 at 14:03
  • There are C implementations for which a null pointer is not represented by the value 0 internally. On such an implementation, I suppose that either this C code will completely fail because the compiler won't know how to handle the null pointer in pointer arithmetic, or it may work thanks to the subtraction (because the representation of the null pointer needs to be cancelled). – vinc17 Jul 23 '14 at 22:17
0

Quoting the C standard for the offsetof macro:

C standard, section 6.6, paragraph 9

An address constant is a null pointer, a pointer to an lvalue designating an object of static storage duration, or a pointer to a function designator; it shall be created explicitly using the unary & operator or an integer constant cast to pointer type, or implicitly by the use of an expression of array or function type. The array-subscript [] and member-access . and -> operators, the address & and indirection * unary operators, and pointer casts may be used in the creation of an address constant, but the value of an object shall not be accessed by use of these operators.

The macro is defined as

#define offsetof(type, member)  ((size_t)&((type *)0)->member)

and the expression comprises the creation of an address constant.

Strictly speaking, though, the result is not an address constant, because it does not point to an object of static storage duration. The rule that the value of an object shall not be accessed still applies, so the integer constant cast to pointer type is not dereferenced.

Also, consider this quote from the C standard:

C standard, section 7.19, paragraph 3

The type and member designator shall be such that given

static type t;

then the expression &(t.member-designator) evaluates to an address constant. (If the specified member is a bit-field, the behavior is undefined.)

A struct in C is a composite data type (or record) declaration that defines a physically grouped list of variables under one name in a block of memory, allowing the different variables to be accessed via a single pointer or by the struct declared name which returns the same address.

From the compiler perspective, the struct declared name is an address and the member designator is an offset from that address.

explogx