
The container_of macro and its WinAPI equivalent CONTAINING_RECORD are popular and useful. In principle, they use pointer arithmetic over char* to recover a pointer to the aggregate that contains a given member, starting from a pointer to that member.

The minimalistic implementation is usually:

#define container_of(ptr, type, member) \
   (type*)((char*)(ptr) - offsetof(type, member))
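For context, here is a minimal sketch of the way the macro is commonly used: walking an intrusive list and recovering each enclosing record from its embedded link. The names `list_node`, `item` and `sum_items` are hypothetical and exist only for this illustration. The macro is the same as above, except that it is wrapped in an extra pair of parentheses (my addition) so that the result can safely be followed by `->`.

```c
#include <stddef.h>  /* offsetof, NULL */

/* Same arithmetic as the minimal definition, with outer parentheses added. */
#define container_of(ptr, type, member) \
    ((type*)((char*)(ptr) - offsetof(type, member)))

/* Hypothetical types, for illustration only. */
struct list_node {
    struct list_node *next;
};

struct item {
    int value;
    struct list_node node;  /* embedded link, deliberately not the first member */
};

/* Sum the values of all items reachable from head. */
int sum_items(struct list_node *head) {
    int total = 0;
    for (struct list_node *n = head; n != NULL; n = n->next) {
        /* Step back from the embedded node to the enclosing item. */
        struct item *it = container_of(n, struct item, node);
        total += it->value;
    }
    return total;
}
```

Note that callers would normally pass something like `&a.node`, which is exactly the kind of member pointer discussed next.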

However, it is debatable whether the usual usage pattern of this macro is strictly compliant. For example:

struct S {
    int a;
    int b;
};

int foo(void) {
    struct S s = { .a = 42 };
    int *p = &s.b;
    struct S *q = container_of(p, struct S, b);
    return q->a;
}

To my understanding, the program is not strictly compliant because:

  • expression s.b is an l-value of type int
  • &s.b is a pointer. Its value may carry implementation-defined attributes, like the size of the value it is pointing to
  • (char*)&s.b does not do anything special to the potential metadata bound to the value of the pointer
  • (char*)&s.b - offsetof(struct S, b) invokes UB, because the pointer arithmetic goes outside of the object that the pointer is pointing to

I've noticed that the problem is not the container_of macro itself. Rather, it is the way the ptr argument is constructed. If the pointer was computed from an l-value of type struct S, then there would be no out-of-bounds arithmetic and therefore no UB. A potentially compliant version of the program would be:

int foo(void) {
    struct S s = { .a = 42 };
    int *p = (int*)((char*)&s + offsetof(struct S, b));
    struct S *q = container_of(p, struct S, b);
    return q->a;
}

The actual arithmetic taking place is:

container_of(ptr, struct S, b)

Expand container_of

(struct S*)((char*)(ptr) - offsetof(struct S, b))

Place expression for ptr

(struct S*)((char*)((int*)((char*)&s + offsetof(struct S, b))) - offsetof(struct S, b))

Drop casts (char*)(int*)

(struct S*)((char*)&s + offsetof(struct S, b) - offsetof(struct S, b))

Adding offsetof(struct S, b) does not go past the end of struct S, so there is no UB in the arithmetic. The positive and negative terms cancel out.

(struct S*)((char*)&s)

Now drop redundant casts.

&s
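The derivation can be turned into a small check. This is only a sketch: the function name `round_trip_ok` is made up for illustration, and an observation on one implementation does not by itself prove strict compliance. It merely shows the arithmetic that the steps above describe.

```c
#include <stddef.h>  /* offsetof */

#define container_of(ptr, type, member) \
    ((type*)((char*)(ptr) - offsetof(type, member)))

struct S {
    int a;
    int b;
};

/* Returns 1 if going struct -> member -> struct lands back on &s. */
int round_trip_ok(void) {
    struct S s = { .a = 42 };
    /* Derive the member pointer from the struct pointer ... */
    int *p = (int*)((char*)&s + offsetof(struct S, b));
    /* ... then subtract the same offset again. */
    struct S *q = container_of(p, struct S, b);
    /* p should also compare equal to the ordinary &s.b. */
    return q == &s && p == &s.b;
}
```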

The question.

Is the above derivation correct?

Is such a usage of container_of strictly compliant?

If so, then the computation of a pointer to the member could be delegated to a new macro named member_of. The pointer can be constructed in a similar fashion as container_of. This new macro would be a complement of container_of to be used in strictly compliant programs.

#define member_of(ptr, type, member) \
   (void*)((char*)(ptr) + offsetof(type, member))

or a bit more convenient and typesafe but less portable (though fine in C23) version:

#define member_of(ptr, member) \
   (typeof(&(ptr)->member))((char*)(ptr) + offsetof(typeof(*(ptr)), member))

The program would be:

int foo(void) {
    struct S s = { .a = 42 };
    int *p = member_of(&s, struct S, b);
    struct S *q = container_of(p, struct S, b);
    return q->a;
}
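For comparison, the same program with the two-argument, typeof-based member_of might look like the sketch below. The `TYPEOF` helper and the name `foo_typeof` are my own additions: the helper falls back to the GNU `__typeof__` extension when the compiler is not in C23 mode, which is an assumption about the toolchain (GCC or Clang).

```c
#include <stddef.h>  /* offsetof */

/* Hypothetical helper: C23 typeof, otherwise the GNU extension. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#define TYPEOF(x) typeof(x)
#else
#define TYPEOF(x) __typeof__(x)
#endif

#define container_of(ptr, type, member) \
    ((type*)((char*)(ptr) - offsetof(type, member)))

/* The struct type is deduced from ptr; the result has the member's pointer type. */
#define member_of(ptr, member) \
    ((TYPEOF(&(ptr)->member))((char*)(ptr) + offsetof(TYPEOF(*(ptr)), member)))

struct S {
    int a;
    int b;
};

int foo_typeof(void) {
    struct S s = { .a = 42 };
    int *p = member_of(&s, b);                   /* no type argument needed */
    struct S *q = container_of(p, struct S, b);
    return q->a;
}
```

With this variant, assigning the result to a pointer of the wrong type (say `double *p`) should at least produce a diagnostic, which the `void*` version cannot do.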
tstanisl
  • @SupportUkraine, I don't think so. See https://port70.net/~nsz/c/c11/n1570.html#6.3.2.3p7 – tstanisl May 27 '22 at 09:41
  • @SupportUkraine, I had similar doubts. I've even posted a question about it. The conclusion was that it is valid to use `char*` arithmetic within the struct. See https://stackoverflow.com/questions/69771966/using-offsetof-to-access-struct-member – tstanisl May 27 '22 at 09:47
  • I think it is legal, but even if there is some argument against that, I think it is essential that it works (from a pragmatic point of view). – Ian Abbott May 27 '22 at 09:58
  • It's clearly working on current systems. If it has any issues with the text of the standard, perhaps the standard should change rather than require all callers to code internally using `container_of` to derive the passed member pointers via some `member_of(containerPtr, member_name)` rather than the good old `&containerPtr->member_name`. That would require significant bodies of code to change in order to get what tangible benefit? – Petr Skocik May 29 '22 at 10:26
  • @PSkocik, The `member_of` pattern is dedicated for programs that need to be pedantic about "strict compliance" and need to use `container_of` pattern. IMO, the C standard should be updated to guarantee that `container_of` works for existing code or add `container_of` (or `_Container_of`) to the language, or to a library like `offsetof` in `stddef.h`. – tstanisl May 29 '22 at 10:39
  • @tstanisl FYI: see "A quick note about C and `offsetof`" comment [here](https://stackoverflow.com/a/47499126/1778275). – pmor Aug 02 '22 at 00:18
  • @pmor, what does this note have to do with this question? – tstanisl Aug 02 '22 at 06:59
  • The `container_of` uses `offsetof`. The `offsetof` has `((st *)0)->m`, which violates semantics of the `->` operator. Hence, use of `container_of` leads to violation of semantics of the `->` operator. Note: semantics violation does not require diagnostics. – pmor Aug 04 '22 at 01:32
  • @pmor, No. The `offsetof` is a part of the standard library. The rules about UB do not apply to the implementation of standard library. No one cares how the macro is implemented as long as it produces results compliant with the standard. – tstanisl Aug 04 '22 at 11:56
  • @tstanisl Holy moly, indeed! Thanks! – pmor Aug 04 '22 at 13:22
  • @tstanisl Extra: how does the compiler (e.g. GCC) "know" that it is compiling the standard library? From experience: I implemented `memcpy` as a for-loop: instead of the for-loop, `gcc -O3` generated a call to `memcpy`, leading to infinite recursion. – pmor Aug 04 '22 at 13:28
  • @pmor, there is a pragma `#pragma GCC system_header`. If you compile `#include ` with `-E` option and next compile the output again then you will be flooded with warnings. – tstanisl Aug 05 '22 at 07:58

2 Answers


&s.b is a pointer. Its value may carry implementation defined attributes like the size of a value it is pointing to

There are two cases of pointer metadata

Type #1 - Pointers point to allocated buffers where a hidden preamble holds metadata for the allocated block.

From this slideshow (slide #9 onwards):

This definitely doesn't affect pointer arithmetic and was not the case OP was referring to.

Type #2 - Provenance or other metadata embedded into the pointer

Here's the draft for "A Provenance-aware Memory Object Model for C". It describes the idea behind implementing pointer resolution provenance in C.

There's a quote discussing member offsets:

Pointer member offset Given a non-null pointer p at C type τ , which points to the start of a struct or union type object (ISO C suggests this has to exist, writing “The value is that of the named member of the object to which the first expression points”) with a member m, if p is (π, a), the result of offsetting the pointer to member m has the same provenance π and the suitably offset a.

Combined with two later statements about pointer arithmetic:

Pointer addition and subtraction Pointer arithmetic (addition or subtraction of integers) preserves provenance. The resulting pointer value is indeterminate if the result not within (or one-past) the storage instance.

Pointer difference Pointer difference is only defined for pointers with the same provenance and within the same array...

Together with the fact that there are no proposals to change section 6.5.6 of the ISO standard, which covers pointer arithmetic, this leads to the only possible conclusion: this is ok.


A different question would be whether or not the (char*)(ptr) operation violates strict aliasing rules.

Strict aliasing definition (just in case), from a different stack overflow post:

"Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)"

But because the operation is within the same struct and we only use it for compile-time calculations, this is ok.

Daniel Trugman
  • Afaik, c standard allows embedding extra data into the *value* of the pointer. Those properties may be abstract and exist only during compilation (i.e. provenance). – tstanisl May 29 '22 at 10:08
  • @tstanisl, thanks. I didn't realize you were referring to that. All proposals to provenance adhere to the current ISO spec and are not really supposed to change the behavior. I added some resources on that. – Daniel Trugman May 29 '22 at 10:48
  • Interesting. Under the above definition of provenance even the original `container_of` taking `&s.b` argument would be strictly defined and no `member_of` tricks would be necessary. Am I right? – tstanisl May 29 '22 at 10:58
  • @tstanisl, IMHO, yes. Cast should preserve provenance, so should subtraction as long as the result is within the same object. – Daniel Trugman May 29 '22 at 11:13
  • _There's a quote discussing member offsets_ Is it relevant when `p` is initialized as `int *p = &s.b;`, and not as `int *p = (int*)((char*)&s + offsetof(struct S, b));`? – Language Lawyer May 30 '22 at 18:08
  • @LanguageLawyer, I think it is. The purpose of the question is to be sure – tstanisl Jun 04 '22 at 15:08

If the formal term you are looking for is strictly conforming, then that means no forms of poorly-defined behavior may be present. In case your examples depend on alignment/padding considerations, then they have implementation-defined behavior and are not strictly conforming for that reason.

Otherwise, C allows all manner of object pointer conversions under C17 6.3.2.3/7:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

The "converted back again" rule guarantees that no "meta data" is lost.
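As a small illustration of that rule (a sketch only, with the hypothetical function name `convert_back_ok`): converting a member pointer to `char*` and back must give a pointer that compares equal to the original, and it can still be used to read the member.

```c
struct S {
    int a;
    int b;
};

/* Returns 1 if the int* -> char* -> int* round trip preserves the pointer. */
int convert_back_ok(void) {
    struct S s = { 1, 2 };
    int *p = &s.b;
    char *c = (char*)p;    /* points to the lowest addressed byte of s.b */
    int *back = (int*)c;   /* same address, so it is correctly aligned for int */
    return back == p && *back == 2;
}
```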

This rule also means that we are allowed to inspect a struct by using a character pointer, but it is indeed questionable to start that inspection anywhere else than at the beginning. An int* pointer into the middle of the struct has to be regarded as just an int* and not as a pointer to an "aggregate" type, so we may not access it out-of-bounds (iterating below the address pointed-at).

Rather, to make sense of the above rule we must be free to regard the struct as a char [sizeof(the_struct)], or we'd get in trouble with the rules for pointer arithmetic (stated below C17 6.5.6). But to do that we need to start out with a pointer to the struct indeed.
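A sketch of what "starting out with a pointer to the struct" could look like in practice: the struct is viewed as an array of bytes from its beginning, and the member is reached by stepping forward by `offsetof`. The function name `read_b_via_bytes` is hypothetical, and `memcpy` is my choice here to copy the value out without relying on any further pointer conversion.

```c
#include <stddef.h>  /* offsetof */
#include <string.h>  /* memcpy */

struct S {
    int a;
    int b;
};

/* Read member b by walking the bytes of *sp from the start of the struct. */
int read_b_via_bytes(const struct S *sp) {
    /* View the whole struct as unsigned char[sizeof(struct S)]. */
    const unsigned char *bytes = (const unsigned char*)sp;
    int out;
    /* Step forward to b and copy its object representation. */
    memcpy(&out, bytes + offsetof(struct S, b), sizeof out);
    return out;
}
```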

As for the "strict aliasing rule" (6.5/7), it only applies when doing an lvalue access, so it is mostly not relevant here. Also it has special exceptions for accessing something as a character type.

So your assumptions all seem correct as per the above quoted rule 6.3.2.3/7, and they don't violate pointer arithmetic nor strict aliasing.


Regarding type safety, perhaps you could implement the macro in standard C like this:

#define member_of(ptr, type, member) \
  _Generic((ptr), type*: (void*)((char*)(ptr) + offsetof(type, member)) )
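A possible usage sketch of such a `_Generic`-checked macro is shown below. To keep it separate from the macros above it uses the hypothetical name `member_of_checked`, and it yields `void*` (my assumption about the intended result type) so that the result converts implicitly to a pointer to the member.

```c
#include <stddef.h>  /* offsetof */

/* Compiles only when ptr has exactly the type "type*". */
#define member_of_checked(ptr, type, member) \
    _Generic((ptr), type*: (void*)((char*)(ptr) + offsetof(type, member)))

struct S {
    int a;
    int b;
};

int generic_demo(void) {
    struct S s = { .a = 1, .b = 5 };
    int *p = member_of_checked(&s, struct S, b);  /* &s is struct S*, accepted */
    /* member_of_checked(p, struct S, b) would be rejected at compile time,
       because int* matches no association in the _Generic selection. */
    return *p;
}
```

The check only covers the container pointer. The type of the member pointer is still unchecked, since the result is `void*`.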
Lundin
  • How are my examples affected by alignment requirements or paddings? The `int*` generated from `member_of` would always be correctly aligned. The value returned by `offsetof` is affected by the padding but it does not change the observable behavior of the function. – tstanisl Jun 01 '22 at 15:11
  • @tstanisl I said "_In case_ your examples depend on alignment/padding". I don't see any implementation-defined behavior either at a glance, but structs are notoriously non-portable in general. – Lundin Jun 02 '22 at 06:18