I am unsure what André Caron means here:

Virtual functions in C

... some of this code relies on (officially) non-standard behavior that "just happens" to work on most compilers. The main issue is that the code assumes that &m.base == &m (e.g. the offset of the base member is 0). If that is not the case, then the cast in custom_bar() results in undefined behavior. To work around this issue, you can add an extra pointer in struct foo as such:

m is of type struct meh *. A pointer f of type struct foo * is assigned to m through a cast to struct meh *. struct meh has a member base of type struct foo (struct foo meh::base = foo::bar). Why is it supposedly not guaranteed that &m.base == &m? I can see this if the structure is not a POD. André also hints at this. However, why is it necessary for a POD structure to have another pointer void *foo::hook?

In his version, struct meh * m = (struct meh*)f; becomes struct meh * m = (struct meh*)f->hook;, and the hook itself is set with m->base.hook = m;.

struct meh
{
   /* inherit from "class foo". MUST be first. */
   struct foo base;
   int more_data;
};

Below, I listed relevant ISO C90/C++98 excerpts from my research. I also created a code example. The example code can be compiled with Clang via -fsanitize=undefined -std=c++98 -O0 -Wall -Wextra -Wpedantic -Wconversion -Wundef.

Here it is:

https://godbolt.org/z/qo9f8KnYM

Excerpts

From ISO C90 (ANSI C89):

An object shall have its stored value accessed only by an lvalue that has one of the following types: /28/

...

  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a
    subaggregate or contained union), or

A pointer to a structure object, suitably cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may therefore be unnamed holes within a structure object, but not at its beginning, as necessary to achieve the appropriate alignment.

From ISO C++98:

16 If a POD-union contains two or more POD-structs that share a common initial sequence, and if the POD-union object currently contains one of these POD-structs, it is permitted to inspect the common initial part of any of them. Two POD-structs share a common initial sequence if corresponding members have layout-compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

17 A pointer to a POD-struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [Note: There might therefore be unnamed padding within a POD-struct object, but not at its beginning, as necessary to achieve appropriate alignment. ]

Code example

#include <iostream>

struct A {
  int m1;
};

struct B {
  int m1;
  int m2;
};

struct C {
  struct A super;
  int m3;
};

int main(void) {
  struct A a = {42};
  struct C c = {{666}, 1984};

  // Access A::m1 through pointer of type B
  std::cout << ((B *)&a)->m1 << std::endl; // 42

  // Access A::m1 through pointer of type C
  std::cout << ((C *)&a)->super.m1 << std::endl; // 42

  // Access C::super::A::m1 through pointer of type A.
  std::cout << ((A *)(&c))->m1 << std::endl; // 666

  return 0;
}

Edit 1: Let me rewrite this question in this edit section. I will ignore C++, as people in the comments told me to not complicate the question. If this edit is more helpful than the original, then perhaps you can consider replacing the original post with this edit. Or I or someone else can just "strike-through" the original one. Or, if you have a better idea on how to improve my question, please tell me. (I might add that I have issues with attention and get lost in details quite easily... I will leave it at that. You may have guessed what it is...) If my second attempt still fails to deliver, then perhaps I should take my failure to ask a clear question as a hint to think and write it down another time, if applicable. Without further ado, here is my second attempt to pose this question:

I am referring to an answer posted here:

Virtual functions in C

  struct Base {
    int x;
  };

  struct Derived {
    struct Base super;
  };

If offsetof(struct Derived, super) == 0 and offsetof(struct Base, x) == 0, can we then infer that offsetof(struct Derived, super.x) == offsetof(struct Base, x)?

André Caron suggests using an extra pointer pointing to a derived object. Apparently, it is not sufficient or portable to rely on offsetof(struct Derived, super.x) == offsetof(struct Base, x).

Even though this works, you are relying on compiler extensions for type punning that can lead to undefined behavior blablabla. This works in GCC and MSVC for a fact.

Indeed the alignment stuff relies on compiler extensions. You can make it portable using an extra void* pointer in struct foo that points to the "derived object". However, the technique is sufficiently popular in well-known libraries to be considered "portable". Any compiler that made this type of code break would have lots of complaints from its customers.

I have trouble understanding why offsetof(struct Derived, super.x) != offsetof(struct Base, x) could potentially be the case. I have not found clarification in the C standards. Hence, I am looking for further clarification on that.

13:26, restate my assumptions:

Assuming offsetof(struct Derived, super.x) != offsetof(struct Base, x)

  struct Base {
    int x;
    void *hook;
  };

  struct Derived {
    struct Base super;
  };

With the assumption above, consider:

  struct Base base = {42};
  struct Derived derived;
  base.hook = &base; /* Assuming offsetof(struct Base, x) == 0 */
  derived.super = base;

(struct Base*)(derived.super.hook) == &base shall be true.

#include <stddef.h>
#include <stdio.h>

struct Base {
  int x;
  void *hook;
};

struct Derived {
  struct Base super;
};

int main(void) {
  struct Base base = {42};
  struct Derived derived;
  base.hook = &base; /* Assuming offsetof(struct Base, x) == 0 */
  derived.super = base;

  /* offsetof yields a size_t; cast so the argument matches %lu in C89 */
  printf("Offset Base x: %lu\n", (unsigned long)offsetof(struct Base, x));
  printf("Offset Derived super: %lu\n",
         (unsigned long)offsetof(struct Derived, super));
  printf("Offset Derived super.x: %lu\n",
         (unsigned long)offsetof(struct Derived, super.x));
  printf("Offset Derived super.hook: %lu\n",
         (unsigned long)offsetof(struct Derived, super.hook));
  printf("derived.super.hook == &base, yields %d\n",
         (struct Base *)(derived.super.hook) == &base);

  return 0;
}
  • [tag:language-lawyer]? – ikegami May 14 '22 at 03:36
  • Why is this tagged with `C`; when it is obviously another language. – mevets May 14 '22 at 03:53
  • Also, TL;DR. Despite efforts to neuter C, it remains that `struct x { char a,b,c,d; } y;` and `char z[4];` are and remain structurally equivalent. That is that the `offsetof( struct x, { a, b, c, d })` correspond to `&z[0]-z, &z[1]-z, &z[2] -z, &z[3] - z` resp. New imaginations of the so-called standard permits decorations (align, etc) that can make this not true, to the detriment of all but the standard committee's pretty little model of an abstract computer. – mevets May 14 '22 at 04:02
  • @mevets, wut. Alignments restrictions are NOT a new thing. This existed in plenty of old hardware. – ikegami May 14 '22 at 04:04
  • @mevets I wrote my example code in C++98, because it's easier to write in. However, I am also interested in the C89 case. If you insist on a C89 example, here you go: https://godbolt.org/z/EM6M1eTKj Here's another one: https://godbolt.org/z/5TeoPToKx I "hand-transpiled" the C++ code example to C89 from here: https://cplusplus.com/doc/tutorial/polymorphism/ – user19113444 May 14 '22 at 04:31
  • 1
    @user19113444 *"I am also interested in the C89 case."* -- since the C89 and C++98 are distinct standards -- something that should be evident by comparing the quotes in your question -- the answer could conceivably be vastly different for the two languages. That is, your interest warrants two Stack Overflow questions -- one for C and one for C++. *(Also, you should expect some pushback on the C++ question, as the basis of your question is rooted in C. Since C and C++ a re distinct languages, it does not make a lot of sense to ask how C++ permits something allowed by a different language.)* – JaMiT May 14 '22 at 05:15
  • @JaMiT, not sure if I can follow you. The C89 and C++98 cases are quite similar (assuming POD structures). Also, I manually transpiled the C++98 code to C89, and it compiles without warnings. The core of my issue is perhaps that I am not absolutely certain why André used a `hook` in his C code example. He hints that it can be useful for non-POD structures. That's why I also included C++98 into the mix. I hope it isn't the end of the world asking something related. – user19113444 May 14 '22 at 05:39
  • If it's such an inconvenience to ask a C question together with an arguably related C++ question, then the C part suffices me. I mean, sure modern idiomatic C++ is different from C, however C and C++ share similarities in some aspects. It's not like I am asking about C and Scala. Also, C and C++ share more similarities than C and Scala. It's not like the intersection of the two language standards is empty. The element count ||C intersected with C++|| isn't negligibly small. Compare that to ||C intersected with Scala/Java/Python||. – user19113444 May 14 '22 at 06:13
  • This will be my last input, I will be just a silent observer after this: the question I am referring to is about virtual functions in C. Virtual functions are a C++ construct. Virtual functions can be emulated in C. Lambdas can be emulated in C. Currying can be emulated in C. So why is this such a huge deal when I ask a related topic to C? Again, the original question deals with emulating virtual functions in C. Then André hints that "hooking" the structure is more guaranteed to work and all I want to know is: "why is this useful in C or why is it apparently undefined behavior in C?" – user19113444 May 14 '22 at 06:26
  • *"If a POD-union contains two or more POD-structs that share a common initial sequence"* This rule is **only** valid for pod-structs that are members of a union. It is surprisingly **not** valid for the same PODs outside of a union (really!). Or for non-struct members of the same union. So very limited use. – BoP May 14 '22 at 08:09
  • Aren't all structs in C POD? And C and C++ POD types are compatible. They will always have the first member at offset 0. – Goswin von Brederlow May 14 '22 at 11:09
  • The question in the title is answered by https://stackoverflow.com/a/53578665/362589 – Daniel May 14 '22 at 11:39
  • @user19113444 *"quite similar"* does not mean *"the same as"*. Sometimes a question has the same answer for C and for C++, sometimes not. Generally, someone asking a question is not in a position to know if the answer would be the same in both C and C++ (or else they would likely also know the answer). Hence, the guidance in [the c tag info](https://stackoverflow.com/tags/c/info) about __Using [tag:c] and [tag:c++] together__. – JaMiT May 14 '22 at 12:07

1 Answer

However, why is it necessary for a POD structure to have another pointer void *foo::hook?

It isn't necessary. From the original question and answer:

This technique is more reliable, especially if you plan to write the "derived struct" in C++ and use virtual functions. In that case, the offset of the first member is often non-0 as compilers store run-time type information and the class' v-table there.

A C++ struct/class with virtual functions is not a POD. Any non-POD struct/class can have a non-zero offset for its data members, and that is the case the hook is there to handle.

Goswin von Brederlow
  • In other words: `offsetof(struct Derived, super.x) == offsetof(struct Base, x)` is always true in POD structures in C (and C++)? While in C++ generally `offsetof(struct Derived, super.x) == offsetof(struct Base, x)` is not always the case (meaning POD and non-POD structures)? I rewrote my original question in its edit section. – user19113444 May 14 '22 at 11:37
  • Maybe. It's implementation-defined, or more specifically, it's defined by the architecture's calling conventions. But it's common enough that people remember that you have to care about it. – Goswin von Brederlow May 14 '22 at 11:42
  • So in that case, a `hook` could help circumvent undefined behavior in C (and C++)? I might be overly pedantic here... – user19113444 May 14 '22 at 11:48
  • In C it's always POD. The `hook` is only for non-POD C++ struct/class. – Goswin von Brederlow May 14 '22 at 11:53
  • Okay, the "maybe" in your comment was irritating me a little. So in C it is always `offsetof(struct Derived, super.x) == offsetof(struct Base, x)` without exception. In C++ it is implementation defined as you said. I think I got it now. Thank you! – user19113444 May 14 '22 at 11:57