Why can't you use offsetof on non-POD structures in C++?

Question

I was researching how to get the memory offset of a class member in C++ and came across this on Wikipedia:

In C++ code, you can not use offsetof to access members of structures or classes that are not Plain Old Data Structures.

I tried it out and it seems to work fine.

#include <cstddef>   // offsetof
#include <iostream>
using namespace std;

class Foo
{
private:
    int z;
    int func() {cout << "this is just filler" << endl; return 0;}

public: 
    int x;
    int y;
    Foo* f;

    bool returnTrue() { return false; }
};

int main()
{
    cout << offsetof(Foo, x)  << " " << offsetof(Foo, y) << " " << offsetof(Foo, f);
    return 0;
}

I got a few warnings, but it compiled and when run it gave reasonable output:

Laptop:test alex$ ./test
4 8 12

I think I'm either misunderstanding what a POD data structure is or I'm missing some other piece of the puzzle. I don't see what the problem is.

Steve Jessop · Answer 1 · 2012-10-30T09:28:11.297

Bluehorn's answer is correct, but for me it doesn't explain the reason for the problem in simplest terms. The way I understand it is as follows:

If NonPOD is a non-POD class, then when you do:

NonPOD np;
np.field;

the compiler does not necessarily access the field by adding some offset to the base pointer and dereferencing. For a POD class, the C++ Standard constrains it to do that (or something equivalent), but for a non-POD class it does not. The compiler might instead read a pointer out of the object, add an offset to that value to give the storage location of the field, and then dereference. This is a common mechanism with virtual inheritance if the field is a member of a virtual base of NonPOD. But it is not restricted to that case. The compiler can do pretty much anything it likes. It could call a hidden compiler-generated virtual member function if it wants.

In the complex cases, it is obviously not possible to represent the location of the field as an integer offset. So offsetof is not valid on non-POD classes.

In cases where your compiler just so happens to store the object in a simple way (such as single inheritance, and normally even non-virtual multiple inheritance, and normally fields defined right in the class that you're referencing the object by as opposed to in some base class), then it will just so happen to work. There are probably cases which just so happen to work on every single compiler there is. This doesn't make it valid.
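One way to guard against the cases described above is to ask the compiler whether a type is standard-layout before relying on offsetof. This is only a minimal sketch, assuming a C++11 compiler; PlainPoint, VirtualBase, WithVirtualBase and offsetof_is_portable are hypothetical names made up for illustration and are not part of the original answer.

```cpp
#include <cstddef>      // offsetof, std::size_t
#include <type_traits>  // std::is_standard_layout

// Hypothetical example types, not taken from the question.
struct PlainPoint {
    int x;
    int y;
};

struct VirtualBase { int a; };
struct WithVirtualBase : virtual VirtualBase { int b; };

// True when the standard gives offsetof a well-defined meaning for T
// (since C++11 the requirement is a standard-layout class).
template <typename T>
constexpr bool offsetof_is_portable() {
    return std::is_standard_layout<T>::value;
}

static_assert(offsetof_is_portable<PlainPoint>(),
              "plain struct: offsetof is fine");
static_assert(!offsetof_is_portable<WithVirtualBase>(),
              "virtual base: not standard-layout");

// For a standard-layout type the result is a compile-time constant.
constexpr std::size_t kOffsetOfX = offsetof(PlainPoint, x);
```

Putting a static_assert like this next to each offsetof use turns a silent portability problem into a compile error.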

Appendix: how does virtual inheritance work?

With simple inheritance, if B is derived from A, the usual implementation is that a pointer to B is just a pointer to A, with B's additional data stuck on the end:

A* ---> field of A  <--- B*
        field of A
        field of B

With simple multiple inheritance, you generally assume that B's base classes (call 'em A1 and A2) are arranged in some order peculiar to B. But the same trick with the pointers can't work:

A1* ---> field of A1
         field of A1
A2* ---> field of A2
         field of A2

A1 and A2 "know" nothing about the fact that they're both base classes of B. So if you cast a B* to A1*, it has to point to the fields of A1, and if you cast it to A2* it has to point to the fields of A2. The pointer conversion operator applies an offset. So you might end up with this:

A1* ---> field of A1 <---- B*
         field of A1
A2* ---> field of A2
         field of A2
         field of B
         field of B

Then casting a B* to A1* doesn't change the pointer value, but casting it to A2* adds sizeof(A1) bytes. This is the "other" reason why, in the absence of a virtual destructor, deleting B through a pointer to A2 goes wrong. It doesn't just fail to call the destructor of B and A1, it doesn't even free the right address.

Anyway, B "knows" where all its base classes are, they're always stored at the same offsets. So in this arrangement offsetof would still work. The standard doesn't require implementations to do multiple inheritance this way, but they often do (or something like it). So offsetof might work in this case on your implementation, but it is not guaranteed to.
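The pointer adjustment described above can be observed directly. The following is a rough sketch under the assumption that the implementation lays out non-virtual bases at fixed offsets, as most do; the member names and the helper a2_adjustment are hypothetical.

```cpp
#include <cassert>
#include <cstddef>  // std::ptrdiff_t

struct A1 { int a1_first; int a1_second; };
struct A2 { int a2_first; int a2_second; };
struct B : A1, A2 { int b_first; int b_second; };

// Returns how many bytes the compiler adds when converting a B* to an A2*.
std::ptrdiff_t a2_adjustment(B* b) {
    assert(b != nullptr);
    A2* as_a2 = b;  // implicit derived-to-base conversion applies the offset
    return reinterpret_cast<char*>(as_a2) - reinterpret_cast<char*>(b);
}
```

On implementations that lay B out as in the diagram, a2_adjustment returns sizeof(A1) for every B object. Because the value is the same for all objects of type B, a single integer offset is enough in this arrangement.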

Now, what about virtual inheritance? Suppose B1 and B2 both have A as a virtual base. This makes them single-inheritance classes, so you might think that the first trick will work again:

A* ---> field of A   <--- B1*
        field of A
        field of B1

A* ---> field of A   <--- B2*
        field of A
        field of B2

But hang on. What happens when C derives (non-virtually, for simplicity) from both B1 and B2? C must only contain 1 copy of the fields of A. Those fields can't immediately precede the fields of B1, and also immediately precede the fields of B2. We're in trouble.

So what implementations might do instead is:

// an instance of B1 looks like this, and B2 similar
A* --->  field of A
         field of A
B1* ---> pointer to A 
         field of B1

Although I've indicated B1* pointing to the first part of the object after the A subobject, I suspect (without bothering to check) the actual address won't be there, it'll be the start of A. It's just that unlike simple inheritance, the offsets between the actual address in the pointer, and the address I've indicated in the diagram, will never be used unless the compiler is certain of the dynamic type of the object. Instead, it will always go through the meta-information to reach A correctly. So my diagrams will point there, since that offset will always be applied for the uses we're interested in.

The "pointer" to A could be a pointer or an offset, it doesn't really matter. In an instance of B1, created as a B1, it points to (char*)this - sizeof(A), and the same in an instance of B2. But if we create a C, it can look like this:

A* --->  field of A
         field of A
B1* ---> pointer to A    // points to (char*)(this) - sizeof(A) as before
         field of B1
B2* ---> pointer to A    // points to (char*)(this) - sizeof(A) - sizeof(B1)
         field of B2
C* ----> pointer to A    // points to (char*)(this) - sizeof(A) - sizeof(B1) - sizeof(B2)
         field of C
         field of C

So to access a field of A using a pointer or reference to B2 requires more than just applying an offset. We must read the "pointer to A" field of B2, follow it, and only then apply an offset, because depending what class B2 is a base of, that pointer will have different values. There is no such thing as offsetof(B2,field of A): there can't be. offsetof will never work with virtual inheritance, on any implementation.
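To see that the distance from a B1 subobject to its virtual base A depends on the most-derived type, here is a small sketch. It assumes a typical implementation (the exact numbers, and even whether A sits before or after B1, vary between compilers), and distance_to_a is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>  // std::ptrdiff_t

struct A { int a; };
struct B1 : virtual A { int b1; };
struct B2 : virtual A { int b2; };
struct C : B1, B2 { int c; };

// Byte distance from the B1 subobject to the A subobject of the same object.
std::ptrdiff_t distance_to_a(B1* b) {
    assert(b != nullptr);
    A* base = b;  // conversion to a virtual base consults per-object data at run time
    return reinterpret_cast<char*>(base) - reinterpret_cast<char*>(b);
}
```

Calling distance_to_a on a standalone B1 and on the B1 part of a C gives different results on common implementations, even though the static type is B1 in both calls. That is exactly why no single offsetof(B1, a) can exist.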

score 38 · Accepted Answer · edited Sep 16 '18 at 15:32

Short answer: offsetof is a feature that is only in the C++ standard for legacy C compatibility. Therefore it is basically restricted to the stuff that can be done in C. C++ supports only what it must for C compatibility.

As offsetof is basically a hack (implemented as a macro) that relies on the simple memory model of C, supporting it for every class would take a lot of freedom away from C++ compiler implementors in how they organize class instance layout.

The effect is that offsetof will often work (depending on the source code and compiler used) in C++ even where it is not backed by the standard, except where it doesn't. So you should be very careful with offsetof usage in C++. Modern GCC and Clang will emit a warning if offsetof is used outside what the standard permits (-Winvalid-offsetof).

Edit: As you asked for example, the following might clarify the problem:

#include <iostream>
using namespace std;

struct A { int a; };
struct B : public virtual A   { int b; };
struct C : public virtual A   { int c; };
struct D : public B, public C { int d; };

#define offset_d(i,f)    (long(&(i)->f) - long(i))
#define offset_s(t,f)    offset_d((t*)1000, f)

#define dyn(inst,field) {\
    cout << "Dynamic offset of " #field " in " #inst ": "; \
    cout << offset_d(&i##inst, field) << endl; }

#define stat(type,field) {\
    cout << "Static offset of " #field " in " #type ": "; \
    cout.flush(); \
    cout << offset_s(type, field) << endl; }

int main() {
    A iA; B iB; C iC; D iD;
    dyn(A, a); dyn(B, a); dyn(C, a); dyn(D, a);
    stat(A, a); stat(B, a); stat(C, a); stat(D, a);
    return 0;
}

This will crash when trying to locate the field a inside type B statically, while it works when an instance is available. This is because of the virtual inheritance, where the location of the base class is stored into a lookup table.

While this is a contrived example, an implementation could use a lookup table also to find the public, protected and private sections of a class instance. Or make the lookup completely dynamic (use a hash table for fields), etc.

The standard just leaves all possibilities open by restricting offsetof to POD (in other words, there is no way to use a hash table for POD structs... :)).

Just another note: I had to reimplement offsetof (here: offset_s) for this example as GCC actually errors out when I call offsetof for a field of a virtual base class.

Why would it take away freedom? Simply dereferencing a member should only give me its address regardless of how it's organized by the compiler, right? What kind of cases does it break in? — Alex, Jul 15 '09 at 08:07
Dereferencing a member gives you the address of one member of one object. offsetof() applies to a type. Hence this breaks if the offset would differ amongst objects of the same type. Hard to believe that's possible? Consider free objects and base part objects of the same type. — MSalters, Jul 15 '09 at 09:41
So, can the following cause a crash for any T? dyn( *reinterpret_cast<T*>( startOfProgramStackAddr ), field ); If I pretend there's an object at some valid address and try to access a member, can the vtable ever be involved in looking up that member's address? — user48956, Jul 15 '09 at 17:12
Newer versions of g++ will generate a warning on non-POD use. See http://gcc.gnu.org/onlinedocs/gcc-4.4.0/gcc/Warning-Options.html#index-Winvalid_002doffsetof-441 — Pavel Minaev, Jul 15 '09 at 17:22
Intel 15: `warning #1875: offsetof applied to non-POD (Plain Old Data) types is nonstandard` — ThomasMcLeod, Jun 03 '16 at 21:54
In practice there's still no problem using `offsetof` if there's no virtual inheritance, i.e. if all the offsets are indeed known at compile time. — valdo, Nov 26 '17 at 06:45

score 5 · Answer 3 · edited Dec 08 '11 at 00:00

In general, when you ask "why is something undefined", the answer is "because the standard says so". Usually, the rationale is one or more reasons like:

it is difficult to detect statically which case you are in;
corner cases are difficult to define and nobody took the pain of defining special cases;
its use is mostly covered by other features;
existing practices at the time of standardization varied, and breaking existing implementations and the programs depending on them was deemed more harmful than standardization.

Back to offsetof, the second reason is probably a dominant one. If you look at C++0X, where the standard was previously using POD, it is now using "standard layout", "layout compatible", "POD" allowing more refined cases. And offsetof now needs "standard layout" classes, which are the cases where the committee didn't want to force a layout.

You also have to consider the common use of offsetof(), which is to get the value of a field when you have a void* pointer to the object. Multiple inheritance -- virtual or not -- is problematic for that use.
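For reference, the void* use case mentioned above looks roughly like this on a standard-layout struct, where it is well-defined. Record and read_double_at are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t
#include <cstring>  // std::memcpy

// A standard-layout struct, so offsetof(Record, weight) is valid.
struct Record {
    int id;
    double weight;
};

// Reads a double stored `offset` bytes into the object that `object` points to.
double read_double_at(const void* object, std::size_t offset) {
    assert(object != nullptr);
    double value;
    // memcpy avoids alignment and aliasing problems when reading raw bytes.
    std::memcpy(&value, static_cast<const char*>(object) + offset, sizeof value);
    return value;
}
```

A caller would pass offsetof(Record, weight) as the offset. With multiple inheritance the void* may not point at the subobject the offset was computed for, which is where this pattern breaks down.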

score 2 · Answer 4 · answered Jul 15 '09 at 08:50

I think your class fits the C++0x definition of a POD. g++ has implemented some of C++0x in its latest releases. I think that VS2008 also has some C++0x bits in it.

From Wikipedia's C++0x article:

C++0x will relax several rules with regard to the POD definition.

A class/struct is considered a POD if it is trivial, standard-layout, and if all of its non-static members are PODs.

A trivial class or struct is defined as one that:

Has a trivial default constructor. This may use the default constructor syntax (SomeConstructor() = default;).

Has a trivial copy constructor, which may use the default syntax.

Has a trivial copy assignment operator, which may use the default syntax.

Has a trivial destructor, which must not be virtual.

A standard-layout class or struct is defined as one that:

Has only non-static data members that are of standard-layout type

Has the same access control (public, private, protected) for all non-static members

Has no virtual functions

Has no virtual base classes

Has only base classes that are of standard-layout type

Has no base classes of the same type as the first defined non-static member

Either has no base classes with non-static members, or has no non-static data members in the most derived class and at most one base class with non-static members. In essence, there may be only one class in this class's hierarchy that has non-static members.
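The two properties in the list above are independent, and the standard type traits can check each one. This is a small sketch assuming C++11 or later; UserCtor, MixedAccess and Polymorphic are hypothetical classes chosen to hit one rule each.

```cpp
#include <type_traits>  // std::is_trivial, std::is_standard_layout

// Standard-layout but not trivial: the default constructor is user-provided.
struct UserCtor {
    int value;
    UserCtor() : value(0) {}
};

// Trivial but not standard-layout: data members with different access control.
class MixedAccess {
    int hidden;
public:
    int shown;
    int peek() const { return hidden; }  // member functions do not affect either property
};

// Neither: a virtual function (here the destructor) rules out both.
struct Polymorphic {
    int value;
    virtual ~Polymorphic() {}
};

static_assert(std::is_standard_layout<UserCtor>::value, "same access, no virtuals");
static_assert(!std::is_trivial<UserCtor>::value, "user-provided constructor");
static_assert(std::is_trivial<MixedAccess>::value, "all special members are implicit");
static_assert(!std::is_standard_layout<MixedAccess>::value, "mixed access control");
static_assert(!std::is_trivial<Polymorphic>::value, "virtual destructor");
static_assert(!std::is_standard_layout<Polymorphic>::value, "has a virtual function");
```

MixedAccess has the same shape as the Foo class in the question, which is why Foo does not qualify as a POD even under the relaxed rules.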

It does not fit the definition you posted. It violates rule 2 for standard-layout classes (different access control). — Bluehorn, Jul 15 '09 at 08:59
I was pointing out that C++0x has expanded POD and therefore g++/VC++ are working on expanding POD. Just because the final standard says that access modifiers affect POD status doesn't mean it's implemented yet. I would not rely on offsetof() working on this struct in the future; I was merely pointing out why it may work for the time being. — KitsuneYMG, Jul 15 '09 at 13:49
VC++ doesn't work with the definition of "expanding POD". It just doesn't do any validation of `offsetof`, because the Standard does not require it to. In any case, C++0x is not a standard yet, and, IIRC, you have to enable the features within explicitly even for g++. — Pavel Minaev, Jul 15 '09 at 17:16

score 0 · Answer 5 · edited May 23 '17 at 12:18

For the definition of a POD data structure, see the explanation already posted in another Stack Overflow question:

What are POD types in C++?

Now, coming to your code, it is working as expected. This is because you are applying offsetof() to the public members of your class, which is valid.

Please let me know the exact question if the above does not clarify your doubt.

answered Jul 15 '09 at 07:30

Roopesh Majeti

offsetof() is not valid for any fields of non-POD – Pavel Minaev Jul 15 '09 at 07:36
The quote from Wikipedia says I can't use offsetof to access members of data structures that aren't POD; it doesn't mention whether they're public or private members. My data structure isn't POD according to Wikipedia's definition, and even after adding things like virtual functions and other non-PODdy stuff it seems to be working fine. – Alex Jul 15 '09 at 07:39
Right, usually you can use offsetof on any struct/union. No compiler I know checks the validity of offsetof arguments. However, depending on the compiler the results may be valid or meaningless. – Bluehorn Jul 15 '09 at 08:08
@Bluehorn: gcc provides a warning from invalid uses of offsetof: "invalid access to non-static data member 'Foo::x' of NULL object (perhaps the 'offsetof' macro was used incorrectly)". I don't know for a fact that it catches all non-POD uses, but it certainly includes cases which would actually work in practice. I used "struct A { int i; }; struct B : public A { int j; }; int main() { B b; int n = offsetof(B,j); }" to look up the error message. – Steve Jessop Jul 15 '09 at 11:13

Braxton Nunnally · Answer 6 · 2018-08-16T20:50:19.477

This works every time, and it's the most portable version for use in both C and C++:

#include <cstdint>   // int64_t
#include <iostream>
using namespace std;

#define offset_start(s) s
#define offset_end(e) e
// Byte distance from member `start` to member `end` of the object obj points to.
#define relative_offset(obj, start, end) ((int64_t)&obj->offset_end(end)-(int64_t)&obj->offset_start(start))

struct Test {
    int a;
    double b;
    Test* c;
    long d;
};

int main() {
    Test t;
    cout << "a " << relative_offset((&t), a, a) << endl;
    cout << "b " << relative_offset((&t), a, b) << endl;
    cout << "c " << relative_offset((&t), a, c) << endl;
    cout << "d " << relative_offset((&t), a, d) << endl;
    return 0;
}

The above code simply requires you to hold an instance of some object, be it a struct or a class. You then pass a pointer to that object to gain access to its fields. To make sure you get the right offset, never choose a "start" field that is declared after the "end" field. The compiler works out the address difference from the actual object at run time.

This allows you to not have to worry about the problems with compiler padding data, etc.

score -3 · Answer 7 · answered Jul 15 '09 at 07:34

If you add, for instance, a virtual empty destructor:

virtual ~Foo() {}

Your class will become "polymorphic", i.e. it will have a hidden member field which is a pointer to a "vtable" that contains pointers to virtual functions.

Due to the hidden member field, the size of an object and the offsets of its members will not be trivial. Thus, you should expect trouble using offsetof.

Ropez

-1: I don't know if the standard explicitly forbids it, but your explanation is inconsequential. The VMT is an additional member field, but with respect to offsetof it is no different than "funny extra padding". Bluehorn illustrates nicely why the offset is correct only if you have the correct "base" pointer type. – peterchen Jul 15 '09 at 09:26

score -3 · Answer 8 · answered Jul 15 '09 at 07:37

I bet you compile this with VC++. Now try it with g++, and see how it works...

Long story short, it's undefined, but some compilers may allow it. Others do not. In any case, it's non-portable.

Pavel Minaev

I'm running g++ on OSX right now. I haven't tried it on my windows machine, but that was going to be my next stop just out of curiosity. – Alex Jul 15 '09 at 07:41
I'm not sure which g++ version is in OS X; however, for the newer ones, and if you do not disable warnings, you should see a warning about applying "offsetof" to a non-POD type. This is because g++ doesn't do the usual simple hack of pointers-to-members-of-NULL to implement the offsetof macro, but uses its own `__offsetof__` extension, which does the validation. VC++ uses the hack, which doesn't catch erroneous use. See http://gcc.gnu.org/onlinedocs/gcc-4.4.0/gcc/Warning-Options.html#index-Winvalid_002doffsetof-441 – Pavel Minaev Jul 15 '09 at 17:21

score -3 · Answer 9 · answered Dec 17 '13 at 05:43

Works for me

   #define get_offset(type, member) ((size_t)(&((type*)(1))->member)-1)
   #define get_container(ptr, type, member) ((type *)((char *)(ptr) - get_offset(type, member)))

Hamdi Hamdi

See the top voted answers for why this won't always work and is a bad idea. – Alex Dec 17 '13 at 18:18
This works fine if there is no virtual inheritance. See https://stackoverflow.com/questions/1129894/why-cant-you-use-offsetof-on-non-pod-structures-in-c#comment81942456_1130035 – Dave Butler Dec 17 '17 at 21:29

Nick Dandoulakis · Answer 10 · 2009-07-15T11:01:11.170

In C++ you can get the relative offset like this:

class A {
public:
  int i;
};

class B : public A {
public:
  int i;
};

void test()
{
  printf("%p, %p\n", &A::i, &B::i); // edit: changed %x to %p
}

answered Jul 15 '09 at 07:35

This is not correct, for two reasons: 1. `&A::i` is a pointer-to-member, and cannot be output with `printf()` 2. even if it could, neither A::i nor B::i have storage (they're not functions) without an object instance What you probably meant was struct A { int i; }; struct B : A { int i; } B b; printf( "%x, %x\n", &b.A::i, &b.i ); – Marc Mutz - mmutz Jul 15 '09 at 07:44
`&A::i` will be calculated at compile time and it's not a pointer. It's the relative field's offset (*number*). The actual memory offset would be something like `&object + &A::i`. – Nick Dandoulakis Jul 15 '09 at 07:49
My question is really about why offsetof is supposedly not valid, but while we're on the topic I tried swapping out my offsetof(x, y) with &x::y in the code above and my output was 1 1 1. That doesn't sound right to me. Any idea what's going on? On a side note, why are you using printf in c++? ;) – Alex Jul 15 '09 at 07:54
not `&x::y` but `Foo::x` or `Foo::y`. `printf` is an old *bad* habit of mine. Another reason is that I use various *format* functions a lot, and `printf` *feels* natural :) – Nick Dandoulakis Jul 15 '09 at 08:08
When I typed x::y I meant it in a generic way. In the code I used &Foo::x, &Foo::y and &Foo::f and got 1 1 1 as the output. – Alex Jul 15 '09 at 08:14
I tested your code with `Foo::x` ... and I get correct results. I used VC++ 6.0. – Nick Dandoulakis Jul 15 '09 at 08:15
Oh, `cout` prints 111 but `printf` prints the correct values. Another reason to use `printf`? :o) – Nick Dandoulakis Jul 15 '09 at 08:20
I was just about to post the same thing. I can't get it to work right with cout, but printf seems to be giving sane values... weird. – Alex Jul 15 '09 at 08:25
Well, you're invoking undefined behavior. The printf format argument %x doesn't match the type you're actually trying to print. It's quite possible that you're in fact seeing the result of stack corruption. – MSalters Jul 15 '09 at 09:36
@MSalters, &Class::field is a way to pass around relative field offsets. Are you sure that it's undefined behavior? It's in no way results from a stack corruption. – Nick Dandoulakis Jul 15 '09 at 10:31
I saw that trick here: http://stackoverflow.com/questions/1030608/summing-struct-members-inside-a-vector/1030708#1030708 – Nick Dandoulakis Jul 15 '09 at 10:56
@Nick: in the code you link to, the class was POD. Pointers to members of non-POD classes are not necessarily simple offsets. – Steve Jessop Jul 15 '09 at 11:15
A C++ struct is actually a class, and the :: scope resolution operator is a C++ thing, right? I'd really like to find a good reference for this problem. And to make myself clear, I don't use this kind of code in production, only for experimentation. – Nick Dandoulakis Jul 15 '09 at 13:16
@NickD: The problem is that printf("%x", value) requires that `value` is an integer or converts to one when passed as a vararg. Compare printf("%s", value), which requires that value is a non-null (const) char*. &Class::field may be a relative offset, but it's not an int. – MSalters Jul 15 '09 at 14:45
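For what it's worth, a pointer-to-member is meant to be applied to an object with `.*` (or `->*`) rather than printed or treated as a number. A minimal sketch, with Sample and read_member as hypothetical names:

```cpp
struct Sample {
    int first;
    int second;
};

// Reads whichever int member `member` designates, without ever
// converting the pointer-to-member to an integer offset.
constexpr int read_member(const Sample& object, int Sample::* member) {
    return object.*member;
}
```

For example, read_member(s, &Sample::second) yields s.second. This stays well-defined even for classes where the compiler does not represent the member's location as a simple offset.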

score -4 · Answer 11 · answered Apr 29 '13 at 01:59

This seems to work fine for me:

#define myOffset(Class,Member) ({Class o; (size_t)&(o.Member) - (size_t)&o;})

Greg Slepak

Probably because it causes a runtime crash ;) – jww Sep 16 '13 at 03:36
@noloader, I never got a crash. – Greg Slepak Sep 17 '13 at 14:58
It is inefficient since it constructs an instance of Class every time the macro is used, and it assumes that Class has a default constructor. AngelScript defined a macro like this: `#define asOFFSET(s,m) ((size_t)(&reinterpret_cast<s*>(100000)->m)-100000)` – Urkle Nov 09 '13 at 17:01
Most implementations I've seen (and written) are similar, but they simply return the address of the member using a `null` object pointer instead of a dummy number, e.g. `#define myOffset(Class,Member) ((size_t)&(reinterpret_cast<Class*>(0)->Member))` – Remy Lebeau Jun 08 '18 at 02:56
