8

I came across the macro below

#define OFFSETOF(TYPE, ELEMENT) ((size_t)&(((TYPE *)0)->ELEMENT))

I'm not quite able to digest this, because in C++, when I try to dereference a null pointer, I expect undefined behaviour... so how come it can have an address? What does the address of null mean?

Karthick
  • 5
    Even in C++ there is no exception being thrown on dereference of an invalid pointer. It is simply undefined behaviour. – pmr Sep 22 '11 at 07:39
  • @pmr.. ok.. agreed... but do you see an answer to my second q? – Karthick Sep 22 '11 at 07:41

8 Answers

11

For the purpose of the macro: It assumes that there is an object of type TYPE at address 0 and returns the address of the member which is effectively the offset of the member in the structure.

This answer explains why this is undefined behaviour. I think that this is the most important quote:

If E1 has the type “pointer to class X,” then the expression E1->E2 is converted to the equivalent form (*(E1)).E2; *(E1) will result in undefined behavior with a strict interpretation, and .E2 converts it to an rvalue, making it undefined behavior for the weak interpretation.

which is the case here, although others think that this is valid. Note that this will nevertheless produce the correct result on many compilers.

Community
pmr
  • Calculating offset off of 0 is very well defined. – littleadv Sep 22 '11 at 07:52
  • @littleadv Can you back that up with a quote somehow? I don't see how this is different from what is done in the questions. – pmr Sep 22 '11 at 07:53
  • @littleadv: no, it's not, it just happens to be supported on most compilers. – Matthieu M. Sep 22 '11 at 07:55
  • Actually you linked to the quote yourself. You probably just didn't bother reading. It's the second link in your answer. – littleadv Sep 22 '11 at 07:55
  • @littleadv I don't see how you get to that conclusion based on the answers in that question. Even with the "strict" and "weak" interpretations mentioned by GMan this is going to be UB. – pmr Sep 22 '11 at 07:59
  • @pmr - Accessing NULL pointer is UB. In this macro there's no accessing. Ergo - the behavior is well defined. I have no idea why you guys keep repeating the same mistakes instead of just reading the standard. Gman explained it very well. Read the answer. If it's not clear - read it again. – littleadv Sep 22 '11 at 08:00
  • That's a different case. There's no doubt that `sizeof(*(int*)NULL)` is well-defined; there's no doubt that `5 + *(int*)NULL` is UB. The linked example is closer to the first. – MSalters Sep 22 '11 at 08:03
  • `If E1 has the type “pointer to class X,” then the expression E1->E2 is converted to the equivalent form (*(E1)).E2; *(E1) will result in undefined behavior with a strict interpretation, and .E2 converts it to an rvalue, making it undefined behavior for the weak interpretation.` This is the case here. – pmr Sep 22 '11 at 08:03
  • @pmr - but your quote is irrelevant to the macro. The first parameter is **type**, not object. – littleadv Sep 22 '11 at 08:11
  • 1
    @littleadv `((Type*)0)` casts 0 to `pointer to class Type` and from there on everything is the same. I don't get your point. – pmr Sep 22 '11 at 08:20
  • @littleadv The expression in question is `((TYPE*)0)->ELEMENT`. The first argument of the `->` is `(TYPE*)0`. `(TYPE*)0` is a null pointer. QED – James Kanze Sep 22 '11 at 08:21
  • @pmr - the point is that in the answer you refer to, the case is `return (*0.functionCall())`, and here the case is `return (0 + memberOffset)`. E2 in this case is not on its own an lvalue, so the example doesn't hold. I'm sure you noticed the link and decided to ignore it, so here it is again: http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232 – littleadv Sep 22 '11 at 08:25
  • 1
    @littleadv Contrary to all your accusations I've read the link. The point is not that `E2` is an lvalue but that `operator.` triggers conversion of the nullpointer to an rvalue. Even so the `&` requires its argument to be an lvalue. – pmr Sep 22 '11 at 08:42
  • @pmr.. I just visited now to comment about the lvalue issue but you have already done so.. So the conclusion is &0 is invalid and &lvalue is required.. is that correct? – Karthick Sep 22 '11 at 08:55
  • @littleadv: demonstration by example: `struct Base { int foo; };`, `struct Derived: virtual Base {};`, `OFFSETOF(Derived, foo)` --> *Boom*. Accessing a member inherited from a virtual base class requires accessing the virtual base class in-memory location. It's impossible to deduce it from a type information only. Interestingly, `gcc` detects both cases correctly (official macro and non-official) see http://ideone.com/0pHVm even though they use a builtin: `__builtin_offsetof` to implement `offsetof`. – Matthieu M. Sep 22 '11 at 08:59
  • @littleadv Yes. I'm parroting the C++ standard, and its authors. I'd suggest you read it (and what people are posting here). – James Kanze Sep 22 '11 at 09:44
8
#define OFFSETOF(TYPE, ELEMENT) ((size_t)&(((TYPE *)0)->ELEMENT))

is very similar to a fairly common definition of the standard offsetof() macro, defined in <stddef.h> (in C) or <cstddef> (in C++).

0 is a null pointer constant. Casting it to TYPE * yields a null pointer of type TYPE *. Note that the language doesn't guarantee (or even imply) that a null pointer has the value 0, though it very commonly does.

So (TYPE *)0 is notionally the address of an object of type TYPE located at whatever address the null pointer points to, and ((TYPE *)0)->ELEMENT is the ELEMENT member of that object.

The & operator takes the address of this ELEMENT member, and the cast converts that address to type size_t.

Now if a null pointer happens to point to address 0, then the (nonexistent) object of type TYPE starts at address 0, and the ELEMENT member of that object is at an address that's offset by some number of bytes from address 0. Assuming that the implementation-defined conversion from a pointer to size_t behaves in a straightforward manner (something else that's not guaranteed by the language), the result of the entire expression is going to be the offset of the ELEMENT member within an object of type TYPE.

All this depends on several undefined or unspecified behaviors. On most modern systems, the null pointer is implemented as a pointer to address 0, addresses (pointer values) are represented as if they were integers specifying the index of a particular byte within a monolithic addressing space, and converting from a pointer to an integer of the same size just reinterprets the bits. On a system with such characteristics, the OFFSETOF macro is likely to work, and the implementation may choose to use a similar definition for the standard offsetof macro. (Code that's part of the implementation may take advantage of implementation-defined or undefined behavior; it's not required to be portable.)

On systems that don't have these characteristics, this OFFSETOF macro may not work -- and the implementation must use some other method to implement offsetof. That's why offsetof is part of the standard library; it can't be implemented portably, but it can always be implemented in some way for any system. And some implementations use compiler magic, like gcc's __builtin_offsetof.

In practice, it doesn't make much sense to define your own OFFSETOF macro like this, since any conforming C or C++ implementation will provide a working offsetof macro in its standard library.
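
As a minimal sketch of that advice, the standard macro can be used directly. The struct `Packet` and the `k...Offset` names are hypothetical, chosen only for illustration. The exact offsets depend on the implementation's padding rules, so the sketch relies only on facts that hold for any standard-layout struct.

```cpp
#include <cstddef>  // offsetof, std::size_t

// Hypothetical struct, used only for illustration.
struct Packet {
    char tag;
    int  length;
    char payload[8];
};

// offsetof yields an integer constant expression of type std::size_t,
// so the results can be stored in compile-time constants.
constexpr std::size_t kTagOffset     = offsetof(Packet, tag);
constexpr std::size_t kLengthOffset  = offsetof(Packet, length);
constexpr std::size_t kPayloadOffset = offsetof(Packet, payload);
```

The first member of a standard-layout struct sits at offset 0, and later members sit at increasing offsets; everything else (such as padding between `tag` and `length`) is up to the implementation.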

Keith Thompson
6

This is not dereferencing a pointer, but returning the offset of the element in the structure.

for example for

typedef struct { char a; char b;} someStruct;

Calling OFFSETOF(someStruct, b) will return 1 (assuming it's packed, etc.).

This is the same as doing this:

someStruct str;
offset = (size_t)&(str.b) - (size_t)&str;

except that with OFFSETOF you don't need to create a dummy variable.
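
A hedged sketch of the dummy-variable approach, using byte-pointer subtraction on a real object instead of casting addresses to integers. The function name `offset_of_b_via_object` is an assumption made for illustration; the result can be compared with the standard `offsetof` from `<cstddef>`.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t

typedef struct { char a; char b; } someStruct;

// Offset of `b`, measured on a real object via byte-pointer subtraction.
std::size_t offset_of_b_via_object() {
    someStruct str = {};
    const char* base   = reinterpret_cast<const char*>(&str);
    const char* member = &str.b;  // b is already a char, no cast needed
    assert(member >= base);       // a member never precedes its enclosing object
    return static_cast<std::size_t>(member - base);
}
```

On typical implementations both `offset_of_b_via_object()` and `offsetof(someStruct, b)` give 1, but the language only promises that the two agree, not the specific value.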

This is needed when you need to find an offset of the class/struct/union member for whatever reason.

** Edit **

To all the hasty downvoters who think that "the standard doesn't allow this" - please read the standard again. The behavior is very well defined in this case.

** Another edit **

I believe none of the downvoters noticed that the first parameter is a type. I'm sure that if you think a little bit more than the half a second it takes to downvote, you'll understand your mistake. If not - well, it won't be the first time that a bunch of ignorant downvoters suppressed a correct answer.

littleadv
  • Yeah but what does address of NULL mean? Why is this allowed? – Karthick Sep 22 '11 at 07:47
  • 5
    But to get that address it is using `operator->` on the nullptr which is UB. Your explanation is correct. Still OP's code triggers UB. There is no point here. – pmr Sep 22 '11 at 07:48
  • @pmr - it's not. Who says its undefined behavior? – littleadv Sep 22 '11 at 07:51
  • 1
    @littleadv: the standard does. :) Many compilers do support this, though. – Michael Foukarakis Sep 22 '11 at 07:51
  • @MichaelFoukarakis - the standard actually doesn't say that. Dereferencing an invalid pointer is undefined behavior, but calculating offset off of 0 is perfectly valid. See here: http://stackoverflow.com/questions/2474018/when-does-invoking-a-member-function-on-a-null-instance-result-in-undefined-behav/2474021#2474021 – littleadv Sep 22 '11 at 07:53
  • The second paragraph of the accepted answer in that question begs to differ. Strictly speaking, it is UB. – Michael Foukarakis Sep 22 '11 at 08:02
  • @MichaelFoukarakis Please quote. I couldn't find in the standard anything that would make it illegal to do "0 + something". If you can find that - do share. – littleadv Sep 22 '11 at 08:06
  • 5
    @littleadv The standard is very clear here: `a->b` is exactly equivalent to `(*a).b`. And `*a` is undefined behavior if `a` is a null pointer. The issue has been discussed before, and there's really no ambiguity in the standard here. The only places where dereferencing a null pointer is not undefined behavior is in contexts such as `sizeof`, where the expression is not evaluated. – James Kanze Sep 22 '11 at 08:08
  • @JamesKanze look at my last edit. If it is still not clear to you why you're wrong, well, I tried. – littleadv Sep 22 '11 at 08:14
  • 2
    @littleadv You are still ignoring the basic issue. The code in question dereferences a null pointer. There's no way you can get around that. And dereferencing a null pointer is undefined behavior. And if it's not clear to you why you're wrong, you just haven't read the standard. (FWIW: this exact macro was discussed in the committee, and `offsetof` was introduced into the standard because the macro had undefined behavior.) – James Kanze Sep 22 '11 at 08:19
  • @JamesKanze - you're wrong. Read here. http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232. I'm done arguing about it, if you guys want to downvote - downvote, it may make you feel better with your ignorance, but won't make you right. – littleadv Sep 22 '11 at 08:27
  • see the last bullet of my answer ... and thanks for the downvote. – Emilio Garavaglia Sep 22 '11 at 08:32
  • @EmilioGaravaglia - didn't downvote you. But thanks for the same though. – littleadv Sep 22 '11 at 08:40
  • 1
    @littleadv The document you cite proposes some changes, which would have made the expression legal (and introduced an incompatibility between C and C++). None of the proposed changes were adopted, however, and the concept of "empty lvalue" isn't present in the draft that was adopted a few months ago (and was never present in C++03). – James Kanze Sep 22 '11 at 09:54
  • @littleadv: I was not referring to you in particular. I simply noticed a behavior. By the way, many of the down-votes here are result of partial reading ... – Emilio Garavaglia Sep 22 '11 at 15:34
4

The purpose of OFFSETOF is to return the distance between the address of a member and the address of the aggregate it belongs to.

If the compiler doesn't change the object layout depending on its placement, that "distance" is constant, and hence the address you start from is irrelevant. In that case, 0 is just an address like any other.

According to C++ standard accessing an invalid address is "undefined behavior", but:

  • If that's part of a compiler support library (this is the actual code of "OFFSETOF" in the CRT shipped with VS2003!), the behavior may not be so "undefined": for a known compiler and platform, it is known to the support library developer. Of course, this must be considered platform-specific code, and different platforms will probably have different library versions.

  • In any case, you are not "acting" on the element (so no "access" is done), just doing some plain pointer arithmetic. Think of it as a general demonstration like "if there is an object at location 0, its supposed ELEMENT member will start at location 6; hence 6 is the offset". The fact that there is no such real object is irrelevant.

  • By the way, this macro fails (with a segmentation fault!) if ELEMENT is inherited by TYPE by means of a virtual base, since to locate a virtual base you need to access some runtime information (usually part of the object's v-table) whose location cannot be determined, because the object address is not a "real" address. That's why the standard cautiously says that "dereferencing an invalid pointer is undefined behavior".
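
The virtual-base case can be detected at compile time. The sketch below is one possible guard, not something from the original answer: it assumes C++11 and uses `std::is_standard_layout`, since the standard `offsetof` is only guaranteed to work for standard-layout types. `Base` and `Derived` follow the example given in the comments above; `Plain` and the `k...` constants are hypothetical names.

```cpp
#include <cstddef>      // offsetof, std::size_t
#include <type_traits>  // std::is_standard_layout

struct Base    { int foo; };
struct Derived : virtual Base {};       // foo lives in a virtual base
struct Plain   { char tag; int foo; };  // ordinary standard-layout struct

// Compile-time layout facts that can guard any use of offsetof.
constexpr bool kPlainIsStandardLayout   = std::is_standard_layout<Plain>::value;
constexpr bool kDerivedIsStandardLayout = std::is_standard_layout<Derived>::value;

// Plain is standard-layout, so offsetof is fully supported for it.
static_assert(kPlainIsStandardLayout, "offsetof requires a standard-layout type");
constexpr std::size_t kFooOffset = offsetof(Plain, foo);
```

`kDerivedIsStandardLayout` is false because of the virtual base, which is exactly the situation where neither the hand-written macro nor `offsetof` can be relied on.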


TO DOWNVOTERS:

I provide platform-specific information for a platform-specific answer. Before downvoting, please provide a demonstration that what I said is false.

Emilio Garavaglia
4

Dereferencing a null pointer (as this macro does) is undefined behavior. It is not legal for you to write and use such a macro, unless the implementation gives you some special, additional guarantee.

The C standard library defines a macro offsetof; many implementations do use something similar to this. The implementation can do it because it knows what the compiler actually generates in this case, and whether it will cause problems or not. The implementation of the standard library can use a lot of things you can't.

James Kanze
3

A. The action is valid; no exception will be thrown, because you don't try to access the memory the pointer points to.
B. A null pointer is basically a normal pointer saying the object sits at address 0 (address 0 is by definition an invalid address for real objects), but the pointer itself is valid.

So this macro means: if an object of type TYPE started at address 0, where would its ELEMENT be in memory? In other words, what is the offset of ELEMENT from the start of a TYPE object?

Roee Gavirel
  • No, the action is not valid. UB when dereferencing the nullptr has nothing to do with memory access or not. – pmr Sep 22 '11 at 07:52
  • 1
    @pmr - repeating something wrong doesn't make it right. There's nothing undefined in "return 0+some number". – littleadv Sep 22 '11 at 08:02
  • 1
    @pmr - as far as I know, dereferencing means accessing (reading from or writing to the memory a pointer points to), so you aren't really dereferencing here. You're just doing some pointer arithmetic, which is a very common practice in C (check some of the Linux kernel code to see it all over). – Roee Gavirel Sep 22 '11 at 08:03
  • @littleadv Ignoring what is written in the standard does make an answer wrong. The standard is quite clear here: if the expression `a->b` is used in a context where it is evaluated, the program contains undefined behavior. – James Kanze Sep 22 '11 at 08:10
3

That's one hell of a macro, piling up undefined behavior...

What it is attempting to do: getting the offset of a struct member.

How it tries to do it:

  • Use a null pointer (value 0 in the code)
  • Take the element (let the compiler compute the address of it, from 0)
  • Take the address of the element (using &)
  • Cast the address into a size_t

There are two issues:

  • Dereferencing a null pointer is undefined behavior, so technically anything could happen
  • Casting a pointer into a size_t is not something that should be done (the problem is that a pointer is not guaranteed to fit)

How it could be done:

  • Use a real object
  • Compute the difference of address

In code:

#define OFFSETOF(Object, Member) \
  ((ptrdiff_t)((char*)(&(Object).Member) - (char*)(&(Object))))

However it requires an object, so might not be suitable for your purposes.
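
One way to soften that requirement is to let a helper create the object itself. The following is only a sketch under stated assumptions: `member_offset` and `Sample` are hypothetical names, the type must be default-constructible, and the member must be a direct, non-virtual data member. It still measures the distance on a real object, so no null pointer is involved.

```cpp
#include <cassert>
#include <cstddef>  // std::ptrdiff_t, offsetof

// Hypothetical helper: measures the offset of a data member on a temporary,
// default-constructed object, so the caller does not have to supply one.
template <class T, class M>
std::ptrdiff_t member_offset(M T::*member) {
    T object{};
    const char* base  = reinterpret_cast<const char*>(&object);
    const char* field = reinterpret_cast<const char*>(&(object.*member));
    assert(field >= base);  // the member lies inside the object
    return field - base;    // distance in bytes
}

// Hypothetical struct, used only for illustration.
struct Sample { char a; int b; double c; };
```

A call such as `member_offset(&Sample::b)` should agree with `offsetof(Sample, b)`, which remains the simpler and preferred tool whenever the type is standard-layout.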

How it should be done:

#include <cstddef>
#define OFFSETOF(Struct, Member) offsetof(Struct, Member)

But there would be little point... right ?

For the curious, the definition can be something like: __builtin_offsetof(st, m) (from Wikipedia). Some compilers implement it with null dereferences, but they are the compilers, and thus know that they treat this case safely; this is not portable... and does not have to be, since when you switch compilers you also switch the C library implementation.

Matthieu M.
  • 1
    The "correct definition" depends on the compiler; there is no way to implement this macro portably. Your example of the correct definition is only correct for gcc; it is, IMHO, the best in terms of quality of implementation, but many compilers do use something along the lines of the original macro. In such cases, the macro is "correct" if it is part of the implementation, or if the compiler makes some explicit guarantee (but I've never seen this second case). – James Kanze Sep 22 '11 at 08:14
  • 2
    @James: I guess my sentence is ambiguous, obviously I meant to encourage the use of the `offsetof` macro defined in `cstddef` and only included the exact definition of it as an example. It's pointless to reimplement something the C library provides for you :) – Matthieu M. Sep 22 '11 at 08:17
  • @Matthieu M I agree totally with the overall posting (which is much better than mine); I only worried that someone might misinterpret your last statement as implying that definitions **in the standard library** which don't use some sort of magic built-in are in some way "incorrect". While from a QoI point of view, I prefer the gcc solution, you can't really send in a bug report because VC++ uses the hacky macro; the authors of their library implementation work with the authors of the compiler to ensure that it works. – James Kanze Sep 22 '11 at 08:26
  • @James: I understand... and meant to edit right away, but my connection hung :/ It's now corrected so as to be less ambiguous, thanks for your remarks :) – Matthieu M. Sep 22 '11 at 08:53
2

littleadv had the intent of the construct just right. To explain a little: you cast 0 to a struct pointer, so it points to address 0x0, and dereference one of its elements. The address you point to is now 0x0 + whatever offset the element has. Now you cast this value to a size_t and get the offset of the element.

I'm not sure how portable this construct is, though.

thiton
  • The problem is "its elements". There is no object located at address 0, so you can't talk about its elements. (If littleadv had been right, it would have been portable, and there would not be a need for an `offsetof` macro in the C standard.) – MSalters Sep 22 '11 at 08:05
  • littleadv had it wrong: the expression invokes undefined behavior. – James Kanze Sep 22 '11 at 08:11