6

I have read (in Inside the C++ Object Model) that the value of a pointer to data member in C++ is the offset of the data member plus 1.
I am trying this on VC++ 2005, but I am not getting those offset values.
For example:

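#include <cstdio>

class X {
  public:
    int a;
    int b;
    int c;
};

void x() {
  // Passing a pointer to member to %d is not portable; it only shows
  // the raw representation this particular compiler happens to use.
  printf("Offsets of a=%d, b=%d, c=%d", &X::a, &X::b, &X::c);
}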

According to the book, this should print Offsets of a=1, b=5, c=9. But in VC++ 2005 the output is a=0, b=4, c=8.
I do not understand this behavior.
Excerpt from the book:

"That expectation, however, is off by one—a somewhat traditional error for both C and C++ programmers.

The physical offset of the three coordinate members within the class layout are, respectively, either 0, 4, and 8 if the vptr is placed at the end or 4, 8, and 12 if the vptr is placed at the start of the class. The value returned from taking the member's address, however, is always bumped up by 1. Thus the actual values are 1, 5, and 9, and so on. The problem is distinguishing between a pointer to no data member and a pointer to the first data member. Consider for example:

float Point3d::*p1 = 0;   
float Point3d::*p2 = &Point3d::x;   

// oops: how to distinguish?   
if ( p1 == p2 ) {   
   cout << " p1 & p2 contain the same value — ";   
   cout << " they must address the same member!" << endl;   
}

To distinguish between p1 and p2, each actual member offset value is bumped up by 1. Hence, both the compiler (and the user) must remember to subtract 1 before actually using the value to address a member."

Lightness Races in Orbit
theneuronarc
  • 3
    `pointer to data member in C++ is the offset of data member plus 1` - Where did you get this information? – PeterK Aug 13 '10 at 14:53
  • It is tough to answer this question without knowing the sizeof(int) on your given platform – Chubsdad Aug 13 '10 at 14:53
  • 3
    Look at a ruler, it starts at 0 too. – nos Aug 13 '10 at 14:56
  • Nice analogy, but it's not geometry, it's C++. :) – theneuronarc Aug 13 '10 at 15:11
  • What book are you reading that has this? – Phil Aug 13 '10 at 15:15
  • 1
    That excerpt you quote is missing CONTEXT .. what is it talking about ? In any case, your sample code results are exactly correct, and speak the truth better than the documentation (which is a truism of coding ). – jdu.sg Aug 13 '10 at 15:19
  • 2
    What is the title of the book? Throw the book away. Get a [good one](http://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list). – GManNickG Aug 13 '10 at 15:31
  • I have got the answer. I tried this in VC++ 2005: void x(){ int X::*ptr=0; printf("Offsets of a=%d, b=%d, c=%d,ptr=%d",&X::a,&X::b,&X::c,ptr); } It prints: Offsets of a=0, b=4, c=8,ptr=-1. The author says it should be: Offsets of a=1, b=5, c=9,ptr=0. So the author is correct; VC++ 2005 has simply devised its own way to solve the problem described in the excerpt. – theneuronarc Aug 13 '10 at 15:32
  • 1
    @theneuronarc: Don't be so quick to call it correct, I really doubt it is. You're looking for evidence to support what you want to believe rather than looking at evidence to find what to believe. – GManNickG Aug 13 '10 at 15:33
  • @GMan: Please understand the problem first. – theneuronarc Aug 13 '10 at 15:34
  • @theneuronarc: I get it just fine. Seriously, on any modern architecture a compiler would be *broken* if the members of a class were never laid out on a power of two. The book is wrong, there's none of this "bump by one to distinguish" crap in the standard. It's a null pointer, which has an implementation-defined representation which *happens* to be -1 when you interpret it as a signed integer, which *happens* to work. None of this is standard, an implementation is free to do whatever it wants, but to claim it's a guaranteed thing is wrong. – GManNickG Aug 13 '10 at 15:37
  • @GMan: The book you are asking me to throw is mentioned in your great c++ book list. Please read properly first. thanks. – theneuronarc Aug 13 '10 at 15:38
  • 1
    @theneuronarc: It's wrong, then, when it says "this happens"; it should say "this might happen". The book is assuming an implementation detail, where C++ has no such things. – GManNickG Aug 13 '10 at 15:40
  • Looking at reviews of the book, it seems it's full of the author's own experiences rather than objective information, and that it fails to distinguish between implementation-defined behavior and standard-compliant behavior. Perhaps we should remove it from the list. – GManNickG Aug 13 '10 at 16:04
  • 4
    The book appears to be "Inside the C++ Object model", the author led the early cfront C++ implementation teams, and it hasn't been updated since '96. Like many books, it's probably more a historical curiosity describing a specific implementation, not that relevant to today's C++. – paxdiablo Aug 13 '10 at 16:13

6 Answers

11

The offset of something is how many units it is from the start. The first thing is at the start so its offset is zero.

Think in terms of your structure being at memory location 100:

100: class X { int a;
104:           int b;
108:           int c;

As you can see, the address of a is the same as the address of the entire structure, so its offset (what you have to add to the structure address to get the item address) is 0.

Note that the ISO standard doesn't specify where the items are laid out in memory. Padding bytes to create correct alignment are certainly possible. In a hypothetical environment where ints were only two bytes but their required alignment was 256 bytes, they wouldn't be at 0, 2 and 4 but rather at 0, 256 and 512.
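A minimal sketch of how to check those offsets without relying on the raw value of a pointer to member. It assumes a struct X with the same three int members as in the question; the helper names offset_of_b and check_layout are made up for this illustration. It only asserts what holds for any standard-layout type: the first member is at offset 0 and later members are at higher offsets.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

// Same members as the class X in the question. It is a standard-layout
// type, so offsetof is well defined for it.
struct X {
    int a;
    int b;
    int c;
};

// Hypothetical helper: distance in bytes from the start of an X to member b.
constexpr std::size_t offset_of_b() { return offsetof(X, b); }

// Hypothetical helper: checks only layout properties that do not depend on
// sizeof(int) or on the padding a particular compiler inserts.
inline void check_layout() {
    static_assert(offsetof(X, a) == 0, "first member starts at offset 0");
    assert(offsetof(X, a) < offsetof(X, b));
    assert(offsetof(X, b) < offsetof(X, c));
    assert(offsetof(X, c) + sizeof(int) <= sizeof(X));
}
```

On a platform with 4-byte int and no padding between the members, the three offsets come out as 0, 4 and 8, which matches the VC++ 2005 output in the question.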


And, if that book you're taking the excerpt from is really Inside the C++ Object Model, it's getting a little long in the tooth.

The fact that it's from '96 and discusses the internals underneath C++ (waxing lyrical about how good it is to know where the vptr is, missing the whole point that that's working at the wrong abstraction level and you should never care) dates it quite a bit. In fact, the introduction even states "Explains the basic implementation of the object-oriented features ..." (my italics).

And the fact that nobody can find anything in the ISO standard saying this behaviour is required, along with the fact that neither MSVC nor gcc acts that way, leads me to believe that, even if this was true of one particular implementation far in the past, it's not true (or required to be true) of all.

The author apparently led the cfront 2.1 and 3 teams and, while this book seems of historical interest, I don't think it's relevant to the modern C++ language (and implementation), at least those bits I've read.

paxdiablo
8

Firstly, the internal representation of values of a pointer to a data member type is an implementation detail. It can be done in many different ways. You came across a description of one possible implementation, where the pointer contains the offset of the member plus 1. It is rather obvious where that "plus 1" comes from: that specific implementation wants to reserve the physical zero value (0x0) for null pointer, so the offset of the first data member (which could easily be 0) has to be transformed to something else to make it different from a null pointer. Adding 1 to all such pointers solves the problem.

However, it should be noted that this is a rather cumbersome approach (i.e. the compiler always has to subtract 1 from the physical value before performing access). That implementation was apparently trying very hard to make sure that all null-pointers are represented by a physical zero-bit pattern. To tell the truth, I haven't encountered implementations that follow this approach in practice these days.

Today, most popular implementations (like GCC or MSVC++) use just the plain offset (not adding anything to it) as the internal representation of the pointer to a data member. The physical zero will, of course, no longer work for representing null pointers, so they use some other physical value to represent null pointers, like 0xFFFF... (this is what GCC and MSVC++ use).
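The two encodings can be modelled in a few lines. This is only a toy sketch: the types PlusOnePtr and SentinelPtr and all their members are hypothetical names invented for the illustration, and a real compiler does this internally rather than through any class like these.

```cpp
#include <cassert>
#include <cstddef>   // std::size_t, std::ptrdiff_t

// Toy model, hypothetical names. Scheme A is the "offset plus 1" encoding
// from the book excerpt: a stored 0 means null, so every real offset is
// stored as offset + 1 and must be decremented before use.
struct PlusOnePtr {
    std::size_t raw;
    static PlusOnePtr null() { return {0}; }
    static PlusOnePtr to(std::size_t offset) { return {offset + 1}; }
    bool is_null() const { return raw == 0; }
    std::size_t offset() const { assert(!is_null()); return raw - 1; }
};

// Scheme B stores the plain offset and reserves -1 (all bits set) for null,
// so a stored 0 can mean "the member at offset 0".
struct SentinelPtr {
    std::ptrdiff_t raw;
    static SentinelPtr null() { return {-1}; }
    static SentinelPtr to(std::ptrdiff_t offset) { return {offset}; }
    bool is_null() const { return raw == -1; }
    std::ptrdiff_t offset() const { assert(!is_null()); return raw; }
};
```

In both schemes a null value and a pointer to the member at offset 0 get different stored values; the schemes differ only in which of the two is allowed to be the all-zero pattern.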

Secondly, I don't understand what you were trying to say with your p1 and p2 example. You are absolutely wrong to assume that the pointers will contain the same value. They won't.

If we follow the approach described in your post ("offset + 1"), then p1 will receive the physical value of null pointer (apparently a physical 0x0), while p2 will receive the physical value 0x1 (assuming x has offset 0). 0x0 and 0x1 are two different values.

If we follow the approach used by modern GCC and MSVC++ compilers, then p1 will receive the physical value of 0xFFFF.... (null pointer), while p2 will be assigned a physical 0x0. 0xFFFF... and 0x0 are again different values.

P.S. I just realized that the p1 and p2 example is actually not yours, but a quote from a book. Well, the book, once again, is describing the same problem I mentioned above - the conflict of 0 offset with 0x0 representation for null pointer, and offers one possible viable approach to solving that conflict. But, once again, there are alternative ways to do it, and many compilers today use completely different approaches.

AnT stands with Russia
  • MSVC actually has three or four different pointer to member representations depending on the inheritance model assumed for forward declared types. – MSN Aug 13 '10 at 16:32
  • 2
    @MSN: That usually applies to pointers to member *functions*. Pointers to *data* members are notably simpler. They are significantly less sensitive to the inheritance model (or not sensitive at all). Normally, one can implement them as plain offset in *any* inheritance model. If MSVC++ is doing something more complicated, I don't know the reason for that. – AnT stands with Russia Aug 13 '10 at 16:35
  • @AndreyT: You have observed the right problem. This problem is not about alignment issues. It's about differentiating a null pointer to data member from an initialized one. Thanks. – theneuronarc Aug 13 '10 at 17:27
  • @AndreyT, you are forgetting pointer to members of virtual base classes. That one is also less forgiving. – MSN Aug 24 '10 at 17:16
  • @MSN: No, I'm not forgetting anything. The issue with pointers to member *functions* is that in the general case the non-trivial calculation of the proper `this` pointer value has to be performed at the moment of *dereference*. This is why pointers to member *functions* have to carry quite a bit of extra information with them. This is why they are so complicated. – AnT stands with Russia Aug 24 '10 at 18:03
  • Pointers to *data* members are much simpler and all `this`-related calculations can be performed at the moment of conversion (up or down the hierarchy). This is why pointers to data members can be implemented as simple offsets in *all* cases, even if virtual base classes are involved. – AnT stands with Russia Aug 24 '10 at 18:04
  • @AndreyT, how exactly do you encode a pointer to member to a virtual base class in a single offset? (And yes, I was originally thinking of member function pointers, not member pointers.) – MSN Aug 25 '10 at 14:22
  • @AndreyT, never mind. I understand what you stated now; basically, you can apply the offset when dereferencing after determining which type it is relative to since that is available from the member pointer type itself. – MSN Aug 25 '10 at 14:24
3

The behavior you're getting looks quite reasonable to me. What sounds wrong is what you read.

Jerry Coffin
  • 2
    Not to mention that having member variables on an uneven address would be quite inefficient. – humbagumba Aug 13 '10 at 14:54
  • have added excerpt from the book. Please have a look – theneuronarc Aug 13 '10 at 15:10
  • 3
    I've looked. I still think what I said above is pretty accurate -- at best, he's describing a method used by some particular compiler, not a general requirement. Offhand, I'm not sure I've ever seen a compiler that worked that way, but even if it did, I don't see much relevance. – Jerry Coffin Aug 13 '10 at 15:21
2

To complement AndreyT's answer: Try running this code on your compiler.

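#include <cstddef>   // NULL
#include <iostream>

// Assumes class X from the question is defined above.
void test()
{  
    using namespace std;

    // Streaming pm prints its conversion to bool (0 for null, 1 otherwise).
    // The cast reads the first 4 bytes of the raw representation, which is
    // compiler-specific and shown here only for inspection.
    int X::* pm = NULL;
    cout << "NULL pointer to member: "
        << " value = " << pm 
        << ", raw byte value = 0x" << hex << *(unsigned int*)&pm << endl;

    pm = &X::a;
    cout << "pointer to member a: "
        << " value = " << pm 
        << ", raw byte value = 0x" << hex << *(unsigned int*)&pm << endl;

    pm = &X::b;
    cout << "pointer to member b: "
        << " value = " << pm 
        << ", raw byte value = 0x" << hex << *(unsigned int*)&pm << endl;
}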

On Visual Studio 2008 I get:

NULL pointer to member:  value = 0, raw byte value = 0xffffffff
pointer to member a:  value = 1, raw byte value = 0x0
pointer to member b:  value = 1, raw byte value = 0x4

So indeed, this particular compiler uses a special bit pattern to represent a NULL pointer, which leaves the 0x0 bit pattern free to represent a pointer to the first member of an object.

This also means that wherever the compiler generates code to translate such a pointer to an integer or a boolean, it must be taking care to look for that special bit pattern. Thus something like if(pm) or the conversion performed by the << stream operator is actually written by the compiler as a test against the 0xffffffff bit pattern (instead of how we typically like to think of pointer tests being a raw test against address 0x0).
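For everyday code this means the null test should be written at the language level and left to the compiler to translate. A small sketch, assuming C++11 (for nullptr and brace initialization) and a struct X like the one in the question; read_or and demo are hypothetical names used only here.

```cpp
#include <cassert>

struct X {
    int a;
    int b;
    int c;
};

// Hypothetical helper: reads the member selected by pm, or returns fallback
// when pm is a null pointer to member. The comparison with nullptr is a
// language-level test; the compiler maps it to whatever bit pattern it uses
// for null, so the code never depends on that pattern.
inline int read_or(const X& obj, int X::* pm, int fallback) {
    return pm != nullptr ? obj.*pm : fallback;
}

// Hypothetical demo of the helper on a null and a non-null pointer to member.
inline void demo() {
    X obj{10, 20, 30};
    int X::* none = nullptr;
    assert(read_or(obj, &X::a, -1) == 10);
    assert(read_or(obj, none, -1) == -1);
}
```

Written this way, the same source behaves identically whether the implementation encodes null as 0, as all bits set, or as something else.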

TheUndeadFish
1

I have read that address of pointer to data member in C++ is the offset of data member plus 1?

I have never heard that, and your own empirical evidence shows it's not the case. I think you misunderstood an odd property of structs and classes in C++: even if they are completely empty, they nevertheless have a size of at least 1 (so that each element of an array of them has a unique address).
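A short sketch of that property, in case it helps. Empty and check_empty are names made up for the example; the exact size is up to the implementation, so the code only checks that it is nonzero.

```cpp
#include <cassert>

struct Empty {};

// A complete object type never has size 0, even with no members.
static_assert(sizeof(Empty) >= 1, "empty class still occupies storage");

// Hypothetical helper: neighbouring array elements are distinct objects,
// so their addresses must differ.
inline void check_empty() {
    Empty items[2];
    assert(&items[0] != &items[1]);
}
```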

James Curran
1

$9.2/12 is interesting

Nonstatic data members of a (non-union) class declared without an intervening access-specifier are allocated so that later members have higher addresses within a class object. The order of allocation of nonstatic data members separated by an access-specifier is unspecified (11.1). Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; so might requirements for space for managing virtual functions (10.3) and virtual base classes (10.1).

This explains that such behavior is implementation defined. However the fact that 'a', 'b' and 'c' are at increasing addresses is in accordance with the Standard.

Chubsdad