How does the virtual keyword affect memory locations?

Question

I had a job interview earlier and was asked what the output of the following code is:

#include <iostream>
using namespace std;

struct A {
    int data[2];
    A(int x, int y) { data[0] = x; data[1] = y; }

    virtual void f() {}
};

int main(){
    A a(22, 33);
    int* data = (int*)&a;
    cout << data[2] << endl;
}

I talked through it but couldn't figure it out exactly. He mentioned that the virtual function was a hint. Afterwards I compiled it and got the output:

I then thought about the virtual function and removed it:

#include <iostream>
using namespace std;

struct A {
    int data[2];
    A(int x, int y) { data[0] = x; data[1] = y; }

    //virtual void f() {}
};

int main(){
    A a(22, 33);
    int* data = (int*)&a;
    cout << data[2] << endl;
}

Resulting in the output:

My question is: How does the seemingly inconsequential virtual function affect the object's memory layout to cause this?

`data[2]` is Undefined Behaviour. No need to say more than that. To answer question - object will have a vtable — Richard Critten, Jan 26 '21 at 01:19
Your interviewer was relying on the behavior of their chosen compiler. The interviewer may not have realized that. The C++ language is clear in calling that Undefined Behavior. That said, [possible dupe](https://stackoverflow.com/questions/3554909/what-is-vtable-in-c). — Drew Dormann, Jan 26 '21 at 01:19
The behaviour is undefined in both cases. If the interviewer expected specific output - presumably consistent with a particular compiler that company uses - then you have an interviewer who is a hacker, who writes code specific to one compiler, without understanding the significance of undefined behaviour. — Peter, Jan 26 '21 at 01:41
Perhaps the job you applied for had something to do with reverse engineering or hacking? Richard Critten and Drew Dormann are both right from a "legal" standpoint. "This is undefined behavior." is a correct answer. However, plenty of jobs have been created for those who seek to define this "undefined behavior", and speaking from that standpoint, the behavior you observed is going to be nearly universal across all modern compilers. As others have noted, you've just discovered vtables. Try printing data[0] or data[1] to see a bit more. — Roguebantha, Jan 26 '21 at 01:43
@Roguebantha - There is actually an assumption that the size of the vtable (or representation of it, within the class type) is twice the size of an `int`. That is not a valid universal assumption. If the question had asked for a *possible* explanation of some observed behaviour (which will be specific to a particular implementation) that might be appropriate for a reverse-engineering job. If particular output is expected in both cases, that indicates a lack of understanding. — Peter, Jan 26 '21 at 01:48
You are 100% correct - it's not a universal assumption, and I took great care to make sure I noted it was a "nearly" universal assumption. Of course, there is nothing in the C++ standard dictating the size of a vtable and indeed on (now somewhat archaic in comparison) 32-bit platforms, you'll discover that a vtable will be the exact same size as an int. That being said, you could assume that it will be twice the size of an int on 64-bit platforms and be correct approximately 100% of the time. — Roguebantha, Jan 27 '21 at 05:27

Remy Lebeau · Accepted Answer · 2021-01-26T05:55:02.640

Adding one or more virtual methods to a class causes (in this case ¹) each object instance of that class to contain a hidden pointer to a compiler-managed "virtual method table" (vtable) at the front of the object's memory, e.g.:

int *data = &a;     A a;
   data -> --------------------
           | vtable           | -> [0]: &A::f
           | (8 bytes)        |
           |------------------|
data[2] -> | data[0]: 22      |
           |------------------|
           | data[1]: 33      |
           --------------------

Assuming sizeof(int) is 4 (which is usually the case), reaching the object's data[0] member by indexing the 3rd int through an int* pointer to the object tells us that there are 8 extra bytes at the front of the object. The vtable pointer accounts for them if the code is compiled as 64-bit. If it were compiled as 32-bit instead, the vtable pointer would be 4 bytes, and data[2] would be accessing A::data[1] = 33.

Without any virtual methods, there is no vtable pointer, so the extra 8 bytes are not present, and indexing to the 3rd int from the front of the object reaches past the bounds of the object into surrounding memory, e.g.:

int *data = &a;     A a;
   data -> --------------------
           | data[0]: 22      |
           |------------------|
           | data[1]: 33      |
           --------------------
data[2] ->

¹ This is an implementation detail of the compiler. The C++ standard doesn't dictate how virtual methods are to be implemented. Most compilers will use a vtable, though.

All the answers were great, but this one really helped me understand, thanks. At one point I was asked if I knew how virtual methods know to call their correct, overridden versions. I'm now realizing he was testing my knowledge of vtables, which I didn't know about before. Thanks for helping me figure that out! — Aroic, Jan 26 '21 at 04:16

score 2 · Answer 2 · answered Jan 26 '21 at 01:41

> How does the virtual keyword affect memory locations?

Because the class doesn't have any other virtual functions, adding a virtual function adds a hidden pointer to every object of the class. The pointer points to a vtable, unique to that class and shared by all of its objects.

The pointer might be the size of an int, but it might not. On a 64-bit system, it is typically twice the size of an int. But it doesn't have to be.

The pointer might be at the beginning of the class's memory layout, but it might not. The beginning is a common location, but popular commercial C++ compilers will sometimes place it elsewhere.

> I had a job interview earlier and was asked what the output of the following code is

It is Undefined Behavior -- meaning that it would be dangerous to make any claim as to what the program might do.

Specifically, this code:

int* data = (int*)&a;
cout << data[2] << endl;

claims that a pointer to A is a pointer to int. This is not true. An A is not an int.

Since it is Undefined Behavior, you are given no assurances that the observed behavior from one run will match the observed behavior from any other run. Because of this, UB is to be avoided.
