0

I had a hard time figuring out why my code didn't work. I tracked the problem down to the vtable generation/population.

Here is a simplified version of my code:

#include <iostream>
#include <functional>
#include <thread>

using namespace std;

typedef std::function<void (void)> MyCallback;

MyCallback clb = NULL;

void callCallbackFromThread(void){
    while(clb == NULL);
    clb();
}

int someLongTask(void){
    volatile unsigned int i = 0;
    cout << "Start someLongTask" << endl;
    while(i < 100000000)
        i++;
    cout << "End someLongTask" << endl;
    return i;
}

class Base{
public:
    Base(){
        clb = std::bind(&Base::_methodToCall, this);
        cout << "Base-Constructor" << endl;
    }
protected:
    virtual void methodToCall(void){
        cout << "Base methodToCall got called!" << endl;
    };
private:
    void _methodToCall(void){
        methodToCall();
    }
};

class Child: public Base{
public:
    Child()
     : dummy(someLongTask()){
        cout << "Child Constructor" << endl;
    }
protected:
    void methodToCall(void){
        cout << "Child methodToCall got called!" << endl;
    }
private:
    int dummy;
};

int main()
{
    thread t(callCallbackFromThread);
    Child child;
    t.join();
    clb();

    return 0;
}

When I run this, I sometimes get the result:

Base-Constructor
Start someLongTask
Child methodToCall got called!
End someLongTask
Child Constructor
Child methodToCall got called!

And sometimes this:

Base-ConstructorBase methodToCall got called!
Start someLongTask

End someLongTask
Child Constructor
Child methodToCall got called!

Can someone explain to me why the base-method is called in one case and the child-method in the other? It surely has something to do with the exact time the thread will be executed. To me, it seems, that the vtable isn't set up properly when the thread calls the callback, but I don't understand why.

Muperman
  • 344
  • 2
  • 14
  • 7
    You have a data race, which mean your code has undefined behavior, all results are valid. You have to have some sort of thread synchronization around your writing and reading of `clb`. – NathanOliver Aug 29 '22 at 13:07
  • 1
    Some notes: `while(clb == NULL);` leads to a race condition if you assign `clb` from another thread. `while(i < 100000000) i++;` may be completely optimized away. Use `thread::sleep_for`. – Ted Lyngmo Aug 29 '22 at 13:07
  • https://stackoverflow.com/questions/7934540/when-exactly-does-the-virtual-table-pointer-in-c-gets-set-for-an-object – Mat Aug 29 '22 at 13:08
  • 1
    Is there something in your C++ textbook's chapter on thread synchronization that was unclear to you? Shouldn't your textbook's explanation, on how to properly synchronize multiple execution threads, made it clear that this has no chance of working properly? The vtable corruption is the symptom of the real problem, and not the cause. – Sam Varshavchik Aug 29 '22 at 13:10
  • @TedLyngmo The `i` loop is not going to be optimized away (at least in practice) because `i` is marked `volatile`. – user17732522 Aug 29 '22 at 13:14
  • Side note: Don't use `NULL` in modern C++, use [nullptr](https://en.cppreference.com/w/cpp/language/nullptr). – Jesper Juhl Aug 29 '22 at 13:34
  • 1
    Although it may look like you're not [calling a virtual method from your constructor](https://stackoverflow.com/questions/962132/calling-virtual-functions-inside-constructors), giving away a std::function object (`clb`) whose implementation is to call that virtual method and then having a thread that calls that std::function is just handing the gun to another thread so that it can shoot you in your foot rather than you shooting yourself in the foot. It's the same race condition - the object may not be fully constructed. – Wyck Aug 29 '22 at 13:35

2 Answers2

6

The vtable is an implementation detail of the C++ language. You never have to care about it if you are following the language rules and not digging into undefined behavior territory to hack around within the specific language implementation.

Your code fails because you are not using any appropriate synchronization that is required for a multi-threaded program.

clb is not an atomic variable and as a consequence you are not allowed to store to it in one thread and load from it in another without using some synchronization primitive like e.g. a mutex guarding the read and write access.

Accessing a non-atomic object in multiple threads without synchronization with at least one of them being a read is known as a data race and is unconditionally undefined behavior in C++.


As pointed out by @Mat there is another race on the lifetime of the class object that the call on clb would use. A virtual function call during construction is only allowed directly or indirectly from the path through the constructor. This doesn't apply to the call in the thread and there is also no synchronization to assure that all constructors of the class object have finished before clb is called. Therefore again undefined behavior. (This is colloquially known as a data race on the vtable pointer, since that is why it fails in practice on implementations.)

user17732522
  • 53,019
  • 2
  • 56
  • 105
  • 3
    On top of that the clb() call itself might access the child object before its lifetime has started (constructor not done), which is also prob UB (and has a race on the vtable pointer itself too assuming an implementation that uses that) – Mat Aug 29 '22 at 13:18
  • @Mat Yes true. I added that as well. – user17732522 Aug 29 '22 at 13:33
  • Okay, thanks for the answer. I understand why the data race on the vtable will cause problems. But I don't see why a synchronisation for the clb-variable-access is needed. I always thought it is safe to write to a pointer variable from one thread and read the same from another. (Hence writing a pointer IS an atomic operation, at least in the embedded applications I am writing code for) – Muperman Aug 29 '22 at 13:50
  • 1
    @Muperman: pointer access is atomic on some implementations but not guaranteed in C++ in general, you shouldn't rely on it. And clb is not a pointer at all. `std::function` is not a thin wrapper for function pointers, it is a very complex machine. – Mat Aug 29 '22 at 13:52
  • @Mat Okay, thanks. I think I have to read more about std::function then.... – Muperman Aug 29 '22 at 13:55
  • 1
    @Muperman Even if the underlying hardware architecture operates atomically on the type of the variable, the compiler is still allowed to optimize based on the assumption that data races in the sense of the language memory model (which apply to all variables except for `std::atomic` specializations) don't happen. For example it is legitimate for the compiler to completely remove the `while(clb == NULL);` loop since it can assume no other thread modifies `clb` while it executes and then it is either immediately false or an infinite loop which would also cause undefined behavior. – user17732522 Aug 29 '22 at 13:58
  • @user17732522 Yeah I know, that. I just didn't care in my small example. The example should show my problem, not more. I would not use this in production code ;) – Muperman Aug 29 '22 at 14:02
  • 1
    @Muperman Yeah, be careful with that. Both current GCC and Clang at least do perform that optimization. GCC at `-O2` will throw you occasional `std::bad_function_call` exceptions. Clang and GCC at higher level even optimize the throw away and will probably occasionally cause null dereferences. – user17732522 Aug 29 '22 at 14:20
1

Data races aside, when a derived class object is constructed in C++, it starts as a Base object then transitions to a Child object. (If you have more than one level of inheritance, the object transitions more than once)

If you call a virtual function from the base class constructor, you will see that the base class implementation is called, because the object is still a base class object.

The behaviour you are seeing is exactly what you would expect if you called _methodToCall(); inside the base class constructor. In your code, the call happens on another thread while the base class constructor is still running, which comes with a zillion risks, but the thread is actually irrelevant - you could simply write _methodToCall(); or clb(); inside the base class constructor and observe similar behaviour.

user253751
  • 57,427
  • 7
  • 48
  • 90
  • The other answer also addresses the problem, but yours is exactly what I wanted to know, so I accepted this now. Thanks. – Muperman Aug 30 '22 at 05:05