9

I have a class representing a finite-state machine, which should run in a forever loop and check it's current state. In each state machine will set it's next state and either fall into idle state or do some work. I would like to allow another thread to change state of machine while it's working. This will cause a race condition as expected. So I add a mutual exclusion lock/unlock wrapping loop of machine and the public method that allows other threads to change current state of machine.

class Robot
{
public:
    enum StateType {s1,s2,s3,idle,finish};
    void run();
    void move();
private:
    StateType currentState;
    StateType nextState;
    StateType previousState;
    std::mutex mutal_state;
};

Implementation:

void Robot::run()
{
    this->currentState = s1;
    while(true)
    {
        mutal_state.lock();
        switch(currentState)
        {
        case s1:
            // do some useful stuff here...
            currentState = idle;
            nextState = s3;
            break;
        case s2:
            // do some other useful stuff here...
            currentState = idle;
            nextState = finish;
            break;
        case s3:
            // again, do some useful things...
            currentState = idle;
            nextState = s2;
            break;
        case idle:
            // busy waiting...
            std::cout << "I'm waiting" << std::endl;
            break;
        case finish:
            std::cout << "Bye" << std::endl;
            mutal_state.unlock();
            return;
        }
        mutal_state.unlock();
    }
}

And the move method that allows other threads to change current state:

void Robot::move()
{
    mutal_state.lock();
    previousState = currentState; // Booommm
    currentState = nextState;
    mutal_state.unlock();
}

I can't manage to find what I'm doing wrong! Program crashes in first line of the move() function. On the other hand, the GDB is not working with C++11 and tracing code is not possible...

UPDATE:

Playing around code, I can see that problem is in move function. When the program tries to lock code piece inside move(), crashes. For example if move is like this:

void Robot::move()
{
    std::cout << "MOVE IS CALLED" << std::endl;
    mutal_state.lock();
    //previousState = currentState;
    //std::cout << "MOVING" << std::endl;
    //currentState = nextState;
    mutal_state.unlock();
}

Output is:

s1
I'm waiting
I'm waiting
MOVE IS CALLED1
The program has unexpectedly finished.

But when move is a simple function, not doing anything:

void Robot::move()
{
    std::cout << "MOVE IS CALLED" << std::endl;
    //mutal_state.lock();
    //previousState = currentState;
    //std::cout << "MOVING" << std::endl;
    //currentState = nextState;
    //mutal_state.unlock();
}

Program runs concurrently.

James McNellis
  • 348,265
  • 75
  • 913
  • 977
sorush-r
  • 10,490
  • 17
  • 89
  • 173
  • Well have you tried debugging with print statements? – FrustratedWithFormsDesigner Mar 09 '12 at 19:49
  • @ FrustratedWithFormsDesigner: Yes. I know that the "explosion" only happens when some thread tries to call ``move``. – sorush-r Mar 09 '12 at 19:53
  • What version of what compiler are you using? – ildjarn Mar 09 '12 at 19:53
  • @ildjarn: I'm using gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3) and thread model is posix – sorush-r Mar 09 '12 at 19:55
  • How is `previousState` declared? – Tudor Mar 09 '12 at 20:03
  • @Tudor: It's a private member of class. Just forgot to write it in question :-) – sorush-r Mar 09 '12 at 20:05
  • Is an exception being thrown? I so, can you catch it? Presumably, currentState or 'this' are not correctly accessible. Are you sure that the the other threads all have access to the same instance? – Martin James Mar 09 '12 at 20:33
  • 8
    You'd do well to start using lock guards (e.g. `std::lock_guard`, `std::unique_lock`) rather than manually locking and unlocking. This may not help with your immediate issue but can only help for maintenance/static verification. Not to mention exception safety. – Luc Danton Mar 09 '12 at 21:12
  • Are you sure the crash occurs exactly at that line? Is it a segfault? Maybe the crash occurs when the thread executes some branch of the switch. Did you try to see in which state the thread goes and which code is executed? – Tudor Mar 09 '12 at 21:14
  • @Tudor: Yes, I'm sure. I tried printing some outputs in `move` before and after calling `mutal_state.lock()`. only first one is printed and it's the last thing program says. – sorush-r Mar 10 '12 at 05:10
  • I am in the odd state of being grateful my answer was helpful and accepted, but unsure of which suggestion was the one that worked. Any chance of some detail on what fixed the issue? – Matt T Mar 10 '12 at 22:08
  • @MattT: First one! I traced code by hand and found that it's completely different issue. Ownership of an object is changed after spawning a thread on a member function of it. See my answer ;) – sorush-r Mar 12 '12 at 07:31

4 Answers4

2

My suggestions:

1) if you have no debugger, how can you be so sure it is the first line of move that crashes? It is always with questioning any assumptions you have made about the code, unless you have hard evidence to back it up.

2) I would look at whatever interesting code is in state s3, as this is what the first call to move will perform. Up to that point the code in s3 has not been run. Either that or remove all code bar what is in the posted example, to rule this out.

3) The compiler may make copies of the variables in registers, you should declare all the states as volatile so it knows not to optimise in this way.

Matt T
  • 607
  • 3
  • 10
2

I can not help you why your code "explodes", however I can assume that the problem is not in the code you posted as it runs fine for me.

This will output for me:

I'm working
...
Bye

Code:

int main() {

    Robot r;

    auto async_moves = [&] () {  // simulate some delayed interaction
        std::this_thread::sleep_for(std::chrono::seconds(2)); //See note
        for(auto i = 0; i != 3; ++i)
            r.move();

    };

    auto handle = std::async(std::launch::async, async_moves);

    r.run();

} 

(Note: You have to compile with -D_GLIBCXX_USE_NANOSLEEP assuming you are using gcc, see this question.)

Note that the code above - and yours maybe, too - is still vulnurable to the problem, that the states may get invalidated if move is called twice or more before the loop triggers again.
Like one of the comments already mentioned, prefer to use lock_guards:

std::lock_guard<std::mutex> lock(mutal_state);
Community
  • 1
  • 1
Stephan Dollberg
  • 32,985
  • 16
  • 81
  • 107
  • Trying `std::lock_guard` didn't help. However I can't run your code. It terminates: `terminate called after throwing an instance of 'std::system_error' what(): Operation not permitted'` – sorush-r Mar 10 '12 at 05:19
  • @sorush-r You have to add the -pthread compiler option. That will fix that error. – Stephan Dollberg Mar 10 '12 at 08:17
  • @sorush-r doesn't make sense to me as that is exactly the error I get without `-pthread`, I am compiling like this `g++ -std=c++0x -pthread -D_GLIBCXX_USE_NANOSLEEP -o main main.cpp` – Stephan Dollberg Mar 10 '12 at 13:29
  • Oops! sorry, I'm using Qt build system, and forgot to do a `qmake` after adding `-lpthread` to .pro file. Thanks for your answer :) – sorush-r Mar 10 '12 at 13:33
1

If you're using g++ on linux, you need to link with -lpthread in order for mutexes or threading stuff to work properly. If you don't, it won't fail to link, but will instead behave badly or crash at runtime...

Chris Dodd
  • 119,907
  • 13
  • 134
  • 226
  • It is probably better to use the -pthread option. See here: http://stackoverflow.com/q/2127797/893693. However, I assume he did that anyway. – Stephan Dollberg Mar 10 '12 at 00:07
  • I'm using both `-pthread` and `-lpthread` linker options. There is no problem with linking. – sorush-r Mar 10 '12 at 05:12
1

I'm answering my own question! Because I find the problem, and It was not related to locking nor mutex implementation of C++0x. There is an ImageProcess class that should control state of Robot. It has a pointer to it's parent of type Robot* and using that, will move its parent. For that I've implemented a workhorse and a starter function. The start spawns a std::tread and runs workhorse on it:

void ImageProcess::start()
{
    std::thread x(&ImageProcess::workhorse, *this);
    x.detach();
}

I realized that this->parent in workhorse is a dangling pointer. Obviously calling parent->move() should crash. But it don't crash immediately! Surprisingly program control enters into move() function and then tries to change previousState of a non-existing Robot thing. (or lock a mutex of non-existing Robot).

I found that when invoking a thread like std::thread x(&ImageProcess::workhorse, *this); x.join() or x.detach(), the code is no longer running in caller object. To test I printed address of this and &image in both Robot::run() and ImageProcess::workhorse. There were different. I also added a public boolean foo to ImageProcess and changed its value to true in Robot, then printed it in workhorse and run, in workhorse value is always 0 but in Robot is 1.

I believe this is very strange behavior. I don't know if it's related to memory model or ownership of ImageProcess somehow is changed after std::thread x(&ImageProcess::workhorse, *this)...

I make ImageProcess a factory pattern class (everything is static!). Now it's OK.

sorush-r
  • 10,490
  • 17
  • 89
  • 173