3

I'm using VS2012 with default optimization settings (/O2), and this problem exists only in release mode.

I have some code that uses a michael_deque (with the standard GC) and a pointer to (abstract) type T.
When I try to push back a pointer to a type that is derived from T, the application crashes while exiting the push_back() function of michael_deque.

The problem seems to depend precisely on this specific type T, because writing a dummy class foo, deriving from it in class bar (and printing something in the constructor to avoid it being optimized away) and then pushing back new bar() to michael_deque does not cause a crash.

The classT in question is this:

class Task
{
public:
    Task() : started(false), unfinishedTasks(1), taskID(++taskIDCounter) {};
    Task(unsigned int parentID_) : started(false), unfinishedTasks(1), taskID(++taskIDCounter), parentID(parentID_)/*, taken(0)*/ {};
    virtual ~Task() = 0 {};

    virtual void execute() final
    {
        this->doActualWork();
        unfinishedTasks--;
    }

    virtual void doActualWork() = 0;
public:
    unsigned int taskID;    //ID of this task
    unsigned int parentID;  //ID of the parent of this task
    bool started;
    std::atomic<unsigned int> unfinishedTasks; //Number of child tasks that are still unfinished
    std::vector<unsigned int> dependencies; //list of IDs of all tasks that this task depends on

};

The error can be reproduced in a minimal program (if you happen to have an environment that can execute this in the same manner as I do, just put an std::atomic<unsigned int> taskIDCounter somewhere where the Task class can see it) :

#include <cds/container/michael_deque.h>
#include "task.hpp"
class a : public Task
{
a()
    {
        std::cout<<"dummy print"<<std::endl;
    }
    virtual ~a()
    {
    }

    virtual void doActualWork()
    {
        std::cout<<"whatever"<<std::endl;
    }
};



int main()
{
    cds::Initialize();

    {
        cds::gc::HP hpGC;
        cds::gc::HP::thread_gc myThreadGC;

        cds::container::MichaelDeque<cds::gc::HP,Task*> tasks;
        tasks.push_back(new a()); //will crash at the end of push_back
    }
        cds::Terminate();
}

What could be the cause of this? Do I do something undefined in the class Task which causes problems in the optimization problems?

sehe
  • 374,641
  • 47
  • 450
  • 633
TravisG
  • 2,373
  • 2
  • 30
  • 47
  • Aren't you able to debug it in VS? – Uchia Itachi Aug 31 '13 at 17:54
  • The error doesn't happen in debug mode. In release mode it's very hard to debug (even with debug symbols activated), since most of the stuff in that function is optimized away. – TravisG Aug 31 '13 at 17:56
  • Hard luck, a potential for a crash is when we are de-referencing a dangling pointer. But I don't think it is caused by push_back. – Uchia Itachi Aug 31 '13 at 17:59
  • It is. When I halt the program after it crashes, the app is currently exiting the push_back function (likely deconstruction something after it goes out of scope). It's not showing what it is doing, though. – TravisG Aug 31 '13 at 18:06
  • You're creating `tasks` container in a local scope with the extra pair of braces `{ } `. It might be that the `cds:Terminate()` is trying to free something which is already freed. – Uchia Itachi Aug 31 '13 at 18:08
  • Do you really need to limit the scope inside main for the container? – Uchia Itachi Aug 31 '13 at 18:12
  • The cds library basic usage example does it. Maybe hpGC / myThreadGC need to be destroyed by the time cds::terminate() is called. Either way, I found that if I plant the "Task" class in the same compilation unit as as the place where I call push_back, it doesn't crash. By now I'm somewhat assuming that this is a compiler bug. It doesn't really make sense though, T* and pointers of a derived type are the same size, and the library shouldn't be doing anything but assigning pointers.. – TravisG Aug 31 '13 at 18:15
  • @Uchia Itachi: Yeah, it's a compiler bug. If I use #pragma optimize("",off) before every function in the task class, the bug disappears. Well, the class mostly does nothing except initialize stuff, so that's not too bad. – TravisG Aug 31 '13 at 19:28
  • @TravisG that doesn't make it a compiler bug. That merely spells [Undefined Behaviour](http://en.wikipedia.org/wiki/Undefined_behavior). 99% of the time UB is caused by your code, not library code. This is also confirmed by the fact that using a simple type as '`T`' doesn't cause a problem. – sehe Aug 31 '13 at 23:07
  • @sehe surely you can point out which part of the Task class causes UB, then. It's not a complex class, 100% of the source code is there. – TravisG Sep 01 '13 at 07:38
  • @TravisG I'm sorry. I didn't mean to offend you by my failure to debug your problem. Please bear in mind SO is operated by volunteers, in their available time. I was mainly pointing out the obvious because I feared the answer existing at the time and your own comments were going to prevent you from finding the real cause. – sehe Sep 01 '13 at 08:42
  • @sehe that's fine, no offense taken :) – TravisG Sep 01 '13 at 08:47
  • @TravisG I've looked at it and compiled it on my linux box (gcc 4.7.2). I see no real problems, allthough I'd worry about the initialization order in the constructors, `parentID` being uninitialized in one constructor (!) and the (nonstandard?) `= 0 {}` style on `~Task`. I'd [drop the `= 0` there and reorder initialization to match exactly your desired order. And make sure to (default-)initialize `parentID` in all paths. Make sure Tasks are not copyable/assignable, properly destruct the `A*` instance, and fix accessibility of members for the classes `Task` and `A`](http://ideone.com/UrmlpR). – sehe Sep 01 '13 at 11:14
  • @sehe, see my answer. It looks like libstdc++ has non-buggy std::atomic implementations :P = 0 {} is not nonstandard, it's required per standard (see http://stackoverflow.com/questions/4998344/why-must-a-pure-virtual-destructors-implementation-be-empty-and-should-it-be-i ) – TravisG Sep 01 '13 at 11:16
  • Mmm. `= 0 {}` is not accepted by gcc 4.7, 4.8 (an out-of-line definition is accepted, though). Clang accepts it just fine indeed. – sehe Sep 01 '13 at 11:27
  • Well, there is a non-standard part in there that I introduced when I copied the implementations out of the .cpp file. The body has to be defined outside the initial definition of the class. – TravisG Sep 01 '13 at 12:02

1 Answers1

5

It was indeed a compiler bug. More specifically, it was a bug associated with Visual Studio 2012's atomic implementation.

Some template specializations of the std::atomic class modify the stack frame pointer (ebp) without backing it up on and popping it form the stack before/after the modification. The libcds library uses one of these specializations, and the resulting incorrect frame pointer sometimes causes illegal memory access (the undefined behavior seems to prevent catastrophic failure in debug mode) when running outside the scope of a function.

The fix in this special case was to make libcds use a different atomics library then the standard one provided by Visual Studio. The library decides which implementation to use in cxx11_atomic.h:

#if defined(CDS_USE_BOOST_ATOMIC)
#   error "Boost.atomic is not supported"
//#   include <boost/version.hpp>
//#   if BOOST_VERSION >= 105300
//#       include <boost/atomic.hpp>
//#       define CDS_ATOMIC boost
//#       define CDS_CXX11_ATOMIC_BEGIN_NAMESPACE namespace boost {
//#       define CDS_CXX11_ATOMIC_END_NAMESPACE }
//#   else
//#       error "Boost version 1.53 or above is needed for boost.atomic"
//#   endif

#elif CDS_CXX11_ATOMIC_SUPPORT == 1
    // Compiler supports C++11 atomic (conditionally defined in cds/details/defs.h)
#   include <cds/compiler/cxx11_atomic_prepatches.h>
#   include <atomic>
#   define CDS_ATOMIC std
#   define CDS_CXX11_ATOMIC_BEGIN_NAMESPACE namespace std {
#   define CDS_CXX11_ATOMIC_END_NAMESPACE }
#   include <cds/compiler/cxx11_atomic_patches.h>
#else
#   include <cds/compiler/cxx11_atomic.h>
#   define CDS_ATOMIC cds::cxx11_atomics
#   define CDS_CXX11_ATOMIC_BEGIN_NAMESPACE namespace cds { namespace cxx11_atomics {
#   define CDS_CXX11_ATOMIC_END_NAMESPACE }}
#endif

The second branch of the if statement can be changed to something like

#elif CDS_CXX11_ATOMIC_SUPPORT == 255

which will cause the library to use its own atomics implementation at all times.

TravisG
  • 2,373
  • 2
  • 30
  • 47
  • 1
    Oh. Ahahaha. Great timing. I just finished my analysis on linux :/ +1 – sehe Sep 01 '13 at 11:20
  • 1
    This is a painful bug. For the record, the bug has been reported against VS2013 (!) and as such will only be fixed with VS2013 RTM – sehe Sep 01 '13 at 11:20