Why does this kind of code crash (very occasionally!) on x64?
class Object
{
public:
    void manipulate();        // reads and writes object
    bool m_finished = false;  // note this is a regular bool and NOT atomic
};
Thread 1
object->manipulate();
object->m_finished = true;
// no further access of object
Thread 2
if (object->m_finished)
    delete object;
In two separate applications I saw crashes due to this pattern (this is not the exact code, just an example of a common pattern).
The crash takes the form of heap corruption where it looks very much like the memory occupied by 'object' was modified after it was freed. It's as if Thread 1 did:
object->m_finished = true;
object->manipulate();
and the object got deleted between the setting of m_finished and manipulate().
In the first app, the fix was to put a mutex around both the setting and the testing of m_finished (it actually already had a mutex around the setting part). That case involved an OS bug in aio_error and is described in this question I asked.
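For reference, the shape of that fix, as best I can reconstruct it (a minimal sketch, not the real code; the global mutex, the polling loop and the member names are my invention, just to make the snippet self-contained):

#include <mutex>
#include <thread>

// Sketch only: g_mutex, m_counter and the polling loop are invented for illustration.
class Object
{
public:
    void manipulate() { ++m_counter; }   // stands in for the real reads/writes
    bool m_finished = false;             // still a plain bool
private:
    int m_counter = 0;
};

std::mutex g_mutex;                      // guards m_finished

int main()
{
    Object* object = new Object;

    std::thread t1([object] {
        object->manipulate();
        std::lock_guard<std::mutex> lock(g_mutex);
        object->m_finished = true;       // no further access of object after this
    });

    std::thread t2([object] {
        for (;;) {
            std::lock_guard<std::mutex> lock(g_mutex);
            if (object->m_finished)
                break;
        }
        // t1's unlock and t2's subsequent lock form a happens-before edge,
        // so everything manipulate() wrote is visible before the delete.
        delete object;
    });

    t1.join();
    t2.join();
}

Note that the lock only needs to cover the flag itself, not manipulate(); it's the unlock/lock pairing that orders manipulate()'s writes before the delete.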
The second time I saw this, it was fixed by changing m_finished to a std::atomic.
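As far as I remember, that second fix really was just the member's type, something like the following (again a sketch rather than the actual code; we used the default seq_cst ordering, nothing weaker):

#include <atomic>

class Object
{
public:
    void manipulate();                    // reads and writes object
    std::atomic<bool> m_finished{false};  // the only change: now a std::atomic
};

Thread 1
object->manipulate();
object->m_finished = true;   // atomic store, seq_cst by default; a release store would also suffice here
// no further access of object

Thread 2
if (object->m_finished)      // atomic load, seq_cst by default; an acquire load would also suffice here
    delete object;

With the atomic in place, a thread that sees m_finished == true is also guaranteed to see everything manipulate() wrote before the flag was set.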
In both cases it was an extremely rare heisencrash that would disappear with any attempt to use Valgrind, ASan, etc. My assumption is that it was due to the CPUs re-ordering the writes in object->manipulate() and the delete object.
Note that I am 95% sure there was no compiler-level re-ordering going on, as the various methods involved were virtual functions, etc.
I get that it's good to use atomics when doing inter-thread signalling (and I promise I will forever more). As far as I understand, x86 (Intel and AMD; we run AMD) does not allow store-store re-ordering. So if the deleting thread witnesses m_finished == true, all the preceding stores performed in object->manipulate() should be committed to memory, right?
I understand that in general the behaviour of this code is undefined. But specifically: why, on x64, does it appear that a mutex or an atomic is required for m_finished, when the CPU specs guarantee the ordering of stores and loads?