helgrind reports possible race between use of singleton and its constructor

Question

As mentioned there, Meyer's singleton is thread-safe in C++11.

So I would expect this code to be fine:

#include <stdio.h>
#include <pthread.h>

struct key_type {
    int value;
    key_type() : value(0) { }
};  


void * thread1(void*) {
    static key_type local_key;
    printf("thread has key %d\n", local_key.value);
    return NULL;
}   

int main()
{
    pthread_t t[2];
    pthread_create(&t[0], NULL, thread1, NULL);
    pthread_create(&t[1], NULL, thread1, NULL);
    pthread_join(t[0], NULL);
    pthread_join(t[1], NULL);
}

(code overly simplified on purpose, I know I can do zero-initialization trivially.)

I'm compiling with g++-7.1.0. Helgrind (valgrind-3.12.0) reports a possible data race between the read of local_key.value and the ctor, which sets value.

==29036== Possible data race during read of size 4 at 0x601058 by thread #3
==29036== Locks held: none
==29036==    at 0x4006EA: thread1(void*) (datarace-simplest.cpp:12)
==29036==    by 0x4C32D06: mythread_wrapper (hg_intercepts.c:389)
==29036==    by 0x4E45493: start_thread (pthread_create.c:333)
==29036==    by 0x59DEAFE: clone (clone.S:97)
==29036== 
==29036== This conflicts with a previous write of size 4 by thread #2
==29036== Locks held: none
==29036==    at 0x400780: key_type::key_type() (datarace-simplest.cpp:6)
==29036==    by 0x4006DF: thread1(void*) (datarace-simplest.cpp:11)
==29036==    by 0x4C32D06: mythread_wrapper (hg_intercepts.c:389)
==29036==    by 0x4E45493: start_thread (pthread_create.c:333)
==29036==    by 0x59DEAFE: clone (clone.S:97)
==29036==  Address 0x601058 is 0 bytes inside data symbol "_ZZ7thread1PvE9local_key"

I thought that the c++11 standard (§6.7) guaranteed that local_key was initialized just once and for all, so that further accesses dealt with variables whose ctor was guaranteed not to be still running.

Otherwise such a variable is initialized the first time control passes through its declaration; such a variable is considered initialized upon the completion of its initialization. [...] If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization. [...]

Am I wrong ? Is it a helgrind defect ? Is this use case known to slip through the cracks so that helgrind reports a possible race ?

If you get the assembler around the initialisation does it look as if locks are being used? — Richard Critten, Jun 22 '17 at 12:02
There surely isn't any lock around the read `local_key.value`. As for the initialization, yes there is, it's bracketed by `__cxa_guard_acquire` and `__cxa_guard_release` — EThome, Jun 22 '17 at 12:12

score 0 · Answer 1 · answered Jun 22 '17 at 12:53

I think this is a hellgrind defect. The standard guarantees that the static initialization is sequenced before the read later, and evidence (see below) suggests that not only the decision whether or not to run the constructor, but actually the whole constructor is behind a lock.

Modifying your example such that the constructor reads

key_type() : value(0) {
    sleep (1);
    pthread_yield();
    value = 42;
}

outputs 42 twice. This suggests that the lock acquired when testing if necessary and starting with the initialization is not released until the constructor finishes.

score 0 · Accepted Answer · answered Jun 22 '17 at 20:14

Disassembling the function thread1, I see calls to __cxa_guard_acquire and __cxa_guard_release, which I think we can reasonably assume is what protects the constructor. However, such calls are not intercepted by helgrind, and so, helgrind does not observe any synchronisation. This is a bug/weakness in Valgrind/helgrind worth filing a bug on valgrind bugzilla. Note however that quickly reading the code, the calls to __cxa_guard_acquire and __cxa_guard_release seems to not match directly a pair of lock/unlock: it looks like the code might just call acquire, and then not call release:

   00x000000000040077e <+24>:   mov    $0x600d00,%edi
   0x0000000000400783 <+29>:    callq  0x400610 <__cxa_guard_acquire@plt>
   0x0000000000400788 <+34>:    test   %eax,%eax
   0x000000000040078a <+36>:    setne  %al
   0x000000000040078d <+39>:    test   %al,%al
   0x000000000040078f <+41>:    je     0x4007a5 <thread1(void*)+63>
   0x0000000000400791 <+43>:    mov    $0x600d08,%edi
   0x0000000000400796 <+48>:    callq  0x40082e <key_type::key_type()>
   0x000000000040079b <+53>:    mov    $0x600d00,%edi
   0x00000000004007a0 <+58>:    callq  0x400650 <__cxa_guard_release@plt>
   0x00000000004007a5 <+63>:    mov    0x20055d(%rip),%eax        # 0x600d08 <_ZZ7thread1PvE9local_key>

After debugging a little bit, it looks like the guard is located just before local_key, and becomes set to 1 after the object is constructed. Not very clear to me what __cxa_guard_release has to do. Some more reading of the c++ runtime library code is needed I guess to understand how helgrind could (maybe) be instructed about what happens there.

Note also that the valgrind drd tool similarly suffers from the same bug/weakness.

I think you're right. There's doc on `__cxa_guard_acquire` and `__cxa_guard_release` [there](https://libcxxabi.llvm.org/spec.html). The code skips `_release` if `_acquire` tells it that initialization's already done. It could be that this is a typical situation where helgrind does its best, but just cannot tell that there's no race. I'll ponder this and maybe report. — EThome, Jun 22 '17 at 21:58
See also [this thread](https://sourceforge.net/p/valgrind/mailman/message/32434015/) on `valgrind-users`. — EThome, Sep 28 '17 at 07:42

score 0 · Answer 3 · answered Jul 21 '22 at 08:29

In addition to the existing answers, if you look at the full assembly produced, you see there's an additional check on the guard variable before the lock. A write of this (under lock) coinciding with a read of this (the "preview" outside the lock) is probably seen as a data race.

See here: https://godbolt.org/z/3bf66qr4G

getInstance():
    pushq   %rbp
    movq    %rsp, %rbp


    # this is a "preview", probably seen as a data race:
    movzbl  guard variable for getInstance()::inst(%rip), %eax
    testb   %al, %al
    sete    %al
    testb   %al, %al
    je      .L4


    # preview says "does not exist"
    # check again whether instance exists, under lock
    movl    $guard variable for getInstance()::inst, %edi
    call    __cxa_guard_acquire
    testl   %eax, %eax
    setne   %al
    testb   %al, %al
    je      .L4


    # instance does not exist - create one
    movl    $_ZZ11getInstancevE4inst, %edi

    call    Foo::Foo() [complete object constructor]

    movl    $__dso_handle, %edx
    movl    $_ZZ11getInstancevE4inst, %esi
    movl    $_ZN3FooD1Ev, %edi
    call    __cxa_atexit

    movl    $guard variable for getInstance()::inst, %edi
    call    __cxa_guard_release


.L4:
    movl    $_ZZ11getInstancevE4inst, %eax
    popq    %rbp
    ret

helgrind reports possible race between use of singleton and its constructor

3 Answers3