Is this implementation of the double-checked locking pattern (DCLP) in C++11 correct?

Question

I am reading about DCLP (the double-checked locking pattern), and I am not sure I got it right. When using atomics to create the lock (as explained in "DCLP fixed in C++11"), there are two things that are not clear to me:

In the code from the article:

std::atomic<Singleton*> Singleton::m_instance;
std::mutex Singleton::m_mutex;

Singleton* Singleton::getInstance() {
    Singleton* tmp = m_instance.load(std::memory_order_acquire);
    if (tmp == nullptr) {
        std::lock_guard<std::mutex> lock(m_mutex);
        tmp = m_instance.load(std::memory_order_relaxed);
        if (tmp == nullptr) {
            tmp = new Singleton;
            m_instance.store(tmp, std::memory_order_release);
        }
    }
    return tmp;
}

What happens if I acquire the fence inside "load()", but then tmp is not nullptr, and I simply return? Shouldn't we state where the CPU can "release the fence"?

And if it is not required to release the fence, then why do we have acquire and release? What is the difference?

Surely I am missing something basic...

If I understood the article correctly, is this also a correct way to implement DCLP?

Singleton* Singleton::m_instance = nullptr;
std::atomic<bool> Singleton::is_first; // init to false
std::mutex Singleton::m_mutex;

Singleton* Singleton::getInstance() {
    bool tmp = is_first.load(std::memory_order_acquire);
    if (tmp == false) {
        std::lock_guard<std::mutex> lock(m_mutex);
        tmp = is_first.load(std::memory_order_relaxed);
        if (tmp == false) {
            // can place any code that will run exactly once!
            m_instance = new Singleton;

            // store back the tmp atomically
            is_first.store(tmp, std::memory_order_release);
        }
    }
    return m_instance;
}

In other words, instead of looking at the instance I am using an atomic boolean to make sure the DCLP works, and whatever is inside the second check of tmp is sure to be synchronized and run once. Is it correct?

Thanks!

EDIT: Note that I am not asking how to implement a singleton, but simply trying to better understand the concepts of fences and atomics and how they fixed DCLP. It is a theoretical question.

You return a boolean instead of `Singleton*` in your modified version... — Jarod42, Jun 02 '15 at 09:56
You don't "acquire a fence" or "release a fence". Fences have acquire or release semantics. — Jonathan Wakely, Jun 02 '15 at 10:12

Jonathan Wakely · Accepted Answer · 2015-06-02T10:43:32.250

What happens if I acquire the fence inside "load()", but then tmp is not nullptr, and I simply return? Shouldn't we state where the CPU can "release the fence"?

No. The release is done when the store into m_instance happens. If you load m_instance and it is not null, then the release already happened earlier and you don't need to do it.

You do not "acquire a fence" and "release a fence" like you acquire a mutex lock. That's not what fences are. A fence is simply an acquire or release operation without an associated memory location. And fences aren't really relevant here, because all the acquire and release operations have an associated memory location (the atomic object m_instance).
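To make the distinction concrete, here is a minimal sketch of the same pattern written with standalone fences instead of acquire/release operations on the atomic itself. The names `Widget`, `g_widget`, `g_widget_mutex` and `get_widget_with_fences` are hypothetical and not from the question; the point is only that `std::atomic_thread_fence` is an ordering operation with no memory location of its own, paired here with relaxed accesses to the atomic pointer.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical type and globals, for illustration only.
struct Widget { int value = 7; };

std::atomic<Widget*> g_widget{nullptr};
std::mutex g_widget_mutex;

Widget* get_widget_with_fences() {
    // Relaxed load: no ordering by itself.
    Widget* tmp = g_widget.load(std::memory_order_relaxed);
    // Acquire fence: later reads in this thread cannot be reordered
    // before the relaxed load above.
    std::atomic_thread_fence(std::memory_order_acquire);
    if (tmp == nullptr) {
        std::lock_guard<std::mutex> lock(g_widget_mutex);
        tmp = g_widget.load(std::memory_order_relaxed);
        if (tmp == nullptr) {
            tmp = new Widget;
            // Release fence: the construction above cannot be reordered
            // after the relaxed store below.
            std::atomic_thread_fence(std::memory_order_release);
            g_widget.store(tmp, std::memory_order_relaxed);
        }
    }
    return tmp;
}
```

Nothing is "held" between the two fences; each one only constrains reordering around the point where it appears.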

You do not have to have acquire+release in matched pairs like mutex locks+unlocks. You can do one release operation to store a value, and have any number of acquire operations (zero or more) that load that value and observe its effects.
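As a rough illustration of "one release, any number of acquires", the sketch below has a single writer publish a plain `int` with one release store, while several reader threads each perform their own acquire loads. The names `payload`, `ready` and `count_consistent_readers` are made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical names, for illustration only.
int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // flag published with a release store

// Returns how many reader threads saw the published payload.
int count_consistent_readers(int num_readers) {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    std::atomic<int> consistent{0};

    std::vector<std::thread> readers;
    for (int i = 0; i < num_readers; ++i) {
        readers.emplace_back([&consistent] {
            // Each reader does its own acquire loads; none of them
            // has a matching "release" of its own.
            while (!ready.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
            // Safe to read: the acquire load saw the value written by
            // the release store, so the write to payload is visible.
            if (payload == 42) {
                consistent.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }

    payload = 42;                                  // ordinary write
    ready.store(true, std::memory_order_release);  // the single release

    for (auto& t : readers) {
        t.join();
    }
    return consistent.load();
}
```

Calling `count_consistent_readers(4)` should report that all four readers observed `payload == 42`, even though only one release operation was ever performed.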

The acquire/release semantics on loads/stores are related to ordering of operations on either side of the load/store, to prevent re-ordering.

A non-relaxed atomic store (i.e. a release operation) to a variable A will synchronize with a later non-relaxed atomic load (i.e. an acquire operation) of the same variable A that reads the stored value.

As the C++ standard says:

Informally, performing a release operation on A forces prior side effects on other memory locations to become visible to other threads that later perform a consume or an acquire operation on A.

So in the DCLP code you quoted, the m_instance.store(tmp, std::memory_order_release) is a store to m_instance and is a release operation. The m_instance.load(std::memory_order_acquire) is a load from m_instance and is an acquire operation. The memory model says that the store of a non-null pointer synchronizes with any load that sees a non-null pointer, which means that it is guaranteed that all the effects of new Singleton have completed before any thread can load a non-null value from m_instance. This fixes the problems of pre-C++11 double-checked locking, where the store to m_instance could become visible to other threads before the object was fully constructed.
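For reference, here is a self-contained sketch of the quoted code that can be compiled and exercised from several threads. The `getInstance` body is the one from the question; the `m_value` member, the `s_constructions` counter and the `all_threads_see_one_instance` helper are additions made only so the effect can be observed, and are not part of the original article.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

class Singleton {
public:
    static Singleton* getInstance();
    int value() const { return m_value; }
    static int constructions() { return s_constructions.load(); }

private:
    Singleton() : m_value(42) { s_constructions.fetch_add(1); }

    int m_value;  // plain member, written during construction
    static std::atomic<Singleton*> m_instance;
    static std::mutex m_mutex;
    static std::atomic<int> s_constructions;  // added for the demo only
};

std::atomic<Singleton*> Singleton::m_instance{nullptr};
std::mutex Singleton::m_mutex;
std::atomic<int> Singleton::s_constructions{0};

Singleton* Singleton::getInstance() {
    // Acquire: if this sees a non-null pointer, the constructor's
    // effects are visible too.
    Singleton* tmp = m_instance.load(std::memory_order_acquire);
    if (tmp == nullptr) {
        std::lock_guard<std::mutex> lock(m_mutex);
        // Relaxed is enough here: the mutex already orders this load.
        tmp = m_instance.load(std::memory_order_relaxed);
        if (tmp == nullptr) {
            tmp = new Singleton;
            // Release: publishes the fully constructed object.
            m_instance.store(tmp, std::memory_order_release);
        }
    }
    return tmp;
}

// Demo helper (hypothetical): every thread should get the same,
// fully constructed object, and the constructor should run once.
bool all_threads_see_one_instance(int num_threads) {
    std::vector<Singleton*> seen(num_threads, nullptr);
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([&seen, i] { seen[i] = Singleton::getInstance(); });
    }
    for (auto& t : threads) {
        t.join();
    }
    for (Singleton* p : seen) {
        if (p != seen[0] || p->value() != 42) {
            return false;
        }
    }
    return Singleton::constructions() == 1;
}
```

A passing run does not prove the ordering is right (data races rarely show up in a small test), but it makes the intended guarantee concrete: one construction, and every caller sees `value() == 42`.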

In other words, instead of looking at the instance I am using an atomic boolean to make sure the DCLP works, and whatever is inside the second check of tmp is sure to be synchronized and run once. Is it correct?

No, because you store false here:

        // store back the tmp atomically
        is_first.store(tmp, std::memory_order_release);

Which means on the next call to the function you create another Singleton and leak the first one. It should be:

        is_first.store(true, std::memory_order_release);

If you fix that, I think it's correct, but on typical implementations it uses more memory (sizeof(atomic<bool>)+sizeof(Singleton*) is probably more than sizeof(atomic<Singleton*>)), and by splitting the logic into two variables (a boolean and a pointer) you make it easier to get wrong, as you did. So there's no advantage to doing it that way compared to the original, where the pointer itself also serves as the boolean because you look at the pointer directly, not at some boolean that might not have been set correctly.
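Putting the fix together, a flag-based variant might look like the sketch below. It uses a hypothetical `Config` type and free-standing globals rather than the question's `Singleton` members, and `g_init_count` exists only to show that the guarded block runs once.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical type and globals, for illustration only.
struct Config { int answer = 42; };

Config* g_config = nullptr;              // plain pointer, published via the flag
std::atomic<bool> g_initialized{false};  // plays the role of is_first
std::mutex g_config_mutex;
int g_init_count = 0;                    // only modified while holding the mutex

Config* get_config() {
    if (!g_initialized.load(std::memory_order_acquire)) {
        std::lock_guard<std::mutex> lock(g_config_mutex);
        if (!g_initialized.load(std::memory_order_relaxed)) {
            // Code here runs exactly once.
            g_config = new Config;
            ++g_init_count;
            // Store true, not the stale local value, so later calls
            // skip the initialization.
            g_initialized.store(true, std::memory_order_release);
        }
    }
    // Safe: either this thread saw the flag via an acquire load, or it
    // held the mutex after the initializing thread released it.
    return g_config;
}
```

Note that the plain `g_config` read is only safe because every path to it goes through the flag or the mutex first, which is exactly the kind of coupling between two variables that makes this variant easier to get wrong.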

Excellent answer! I just have a question: would replacing `Singleton* tmp = m_instance.load(std::memory_order_acquire);` with `Singleton* tmp = m_instance.load(std::memory_order_consume);` be a safe thing to do? And would it have any effect on the `memory_order_relaxed` inside? — Alejandro, Jun 02 '15 at 13:27
@Alejandro, in this case I believe it would be equivalent, and safe, but has no advantage (it's an open research topic how compilers should implement the dependency-ordering for `memory_order_consume` so it would probably even generate the same code using today's compilers). It would have no effect on the relaxed-load because the ordering of that is enforced by the mutex lock anyway. — Jonathan Wakely, Jun 03 '15 at 09:17
