In most cases, for releasing the resource, just resetting the lock to zero (as you do) is almost OK (e.g. on an Intel Core processor) but you need also to make sure that the compiler will not exchange instructions (see below, see also g-v's post). If you want to be rigorous (and portable), there are two things that need to be considered :
What the compiler does: It may exchange instructions for optimizing the code, and thus introduce some subtle bugs if it is not "aware" of the multithreaded nature of the code. To avoid that, it is possible to insert a compiler barrier.
What the processor does: Some processors (like Intel Itanium, used in professional servers, or ARM processors used in smart phones) have a so-called "relaxed memory model". In practice, it means that the processor may decide to change the order of the operations. Again, this can be avoided by using special instructions (load barrier and store barrier). For instance, in an ARM processor, the instruction DMB ensures that all store operations are completed before the next instruction (and it needs to be inserted in the function that releases a lock)
Conclusion: It is very tricky to make the code correct, if you have some compiler / OS support for these functionalities (e.g., stdatomics.h
, or std::atomic
in C++0x), it is much better to rely on them than writing your own (but sometimes you have no choice). In the specific case of standard Intel Core processor, I think that what you do is correct, provided you insert a compiler-barrier in the release operation (see g-v's post).
On compile-time versus run-time memory ordering, see: https://en.wikipedia.org/wiki/Memory_ordering
My code for some atomic / spinlocks implemented on different architectures:
http://alice.loria.fr/software/geogram/doc/html/atomics_8h.html
(but I'm unsure it's 100 % correct)