
I have the following code:

    while (lock)     // spin until the lock looks free
        ;
    lock = 1;        // claim the lock
    // critical section
    lock = 0;        // release the lock

Since reading or changing the lock value is in itself a multi-instruction operation:

read lock
change value
write it

If it happens like this (a rough demo follows the steps):

1) One thread reads the lock and stops there
2) Another thread reads it, sees it is free, locks it, and gets halfway through its work
3) The first thread wakes up and also goes into the critical section
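
To make the race concrete, here is a rough demo of that interleaving (just for illustration, not a real locking scheme); two threads use the naive spin-wait, and a few runs compiled with -pthread are usually enough to catch both of them inside the critical section at once:

    /* Hypothetical demo of the race above: the naive spin-wait lets two
       threads into the critical section at the same time. */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    volatile int lock = 0;
    volatile int inside = 0;        /* how many threads are in the CS right now */

    void *worker(void *arg) {
        (void)arg;
        while (lock)                /* step 1: read the lock                    */
            ;                       /* another thread can sneak in right here   */
        lock = 1;                   /* steps 2-3: change the value and write it */

        inside++;                   /* critical section */
        if (inside > 1)
            printf("race: %d threads in the critical section\n", inside);
        usleep(1000);
        inside--;

        lock = 0;
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }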

So how would locking be implemented in the system? Placing one guard variable on top of another is not right: it would be like guarding the guard?

Stopping the other processors or threads is also not right?

Ashish Negi
  • For which algorithm do you have this code? – perilbrain Aug 14 '12 at 11:14
  • "*As reading or changing lock value is in itself a multi-instruction*" - Not always. Atomic operations are offered by several architectures. And lock implementations use these, depending on the hardware. – ArjunShankar Aug 14 '12 at 11:31
  • How is the atomicity of that operation (instruction) implemented? – Ashish Negi Aug 14 '12 at 11:36
  • It's implemented by the hardware; if you really wish to understand more you can look up stuff like the MESI protocol (although it doesn't implement the atomic operation itself). @David's answer is good enough to get you started. – bestsss Aug 14 '12 at 12:02

2 Answers


It is 100% platform specific. Generally, the CPU provides some form of atomic operation such as exchange or compare and swap. A typical lock might work like this (a rough code sketch follows the steps):

1) Create: Store 0 (unlocked) in the variable.

2) Lock: Atomically attempt to switch the value of the variable from 0 (unlocked) to 1 (locked). If we failed (because it wasn't unlocked to begin with), let the CPU rest a bit, and then retry. Use a memory barrier to ensure no future memory operations sneak behind this one.

3) Unlock: Use a memory barrier to ensure previous memory operations don't sneak past this one. Atomically write 0 (unlocked) to the variable.
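
Here is a minimal sketch of those three steps using C11 <stdatomic.h> (the type and function names are mine, and a real implementation would add back-off, contention handling, and much more):

    #include <stdatomic.h>

    /* Illustrative only: a bare-bones spinlock built from the three steps above. */
    typedef struct {
        atomic_int state;            /* 0 = unlocked, 1 = locked */
    } spinlock_t;

    /* 1) Create: store 0 (unlocked). */
    void spin_init(spinlock_t *l) {
        atomic_init(&l->state, 0);
    }

    /* 2) Lock: atomically try to switch 0 -> 1; on failure, retry.
          memory_order_acquire keeps later memory operations from being
          reordered ahead of the successful exchange. */
    void spin_lock(spinlock_t *l) {
        int expected = 0;
        while (!atomic_compare_exchange_weak_explicit(
                   &l->state, &expected, 1,
                   memory_order_acquire, memory_order_relaxed)) {
            expected = 0;            /* the failed CAS overwrote it */
            /* a real lock would let the CPU rest here (e.g. the pause hint on x86) */
        }
    }

    /* 3) Unlock: memory_order_release keeps earlier memory operations from
          being reordered past the store of 0 (unlocked). */
    void spin_unlock(spinlock_t *l) {
        atomic_store_explicit(&l->state, 0, memory_order_release);
    }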

Note that you really don't need to understand this unless you want to design your own synchronization primitives. And if you want to do that, you need to understand an awful lot more. It's certainly a good idea for every programmer to have a general idea of what he's making the hardware do. But this is an area filled with seriously heavy wizardry. There are so many, many ways this can go horribly wrong. So just use the locking primitives provided by the geniuses who made your platform, compiler, and threading library. Here be dragons.

For example, SMP Pentium Pro systems have an erratum that requires special handling in the unlock operation. A naive implementation of the lock algorithm will cause the branch prediction logic to expect the operation to keep spinning, incurring a massive performance penalty at the worst possible time -- when you first acquire the lock. A naive implementation of the lock algorithm may cause two cores each waiting for the same lock to saturate the bus, slowing the CPU that needs to get work done in order to release the lock to a crawl. These all require heavy wizardry and deep understanding of the hardware to deal with.

David Schwartz
  • Thanks for your response. But it is your last comments that really fired me up. Can you explain your "2)" more, or give some good references for understanding how this works? – Ashish Negi Aug 14 '12 at 11:48
  • It's platform specific. For x86, see [cmpxchg](http://faydoc.tripod.com/cpu/cmpxchg.htm) and [pause a.k.a rep nop](http://stackoverflow.com/questions/7086220/what-does-rep-nop-mean-in-x86-assembly). On x86, memory barriers are not needed, but a compiler optimization barrier ([memory clobber](http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html#ss5.3)) usually is. Unfortunately, it's going to get very complicated very quickly. – David Schwartz Aug 14 '12 at 11:50

In a course I studied at Uni, a possible firmware solution for implementing locks was presented in the form of the "atomicity bit" associated with a memory operation initiated by a processor.

Basically, when locking, you'll notice that you have a sequence of operations that needs to be executed atomically: test the value of the flag and, if it is not set, set it to locked; otherwise try again. This sequence can be made atomic by associating a bit with each memory request sent by the CPU. The first N-1 operations will have the bit set, while the last one will have it unset, to mark the end of the atomic sequence.

When the memory module (there can be several modules) where the flag data is stored receives the request for the first operation in the sequence (whose bit is set), it will serve it and not take requests from any other CPU until the CPU that initiated the atomic sequence sends a request with an unset atomicity bit (since these transactions are usually short, a coarse-grain approach like this is acceptable). Note that this is usually made easier by the instruction set providing specialized instructions of the "compare-and-set" type, which do exactly what I mentioned before.
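
A toy model of that arbitration might look like this (everything here is hypothetical and grossly simplified; it only illustrates a memory module refusing other CPUs while a tagged sequence is in flight):

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct {
        int owner;      /* CPU id whose atomic sequence is in flight, or -1 */
        int flag;       /* the lock word stored in this module */
    } mem_module_t;

    /* Serve a request unless the module is reserved by another CPU's atomic
       sequence. The atomicity bit decides whether the module stays reserved
       after this operation. */
    bool mem_request(mem_module_t *m, int cpu, bool atomicity_bit,
                     bool is_write, int write_val, int *read_val) {
        if (m->owner != -1 && m->owner != cpu)
            return false;                        /* busy: the other CPU must retry */

        m->owner = atomicity_bit ? cpu : -1;     /* hold or release the module */

        if (is_write)
            m->flag = write_val;
        else
            *read_val = m->flag;
        return true;
    }

    int main(void) {
        mem_module_t m = { -1, 0 };
        int v = 0;

        mem_request(&m, 0, true, false, 0, &v);       /* CPU 0 reads the flag, bit set */
        if (!mem_request(&m, 1, false, false, 0, &v))
            puts("CPU 1 is refused while CPU 0's sequence is in flight");
        if (v == 0)
            mem_request(&m, 0, false, true, 1, NULL); /* CPU 0 sets the flag, bit unset */
        return 0;
    }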

Tudor
  • Note that this is not the way locking is implemented on any modern computer system that I know of. It's a model for understanding it, as modern systems act "as if" they did this. The major problem with this approach is that it would slow locking down to memory speed, and on modern computers, memory is way too slow to be used for this purpose. (That's why we have all those caches -- memory is slower than CPU.) – David Schwartz Aug 14 '12 at 21:39
  • @David Schwartz: Indeed, it does seem like the "academic" method of doing things. I guess it's only useful to show what kind of mechanisms are involved in the process. – Tudor Aug 15 '12 at 12:19
  • @David Please explain "Modern Systems act as if they did this" ? – Ashish Negi Aug 16 '12 at 04:05
  • While they don't actually do that, they make it seem as if they did. For example, rather than locking a memory bus, they lock a chunk of memory into a cache. It achieves the same effect with respect to atomicity of operations on that bit of memory, but doesn't interfere with accesses to other sections of memory. – David Schwartz Aug 16 '12 at 07:39