
I've only read a little about this topic, but it seems the only benefit is to get around contention problems; it won't have any important effect on the deadlock problem, since the code that is lock-free is so small and fundamental (FIFOs, LIFOs, hash tables) that deadlock was never a problem there.

So it's all about performance - is this right?

Brian Agnew
Lothar

7 Answers


Lock-free programming is (as far as I can see) always about performance, otherwise using a lock is in most cases much simpler, and therefore preferable.

Note however that with lock-free programming you can end up trading deadlock for live-lock, which is a lot harder to diagnose since no tools that I know of are designed to diagnose it (although I could be wrong there).

I'd say, only go down the path of lock-free if you have to; that is, you have a scenario where you have a heavily contended lock that is hurting your performance. (If it ain't broke, don't fix it).

jerryjvl
  • I don't see how a lock-free algorithm should result in a live-lock condition? A live-lock needs locks, at least. – sstn Mar 15 '13 at 08:19
  • 1
    I must admit I took liberties with terminology that I shouldn't have. I'd have been better off describing how you could try to write an algorithm without locking and instead make it so that two threads get stuck in a pattern of alternately trying to update the same memory with neither ever "winning". – jerryjvl Jun 30 '13 at 01:03
  • That's not possible. In a lock-free scenario, one thread will always "win". Always. That's a fundamental feature of a proper lock-free design. If you're updating an immutable object reference with a new reference, the exchange will either succeed or fail. It will fail for the losers, but in order for there to be losers, at least one had to win. There will always be at least one winner, so live-lock shouldn't be possible. It isn't possible in my designs anyway. – Triynko Apr 10 '15 at 05:11
  • "Herlihy & Shavit, authors of The Art of Multiprocessor Programming, tend to express such operations as class methods, and offer the following succinct definition of lock-free (see slide 150): “In an infinite execution, infinitely often some method call finishes.” In other words, as long as the program is able to keep calling those lock-free operations, the number of completed calls keeps increasing, no matter what. It is algorithmically impossible for the system to lock up during those operations." – Triynko Apr 10 '15 at 05:17
  • Again: "Perhaps the most often-discussed RMW operation is compare-and-swap (CAS). On Win32, CAS is provided via a family of intrinsics such as _InterlockedCompareExchange. Often, programmers perform compare-and-swap in a loop to repeatedly attempt a transaction. This pattern typically involves copying a shared variable to a local variable, performing some speculative work, and attempting to publish the changes using CAS. Such loops still qualify as lock-free, **because if the test fails for one thread, it means it must have succeeded for another.**" – Triynko Apr 10 '15 at 05:23
  • That is all correct as long as the person performing the implementation actually implements a correct lock-free implementation. If only that were the common scenario... I'd still argue that attempting a lock-free implementation yourself is best avoided unless unavoidable. And even then, try to rely on an existing implementation that is known to work. – jerryjvl May 16 '15 at 03:16
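The CAS retry loop discussed in these comments can be sketched in C++ with `std::atomic` (a minimal illustration; the function names are my own, not from any library mentioned here):

```cpp
#include <atomic>

// Sketch of the classic CAS retry loop: copy the shared value,
// compute a new value, and try to publish it. On failure,
// compare_exchange_strong reloads the current value into `expected`,
// so we simply retry. A failed CAS implies some other thread's CAS
// succeeded, which is why the loop as a whole is lock-free.
int cas_increment(std::atomic<int>& target) {
    int expected = target.load();
    while (!target.compare_exchange_strong(expected, expected + 1)) {
        // `expected` now holds the value that beat us; try again
    }
    return expected + 1;  // the value we successfully published
}

// Deterministic illustration of "losers imply a winner": two CAS
// attempts based on the same stale snapshot; exactly one succeeds.
bool exactly_one_winner() {
    std::atomic<int> x{0};
    int seen_a = x.load();  // both "threads" observe 0
    int seen_b = seen_a;
    bool a_won = x.compare_exchange_strong(seen_a, seen_a + 1);
    bool b_won = x.compare_exchange_strong(seen_b, seen_b + 1);
    return a_won != b_won;  // one wins, one loses
}
```

Note that an individual thread may retry arbitrarily often under contention (so this is lock-free, not wait-free), but the system as a whole always makes progress.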

A couple of issues.

We will soon be facing desktop systems with 64, 128, and 256 cores. Parallelism in this domain is unlike our current experience of 2, 4, or 8 cores; algorithms that run successfully on such small systems will run slower on highly parallel systems due to contention.

In this sense, lock-free is important because it contributes strongly to solving scalability.

There are also some very specific areas where lock-free is extremely convenient, such as the Windows kernel, which has modes of execution where sleeps of any kind (such as waits) are forbidden. That is obviously very limiting with regard to data structures, but lock-free provides a good solution.

Also, lock-free data structures often do not have failure modes; they cannot actually fail, whereas lock-based data structures can of course fail to obtain their locks. Not having to worry about failures simplifies code.

I've written a library of lock-free data structures which I'll be releasing soon. I think if a developer can get hold of a well-proven API, then he can just use it: it doesn't matter whether it's lock-free or not, and he doesn't need to worry about the complexity of the underlying implementation. That's the way to go.

Peter Mortensen
  • do you have a link to this library you mentioned? I think some people here would be interested to see it. – Evan Teran Nov 09 '09 at 18:51
  • 2
    http://www.liblfds.org Don't get your hopes up - there's only a few simple data structures. The ringbuffer is the most useful. The last two or three months have been spent getting the on-line infrastructure built and making the code accessable (building easily on popular platforms, etc). The library will I think really start to be useful once I've done a proper singly linked list and a hash based on that. Also, I wouldn't recommend using release 3 - too many issues (mainly build related, but also a major API change over from void pointers to structs is coming in release 4). –  Nov 10 '09 at 10:03

It's also about scalability. In order to get performance gains these days, you'll have to parallelise the problems you're working on so you can scale them across multiple cores - the more, the merrier.

The traditional way of doing this is to lock the data structures that require parallel access, but the more threads you can run truly in parallel, the bigger a bottleneck this becomes.

So yes, it is about performance...

Timo Geusch
  • 4
    You are making an unwritten assumption that the parallel threads are all going to be accessing the locked structure simultaneously. This does however depend on the nature of the algorithm. I'd add the advice as with anything, to measure if there *is* a performance problem before throwing lock-free at it as a solution. – jerryjvl Sep 06 '09 at 11:50

For preemptive threading, threads suspended while holding a lock can block threads that would otherwise be making forward progress. Lock-free doesn't have that problem since by Herlihy's definition, some other thread can always make forward progress.

For non-preemptive threading, it doesn't matter as much, since even spin-lock-based solutions are lock-free by Herlihy's definition.


This is about performance, but also about the ability to handle multithreaded loads:

  • locks grant an exclusive access to a portion of code: while a thread has a lock, other threads are spinning (looping while trying to acquire the lock) or blocked, sleeping until the lock is released (which usually happens if spinning lasts too long);

  • atomic operations grant an exclusive access to a resource (usually a word-sized variable or a pointer) by using uninterruptible intrinsic CPU instructions.

As locks BLOCK other threads' execution, a program is slowed down. As atomic operations execute serially (one after another), there is no blocking*.

(*) as long as the number of concurrent CPUs trying to access the same resource does not create a bottleneck - but we don't have enough CPU cores yet to see this as a problem.
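The contrast between the two bullets above can be sketched in C++ (a minimal illustration; the counter types are hypothetical, not from any library discussed here):

```cpp
#include <atomic>
#include <mutex>

// Lock-based: a thread holding the mutex blocks all others; if it is
// preempted inside the critical section, everyone else must wait
// (spinning or sleeping) until the lock is released.
struct LockedCounter {
    std::mutex m;
    long value = 0;
    void add(long n) {
        std::lock_guard<std::mutex> guard(m);  // exclusive access to this code
        value += n;
    }
};

// Atomic: exclusive access to a single word via an uninterruptible CPU
// instruction (e.g. LOCK XADD on x86); no thread is ever put to sleep.
struct AtomicCounter {
    std::atomic<long> value{0};
    void add(long n) {
        value.fetch_add(n, std::memory_order_relaxed);
    }
};
```

Both produce the same result; the difference is that the atomic version cannot block other threads, which is exactly the property the answer relies on.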

I have worked on this problem while writing a wait-free (lock-free without wait states) key-value store for the server I am working on.

Libraries like Tokyo Cabinet (even TC-FIXED, a simple array) rely on locks to preserve the integrity of a database:

"while a writing thread is operating the database, other reading threads and writing threads are blocked" (Tokyo Cabinet documentation)

The results of a test without concurrency (a one-thread test):

SQLite   time: 56.4 ms (a B-tree)
TC       time: 10.7 ms (a hash table)
TC-FIXED time:  1.3 ms (an array)
G-WAN KV time:  0.4 ms (something new which works, but I am not sure a name is needed)

With concurrency (several threads writing and reading in the same DB), only the G-WAN KV survived the same test because (by contrast with the others) it never ever blocks.

So, yes, this KV store makes it easier for developers to use, since they do not have to care about threading issues. Making it work this way was not trivial, however.

pierre

I believe I saw an article that mathematically proved that any algorithm can be written in a wait-free manner (which basically means that you can be assured of every thread always making progress towards its goal). This means it can be applied to any large-scale application (after all, a program is just an algorithm with many, many parameters), and because wait-freedom ensures that neither deadlock nor livelock occurs within it (as long as it doesn't have bugs which preclude it from being truly wait-free), it does simplify that side of the program. On the other hand, a mathematical proof is a far cry from actually implementing the code itself (AFAIK, there isn't even a fully lock-free linked list that can run on PCs; I've seen ones that cover most parts, but they usually either can't handle some common functions, or some functions require the structure to be locked).

On a side note, I've also found another proof showing that any lock-free algorithm can actually be considered wait-free, due to the laws of probability and various other factors.

Grant Peters
  1. Scalability is a really important issue in efficient multi-/many-core programming. The greatest limiting factor is actually the code section that must be executed serially (see Amdahl's Law). However, contention on locks is also very problematic.

  2. Lock-free algorithms address the scalability problems that locks have. So, I would say lock-free is mostly about performance, not about decreasing the possibility of deadlock.

  3. However, keep in mind that with the current x86 architecture, writing a general lock-free algorithm is impossible, because we can't atomically exchange data of arbitrary size on current x86 (this is also true for other architectures, except Sun's ROCK). So, current lock-free data structures are quite limited and very specialized for specific uses.

  4. I think current lock-free data structures will no longer be used in a decade. I strongly expect a hardware-assisted general lock-free mechanism (yes, transactional memory, TM) to be implemented within a decade. If any kind of TM is implemented, then although it can't perfectly solve the problems of locks, many problems (including priority inversion and deadlock) will be eliminated. However, implementing TM in hardware is still very challenging, and for x86, only a draft has been proposed so far.

If that's still too long, here is a two-sentence summary:

Lock-free data structures are not a panacea for lock-based multithreaded programming (even TM is not). If you seriously need scalability and have trouble with lock contention, then consider lock-free data structures.
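Point 3 above (atomic exchange only works up to a limited size) can be observed directly with `std::atomic` (a C++17 sketch; `Big` is a hypothetical oversized payload, and the exact cutoff is platform-dependent):

```cpp
#include <atomic>
#include <cstdint>

// A payload far larger than anything the CPU can exchange in one
// atomic instruction (hypothetical example type).
struct Big { char data[64]; };

// Word-sized types map to single atomic CPU instructions.
bool small_is_lock_free() {
    return std::atomic<std::uint64_t>::is_always_lock_free;  // true on typical x86-64
}

// For oversized types the standard library cannot use a single
// instruction and falls back to an internal lock.
bool big_is_lock_free() {
    return std::atomic<Big>::is_always_lock_free;  // false: too large
}
```

This is why general-purpose lock-free structures have to be built from word-sized CAS operations on pointers or packed fields, rather than by atomically swapping whole records.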

minjang