
I have seen people/articles/SO posts claiming to have designed their own "lock-free" container for multithreaded use. Assuming they haven't used a performance-hitting modulus trick (i.e. each thread can only insert based upon some modulo), how can data structures be multi-threaded yet lock-free?

This question is intended towards C and C++.

user997112
    lock-free usually doesn't mean "any lock", it means something like [transactional memory](http://en.wikipedia.org/wiki/Transactional_memory), or [optimistic design](http://en.wikipedia.org/wiki/Optimistic_concurrency_control), where you don't use lock for every operation, but once in a while (for rolling back or doing transaction) - some kind of lock will be needed. – amit Dec 23 '12 at 14:46
    Lock-free usually means they use compare-and-swap hardware operations. – brian beuning Dec 23 '12 at 15:59
  • In a somewhat broader definition, lock does not refer to mutex but to the potential of "locking up" other threads. The most obvious example is not releasing a mutex, however this is an example of not lock free code without using a mutex: while (X == 0) { X = 1 - X; } Given the right scheduling two threads can stay in this loop forever. http://preshing.com/20120612/an-introduction-to-lock-free-programming/ – Bas Smit Apr 03 '17 at 16:52

4 Answers


The key in lock-free programming is to use hardware-intrinsic atomic operations.

As a matter of fact, even locks themselves must use those atomic operations!

But the difference between locked and lock-free programming is that a lock-free program can never be stalled entirely by any single thread. If, in a locking program, one thread acquires a lock and then gets suspended indefinitely, the entire program is blocked and cannot make progress. By contrast, a lock-free program can make progress even if individual threads are suspended indefinitely.

Here's a simple example: A concurrent counter increment. We present two versions which are both "thread-safe", i.e. which can be called multiple times concurrently. First the locked version:

#include <mutex>

int counter = 0;
std::mutex counter_mutex;

void increment_with_lock()
{
    // The lock guarantees exclusive access for the duration of the scope.
    std::lock_guard<std::mutex> _(counter_mutex);
    ++counter;
}

Now the lock-free version:

#include <atomic>

std::atomic<int> counter(0);

void increment_lockfree()
{
    // A single atomic read-modify-write; no thread ever waits on a lock.
    ++counter;
}

Now imagine hundreds of threads all calling the increment_* function concurrently. In the locked version, no thread can make progress until the lock-holding thread unlocks the mutex. In the lock-free version, by contrast, all threads can make progress: if a thread is held up, it simply won't do its share of the work, but everyone else gets on with theirs.

It is worth noting that, in general, lock-free programming trades throughput and mean latency for predictable latency. That is, a lock-free program will usually get less done than a corresponding locking program if there is not too much contention (since atomic operations are slow and affect a lot of the rest of the system), but it guarantees never to produce unpredictably large latencies.

Kerrek SB
    And this is about maximum that can be achieved without locks for the general case. Any more complex operation won't be supported at HW level intrinsics. Worth noting though that in some specialized environments locks can be avoided due to prior knowledge of external components (either hw or sw) - e.g., a HW may ensure no additional data will be sent until a previous package has been acknowledged by the receiving SW – SomeWittyUsername Dec 23 '12 at 15:06
    @icepack: True, there's a limit to how much you can achieve without locks. I think the best example I'm comfortable with is a single-consumer, multiple-producer queue, which can be done completely correctly with atomics (producers use CAS to enqueue, consumer uses Exchange to dequeue the entire queue). A multiple/multiple queue is also possible atomically, but you lose the ability to manage dynamic allocations deterministically. – Kerrek SB Dec 23 '12 at 15:14
    There is a lock-free memory allocator heap implementation. – brian beuning Dec 23 '12 at 16:04
  • @brianbeuning: The problem is the *de*-allocation, though... In a multi-consumer queue it's hard to know when it's safe to delete a queue node. – Kerrek SB Dec 23 '12 at 16:21
    `std::atomic` must be implemented with a lock, if the hardware doesn't provide atomic updating of an `int`. Is it still considered "lock-free" even though it involves a lock in this case? – M.M Feb 07 '16 at 02:52
    @M.M: No. That's why you have the `is_lock_free` member functions and macros to test this. The only operations that are required to be lock-free are those on `std::atomic_flag`. – Kerrek SB Feb 07 '16 at 11:37
  • what if the `mutex` in the above example is itself implemented with atomics, is it still lock-based programming? I assume not – zinking Feb 04 '23 at 05:08

For locks, the idea is that you acquire a lock and then do your work knowing that nobody else can interfere, then release the lock.

For "lock-free", the idea is that you do your work somewhere else and then attempt to atomically commit this work to "visible state", and retry if you fail.

The problems with "lock-free" are that:

  • it's hard to design a lock-free algorithm for anything that isn't trivial. This is because there are only so many ways to do the "atomically commit" part (often relying on an atomic "compare and swap" that replaces a pointer with a different pointer).
  • if there's contention, it performs worse than locks because you're repeatedly doing work that gets discarded/retried
  • it's virtually impossible to design a lock-free algorithm that is both correct and "fair". This means that (under contention) some tasks can be lucky (and repeatedly commit their work and make progress) and some can be very unlucky (and repeatedly fail and retry).

The combination of these things means that it's only good for relatively simple things under low contention.

Researchers have designed things like lock-free linked lists (and FIFO/FILO queues) and some lock-free trees. I don't think there's anything more complex than those. As for how these things work: precisely because it's hard, it's complicated. The most sensible approach is to determine what type of data structure you're interested in, then search the web for relevant research into lock-free algorithms for that data structure.

Also note that there is a stronger property called "wait-free" (sometimes informally described as "block free"), which is like lock-free except that you know you can always commit the work and never need to retry. It's even harder to design a wait-free algorithm, but contention doesn't matter, so the other 2 problems with lock-free disappear. Note: the "concurrent counter" example in Kerrek SB's answer is not just lock-free, it is actually wait-free.

Brendan
  • @Nik-Lz: On 80x86 I'd expect that `++counter` gets compiled into a single `lock inc dword [counter]` instruction in assembly, where the CPU uses its cache coherency protocol to ensure it occurs atomically, and where nothing (not even the ancient "bus lock" behaviour from the original 8086) blocks other threads or CPUs at all. – Brendan Dec 11 '18 at 05:14

The idea of "lock free" is not really to have no locks anywhere; the idea is to minimize the number of locks and/or critical sections, by using techniques that allow us to avoid locks for most operations.

It can be achieved using optimistic design or transactional memory, where you do not lock the data for all operations, but only on some certain points (when doing the transaction in transactional memory, or when you need to roll-back in optimistic design).

Other alternatives are based on atomic implementations of some operations, such as CAS (Compare And Swap), which even lets us solve the consensus problem. By swapping references (while no thread is working on the common data), the CAS mechanism allows us to easily implement a lock-free optimistic design: we swap in the new data if and only if no one has changed it already, and this is done atomically.

However, to implement the underlying mechanism for one of these, some locking will most likely be used; but when these techniques are used correctly, the amount of time the data is locked is supposed to be kept to a minimum.

amit
    That makes more sense, that there are some locks involved! – user997112 Dec 23 '12 at 14:54
  • @user997112: Note however, if your hardware allows an atomic CAS, you should be able to implement optimistic design without any locks. (the swap will be reference swap to the complete data, and no thread will be actually working on the common data) – amit Dec 23 '12 at 14:56
    There are no locks involved with "lock free". – Brendan Dec 23 '12 at 15:20
  • Isn't there a long standing proof that any algorithm can be written completely without locks using CAS? – Tim Seguine Sep 27 '13 at 18:37

The new C and C++ standards (C11 and C++11) introduced threads, along with thread-shared atomic data types and operations. An atomic operation gives guarantees even for operations that run into a race between two threads: once a thread returns from such an operation, it can be sure that the operation has gone through in its entirety.

Typical processor support for such atomic operations exists on modern processors for compare and swap (CAS) or atomic increments.

In addition to being atomic, a data type can have the "lock-free" property. This should perhaps have been coined "stateless", since the property implies that an operation on such a type will never leave the object in an intermediate state, even when it is interrupted by an interrupt handler or when a read from another thread falls in the middle of an update.

Several atomic types may (or may not) be lock-free; there are macros to test for that property. There is always one type that is guaranteed to be lock-free, namely atomic_flag.

Jens Gustedt