Implementing a FIFO mutex in pthreads

Question

I'm trying to implement a binary tree supporting concurrent insertions (which could occur even between nodes), but without having to allocate a global lock or a separate mutex or mutexes for each node. Rather, the quantity of such locks allocated should be on the order of the quantity of threads using the tree.

Consequently, I end up with a type of lock convoy problem. Explained more simply, it's what potentially happens when two or more threads do the following:

1 for(;;) {
2   lock(mutex)
3   do_stuff
4   unlock(mutex)
5 }

That is, if Thread#1 executes instructions 4->5->1->2 all in one "cpu burst" then Thread#2 gets starved from execution.

On the other hand, if there was a FIFO-type locking option for mutexes in pthreads, then such a problem could be avoided. So, is there a way to implement FIFO-type mutex locking in pthreads? Can altering thread priorities accomplish this?

How about throwing in a `sched_yield()` after unlocking? At least then the one that most recently had the lock will release the CPU giving others a chance at acquiring the lock. If there aren't any, then the thread would just be scheduled to run again. — Jeff Mercado, Mar 22 '11 at 01:40

score 6 · Answer 1 · answered Mar 22 '11 at 03:03

6

You can implement a fair queuing system where each thread is added to a queue when it blocks, and the first thread on the queue always gets the resource when it becomes available. Such a "fair" ticket lock built on pthreads primitives might look like this:

#include <pthread.h>

typedef struct ticket_lock {
    pthread_cond_t cond;
    pthread_mutex_t mutex;
    unsigned long queue_head, queue_tail;
} ticket_lock_t;

#define TICKET_LOCK_INITIALIZER { PTHREAD_COND_INITIALIZER, PTHREAD_MUTEX_INITIALIZER }

void ticket_lock(ticket_lock_t *ticket)
{
    unsigned long queue_me;

    pthread_mutex_lock(&ticket->mutex);
    queue_me = ticket->queue_tail++;
    while (queue_me != ticket->queue_head)
    {
        pthread_cond_wait(&ticket->cond, &ticket->mutex);
    }
    pthread_mutex_unlock(&ticket->mutex);
}

void ticket_unlock(ticket_lock_t *ticket)
{
    pthread_mutex_lock(&ticket->mutex);
    ticket->queue_head++;
    pthread_cond_broadcast(&ticket->cond);
    pthread_mutex_unlock(&ticket->mutex);
}

answered Mar 22 '11 at 03:03

caf

233,326
40
323
462

1

Unfortunately the problem is in the _mutexes themselves_. As Logan Capaldo mentioned, building the more elaborate lock would only "just make the described problem less likely and not actually prevent it". Of course I can implement a condition variable or a ticket lock type "wait queue", but that would not prevent the problem. Your ticket lock is based on the same "pthread mutexes", the contention for which could lead to the thread starvation described in the question. – ManRow Mar 22 '11 at 03:14
1

@ManRow: Look again. Under this scheme, the same thread can only re-acquire the lock immediately if no other thread is contending for it. This ensures that starvation will not happen - because once a thread calls `ticket_lock()`, it is guaranteed to acquire the lock before the current holder reacquires it. – caf Mar 22 '11 at 03:18
@caf: Not at all. The current holder is perfectly capable of _releasing the lock_ and _reacquiring it_ in the same cpu burst (see the code in the original question!!). In order to call "ticket_lock", any thread _must first acquire the mutex_. Why can't the current holder just do that consecutively and keep the whole queue to himself? Where is the _guarantee_ that another thread will be able to even enter the lock at all? After all, unlocking a mutex only makes threads waiting on it "ready", not "running"! – ManRow Mar 22 '11 at 03:51
1

@ManRow: If you're concerned about the case (on a uniprocessor) where Thread A is suspended at the first `pthread_mutex_lock()` in `ticket_lock()`, while Thread B runs `ticket_lock(); do_work(); ticket_unlock();` many times until its timeslice runs out, then that's indistinguishable from Thread A being suspended at the point just *before* it called `ticket_lock()` while Thread B gets to use up its timeslice. In the latter situation, Thread B hasn't even registered its interest in the lock. – caf Mar 22 '11 at 03:56
@caf: But you cannot tell _where_ Thread B finishes its timeslice! If Thread B keeps finishing in your lock's critical section, it will always starve Thread A regardless. You would have to ensure that the _very mutex itself_ locks in FIFO order, which is what my question asks for. – ManRow Mar 22 '11 at 04:07
@ManRow: No - the underlying mutex is *not* held within critical region protected by the ticket lock. As long as Thread B doesn't finish its timeslice within the `tick_lock()` or `tick_unlock()` functions themselves, Thread A will be able to run. – caf Mar 22 '11 at 04:15
@caf: No - your underlying mutex _is_ held within the critical section _of the lock itself_. I.e., the critical section I was referring to, and you referred to by tick_lock() and tick_unlock(). What else did you think I was referring to? I wasn't referring to the section "protected by the lock", only that section "protected by the lock's mutex" your lock relies on. As you just mentioned, "As long as Thread B doesn't finish its timeslice within the tick_lock() or tick_unlock()". Unfortunately _there is no way to absolutely guarantee this_, hence Thread A could remain indefinitely starved. – ManRow Mar 22 '11 at 04:31
@ManRow: Those critical sections are very short though (and the `pthread_cond_broadcast()` can be moved outside of the critical section too, without sacrificing correctness) - if any nontrivial amount of work is done within the ticketlock-protected section at all, then the likelihood of ending the timeslice within the problematic section is minimal. – caf Mar 22 '11 at 07:37

score 3 · Answer 2 · answered Mar 22 '11 at 01:56

3

You could do something like this:

define a "queued lock" that consists of a free/busy flag plus a linked-list of pthread condition variables. access to the queued_lock is protected by a mutex
to lock the queued_lock:
- seize the mutex
- check the 'busy' flag
- if not busy; set busy = true; release mutex; done
- if busy; create a new condition @ end of queue & wait on it (releasing mutex)
to unlock:
- seize the mutex
- if no other thread is queued, busy = false; release mutex; done
- pthread_cond_signal the first waiting condition
- do not clear the 'busy' flag - ownership is passing to the other thread
- release mutex
when waiting thread unblocked by pthread_cond_signal:
- remove our condition var from head of queue
- release mutex

Note that the mutex is locked only while the state of the queued_lock is being altered, not for the whole duration that the queued_lock is held.

answered Mar 22 '11 at 01:56

David Gelhar

27,873
3
67
84

Doesn't this just make the described problem less likely and not actually prevent it? I think your approach is almost there, what if we were to add an additional condvar that we wait in `unlock` for the flag to become busy again, if the mutex was contended (the list of condvars was nonempty). This way we guarantee we've relinquished the lock to someone else before we're ever capable of trying to take it again. – Logan Capaldo Mar 22 '11 at 02:08
Nope. I have the same problem. You have to wait on that additional condvar even if the mutex _wasn't_ contended or else you can still end up with one thread zipping around too fast. And then you have the problem of, how do you ever stop the threads. I suppose you can have an unlock which doesn't wait and a fair_unlock and call unlock when you're going to break out of the loop. – Logan Capaldo Mar 22 '11 at 02:11
Actually unlock is probably the wrong place to deal with this. You might have interesting work to do after you unlock, there's no reason you should hold up there for someone to take the lock. The mutex should track the thread that last held the lock. If it was you, when locking, pretend someone else has the lock and add yourself to the waiters instead of taking it. – Logan Capaldo Mar 22 '11 at 02:20
Yes, I think the idea of tracking who last had the lock is heading in the right direction. I don't think you can add yourself to the waiters, though - wait if no one else ever comes along to wake you? But what if you did a `sched_yield` before allowing the same thread to re-lock the mutex? That should ensure that any other thread that's spinning trying to grab the mutex gets a chance. – David Gelhar Mar 22 '11 at 03:24

Flavio · Answer 3 · 2014-11-05T10:02:25.047

You can obtain a fair Mutex with the idea sketched by @caf, but using atomic operations to acquire the ticket before doing the actual lock.

#if defined(_MSC_VER)
typedef volatile LONG Sync32_t;
#define SyncFetchAndIncrement32(V) (InterlockedIncrement(V) - 1)
#elif (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__) > 40100
typedef volatile uint32_t Sync32_t;
#define SyncFetchAndIncrement32(V) __sync_fetch_and_add(V, 1)
#else
#error No atomic operations
#endif

class FairMutex {
private:
    Sync32_t                _nextTicket;
    Sync32_t                _curTicket;
    pthread_mutex_t         _mutex;
    pthread_cond_t          _cond;

public:
    inline FairMutex() : _nextTicket(0), _curTicket(0), _mutex(PTHREAD_MUTEX_INITIALIZER), _cond(PTHREAD_COND_INITIALIZER)
    {
    }
    inline ~FairMutex()
    {
        pthread_cond_destroy(&_cond);
        pthread_mutex_destroy(&_mutex);
    }
    inline void lock()
    {
        unsigned long myTicket = SyncFetchAndIncrement32(&_nextTicket);
        pthread_mutex_lock(&_mutex);
        while (_curTicket != myTicket) {
            pthread_cond_wait(&_cond, &_mutex);
        }
    }
    inline void unlock()
    {
        _curTicket++;
        pthread_cond_broadcast(&_cond);
        pthread_mutex_unlock(&_mutex);
    }
};

More broadly, i would not call this a FIFO Mutex, because it gives the impression to maintain an order which is not there in the first place. If your threads are calling a lock() in parallel, they can not have an order before calling the lock, so it makes no sense to create a mutex preserving an order relationship which is not there.

The gcc case is wrong, you should add a `- 1` after `sync_add_and_fetch` just like you have it in the Windows call. — tohava, Aug 19 '14 at 09:29

score 2 · Answer 4 · answered Mar 22 '11 at 07:37

The example as you post it has no solution. Basically you only have one critical section and there is no place for parallelism.

That said, you see that it is important to reduce the period that your threads hold the mutex to a minimum, just a handful of instructions. This is difficult for insertion in a dynamic data structure such as a tree. The conceptually simplest solution is to have one read-write lock per tree node.

If you don't want to have individual locks per tree node you could have one lock structure per level of the tree. I'd experiment with read-write locks for that. You may use just read-locking of the level of the node in hand (plus the next level) when you traverse the tree. Then when you have found the right one to insert lock that level for writing.

score 1 · Answer 5 · edited May 23 '17 at 12:22

The solution could be to use atomic operations. No locking, no context switching, no sleeping, and much much faster than mutexes or condition variables. Atomic ops are not the end all solution to everything, but we have created a lot of thread safe versions of common data structures using just atomic ops. They are very fast.

Atomic ops are a series of simple operations like increment, or decrement or assignment that are guaranteed to execute atomically in a multi threaded environment. If two threads hit the op at the same time, the cpu makes sure one thread executes the op at a time. Atomic ops are hardware instructions, so they are fast. "Compare and swap" is very useful for thread safe data structures. in our testing atomic compare and swap is about as fast as 32 bit integer assignment. Maybe 2x as slow. When you consider how much cpu is consumed with mutexes, atomic ops are infinitely faster.

Its not trivial to do rotations to balance your tree with atomic operations, but not impossible. I ran into this requirement in the past and cheated by making a thread safe skiplist since a skiplist can be done real easy with atomic operations. Sorry I can't give you a copy of our code...my company would fire me, but its easy enough to do yourself.

How atomic ops work to make lock free data structures can be visualized by the simple threadsafe linked list example. To add an item to a global linked list (_pHead) without using locks. First save a copy of _pHead, pOld. I think of these copies as "the state of the world" when executing concurrent ops. Next create a new node, pNew, and set its pNext to the COPY. Then use atomic "compare and swap" to change _pHead to pNew ONLY IF pHead IS STILL pOld. The atomic op will succeed only if _pHead hasn't changed. If it fails, loop back to get a copy of the new _pHead and repeat.

If the op succeeds, the rest of the world will now see a new head. If a thread got the old head a nanosecond before, that thread wont see the new item, but the list will still be safe to iterate through. Since we preset the pNext to the old head BEFORE we added our new item to the list, if a thread sees the new head a nanosecond after we added it, the list is safe to traverse.

Global stuff:

typedef struct _TList {
  int data;
  struct _TList *pNext;
} TList;

TList *_pHead;

Add to list:

TList *pOld, *pNew;
...
// allocate/fill/whatever to make pNew
...
while (1) { // concurrency loop
  pOld = _pHead;  // copy the state of the world. We operate on the copy
  pNew->pNext = pOld; // chain the new node to the current head of recycled items
  if (CAS(&_pHead, pOld, pNew))  // switch head of recycled items to new node
    break; // success
}

CAS is shorthand for __sync_bool_compare_and_swap or the like. See how easy? No Mutexes...no locks! In the rare event that 2 threads hit that code at the same time, one simply loops a second time. We only see the second loop because the scheduler swaps a thread out while in the concurrency loop. so it is rare and inconsequential.

Things can be pulled off the head of a linked list in a similar way. You can atomically change more than one value if you use unions and you can use uup to 128 bit atomic ops. We have tested 128 bit on 32 bit redhat linux and they are ~same speed as the 32, 64 bit atomic ops.

You will have to figure out how to use this type of technique with your tree. A b tree node will have two ptrs to child nodes. You can CAS them to change them. The balancing problem is tough. I can see how you could analyze a tree branch before you add something and make a copy of the branch from a certain point. when you finish changing the branch, you CAS the new one in. This would be a problem for large branches. Maybe balancing can be done "later" when the threads are not fighting over the tree. Maybe you can make it so the tree is still searchable even though you haven't cascaded the rotation all the way...in other words if thread A added a node and is recursively rotating nodes, thread b can still read or add nodes. Just some ideas. In some cases, we make a structure that has version numbers or lock flags in the 32 bits after the 32 bits of pNext. We use 64 bit CAS then. Maybe you could make the tree safe to read at all times without locks, but you might have to use the versioning technique on a branch that is being modified.

Here are a bunch of posts I have made talking about the advantages of atomic ops:

Pthreads and mutexes; locking part of an array

Efficient and fast way for thread argument

Configuration auto reloading with pthreads

Advantages of using condition variables over mutex

single bit manipulation

Is memory allocation in linux non-blocking?

score -1 · Answer 6 · edited Jun 20 '20 at 09:12

You might take a look at the pthread_mutexattr_setprioceiling function.

int pthread_mutexattr_setprioceiling
(
    pthread_mutexatt_t * attr, 
    int prioceiling,
    int * oldceiling
);

From the documentation:

pthread_mutexattr_setprioceiling(3THR) sets the priority ceiling attribute of a mutex attribute object.

attr points to a mutex attribute object created by an earlier call to pthread_mutexattr_init().

prioceiling specifies the priority ceiling of initialized mutexes. The ceiling defines the minimum priority level at which the critical section guarded by the mutex is executed. prioceiling will be within the maximum range of priorities defined by SCHED_FIFO. To avoid priority inversion, prioceiling will be set to a priority higher than or equal to the highest priority of all the threads that might lock the particular mutex.

oldceiling contains the old priority ceiling value.

This doesn't address the question actually asked. Avoiding priority inversion is a different problem to avoiding starvation. — davmac, May 27 '16 at 18:22

Implementing a FIFO mutex in pthreads

6 Answers6

Linked