2

Given the following simple lock-free linear allocation algorithm I made:

bool allocate(size_type size, size_type& offset)
{
    size_type pos;
    size_type new_pos;
    
    pos = m_pos.load(std::memory_order_acquire);

    do
    {
        new_pos = pos + size;

        if (new_pos > m_end)
            return false;
    }
    while (!m_pos.compare_exchange_weak(pos, new_pos, std::memory_order_acq_rel, std::memory_order_acquire));

    offset = pos;
    return true;
}

I would like to improve it further to be wait-free on the fast path, discarding the CAS-loop completely and preferably using fetch_add, however I'm not sure how to handle the case when the allocation fails because the the requested size is greater than the available size - when this is detected the fetch_add is already incremented the position which is now over the valid range which may be simultaneously loaded by other threads, which would increment it further and have the false illusion that there is no space for the allocation. Some kind of spin-wait loop is still required in this path, however I can't come up with a valid solution.

Any idea?

plasmacel
  • 8,183
  • 7
  • 53
  • 101
  • Hi, I don't know if it is feasible, but you could log the allocation requests per non conditional branches, launch a scheduling algorithm to perform the offsets in another thread, so that the next time you call the same instructions you can read the offsets once they are available from the thread. In the meanwhile, you could just return the currently implemented allocation. – Cevik Mar 05 '21 at 17:11
  • @Cevik That would add a complexity which would destroy all the performance advantages of such a simple linear allocator. – plasmacel Mar 05 '21 at 17:17
  • 1
    Is is that important to use up every single byte of remaining space? Anyway so assuming fetch_add results in m_pos > m_end, it is then safe to assume no other thread would be able to allocate anything smaller. So when that happens you can safely fetch_add it back and return false. Meanwhile all smaller allocations will indeed fail. Not sure if that's super important. To reduce the chance of it you can have two different paths depending on size: for large size can do the CAS way, for small size - the fetch_add way. – rustyx Mar 05 '21 at 17:52
  • @rustyx Let's say we have 256MB of space. The first thread tries to allocate 257MB of space, which obviously returns `false`. No space is used now, but simultaneous allocations from another threads are already loaded (and further incremented) the position which was 257MB so these allocations also return `false`. The allocator becomes full, while actually 0 bytes of space is allocated. This problem never arises with the CAS-loop, because when an allocation is performed while an another allocation is running, one of the allocation loops again, re-loads the position and re-tries the allocation. – plasmacel Mar 05 '21 at 17:56
  • @rustyx Other than that if you check the loop, when the position would overflow the valid range, the function simply returns `false`, so the CAS never stores the modified position. – plasmacel Mar 05 '21 at 18:02
  • 1
    Note that you do not need to reload `m_pos` in the current loop. The CAS already reload `m_pos` in `pos` when it fails. So just one load before the loop is enough. – Jérôme Richard Mar 05 '21 at 18:06
  • `while()` remids me spinlock. – Алексей Неудачин Mar 05 '21 at 20:02

1 Answers1

1

Let's say we have 256MB of space. The first thread tries to allocate 257MB of space, which obviously returns false.

To handle that, you can store two ints:

  • One atomic, with current_pos
  • One non atomic, with capacity

then:


// first attempt, atomic load current_pos
// to early exit when the allocation would certainly fail,
// which will reduce wasting memory

current := atomic_load(current_pos)
if ( current + required > capacity) { return error }

// second attempt, atomic fetch_and_add current_pos
// it can still fail, since other threads may advanced current_pos
// in the background since the previous load

current := fetch_and_add(current_pos, required)
if ( current + required > capacity) {return error}

This is wait free, and can handle OOM errors.

plasmacel
  • 8,183
  • 7
  • 53
  • 101
Mascarpone
  • 2,516
  • 4
  • 25
  • 46
  • "You can drop the first atomic_load if you are willing to accept some memory will sit unused for a while." - can you elaborate on this? I don't clearly understand that. – plasmacel Nov 13 '22 at 03:44
  • I also think that it should be `next := fetch_and_add(current_pos, required) + required` since `fetch_and_add` returns the previous value yet before the addition. – plasmacel Nov 13 '22 at 04:36
  • 1
    I liked your original idea of loading the current pos without advancing it, and early exit when the allocation would obviously fail. It reduces memory wasting, so I made an edit to bring that mechanism back. – plasmacel Nov 13 '22 at 11:55