
I'm working on a system that requires extensive C API interop. Part of the interop requires initialization and shutdown of the system in question before and after any operations. Failure to do either will result in instability in the system. I've accomplished this by simply implementing reference counting in a core disposable environment class like this:

public FooEnvironment()
{
  lock(EnvironmentLock)
  {
    if(_initCount == 0)
    {
      Init();  // global startup
    }
    _initCount++;
  }
}

private void Dispose(bool disposing)
{
  if(_disposed)
    return;

  if(disposing)
  {
    lock(EnvironmentLock)
    {
      _initCount--;
      if(_initCount == 0)
      {
        Term(); // global termination
      }
    }
  }
}

This works and accomplishes the goal. However, since any interop operation must be nested in a FooEnvironment using block, we are locking all the time, and profiling suggests that this locking accounts for close to 50% of the work done at run-time. It seems to me that this is a fundamental enough concept that something in .NET or the CLR must address it. Is there a better way to do reference counting?

Jeff
  • Are `Interlocked.Increment` (and related) what you're looking for? – harold Apr 09 '12 at 13:37
  • What version of the .Net framework are you using? – pstrjds Apr 09 '12 at 13:39
  • Do you mean that initialization and shutdown must be performed between *each* operation, or just before the first operation and after the last? – jalf Apr 09 '12 at 13:42
  • Interlocked.Increment may work but I don't see how exactly. I need to also be able to test the value and act on it if it's zero. Can you suggest a way to do that? Thanks! – Jeff Apr 09 '12 at 13:49
  • This is 4.0, and init and term only have to happen before the first and last operation (not between each) – Jeff Apr 09 '12 at 13:50
  • The members of interlocked aren't especially fast either, by the way. – harold Apr 09 '12 at 16:03

7 Answers


This is a trickier task than you might expect at first blush. I don't believe that Interlocked.Increment will be sufficient for your task. Rather, I expect you'll need to perform some wizardry with CAS (Compare-And-Swap).

Note also that it's very easy to get this mostly-right, but mostly-right is still completely wrong when your program crashes with heisenbugs.

I strongly suggest some genuine research before going down this path. A couple good jumping off points pop to the top if you do a search for "Lock free reference counting." This Dr. Dobbs article is useful, and this SO Question might be relevant.

Above all, remember that lock free programming is hard. If this is not your specialty, consider stepping back and adjusting your expectations around the granularity of your reference counts. It may be much, much less expensive to rethink your fundamental refcount policy than to create a reliable lock-free mechanism if you're not an expert. Especially when you don't yet know that a lock-free technique will actually be any faster.
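To give a flavor of what that CAS wizardry looks like, here is a minimal sketch of a counter built on Interlocked.CompareExchange (the type and method names here are made up for illustration). Note that it suffers exactly the pitfall discussed elsewhere in this thread: the thread that wins the 0-to-1 transition may still be inside Init() while other threads proceed.

```csharp
using System.Threading;

// Hypothetical names (RefCount, Retain, Release) for illustration only.
static class RefCount
{
    static int _count;

    // CAS loop: retry until our increment applies to the value we observed.
    // Returns true if this call moved the count from 0 to 1
    // (i.e. the caller is responsible for running Init()).
    public static bool Retain()
    {
        while (true)
        {
            int seen = _count;
            if (Interlocked.CompareExchange(ref _count, seen + 1, seen) == seen)
                return seen == 0;
        }
    }

    // Returns true if this call moved the count from 1 to 0
    // (i.e. the caller is responsible for running Term()).
    public static bool Release()
    {
        while (true)
        {
            int seen = _count;
            if (Interlocked.CompareExchange(ref _count, seen - 1, seen) == seen)
                return seen == 1;
        }
    }
}
```

A stale read of _count is harmless here because the CompareExchange will fail and the loop retries; the hard part, as noted above, is everything this snippet does not do.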

Greg D
  • +1 the links are great and the exceptional comment "Note also that it's very easy to get this mostly-right, but mostly-right is still completely wrong when your program crashes with heisenbugs." It's never good when your customer calls saying "It seems like sometimes when I start the program it crashes within a few seconds and then other times it runs for days with no problems" and while debugging you find a misplaced lock statement :) – pstrjds Apr 09 '12 at 15:56
  • Maybe my question should have been more general as I would also appreciate any suggestions for lighter-weight locking schemes. The basic problem is the current approach is very expensive. I hope it wasn't implied that I was looking for mostly-right answers although I appreciate the posts that attempted to help. I had previously read that Dr. Dobbs article and while I admit it's somewhat over my head, my impression was that the solution presented relied on specific processor instructions and this is C#. – Jeff Apr 09 '12 at 16:00
  • @Jeff have you tried replacing your lock with the `SpinLock`, that is a cheaper locking mechanism if most locking is quick. – pstrjds Apr 09 '12 at 16:06
  • @Jeff: All lock-free programming uses processor-specific instructions. :) .Net gives them a nice name (CompareExchange instead of the x86 cmpxchg), but anything lock-free is necessarily _very_ close to the metal. If that article was over your head, I'd strongly discourage attempting to create a lock-free refcount mechanism. Try to figure out if there's a way you can lock and refcount less often than you're doing. – Greg D Apr 09 '12 at 16:20
  • @Jeff: Is the lock hot (ie, under a lot of contention)? – Greg D Apr 09 '12 at 16:21
  • @Greg D - Yes, the lock is under a lot of contention. If I'm understanding you correctly, your solution is to do nothing? And yes, I could write interop to these instructions but my hubris does know some bounds. Since this is only about optimization and works as is, I'm not going to do a lot of fancy (and potentially dangerous) stuff to fix it. I didn't think it necessary to point out that I'm a conscientious programmer but now I'm starting to wonder. – Jeff Apr 09 '12 at 16:38
  • Reworking/designing your lock/refcount strategy is a lot more work than doing nothing. :) – Greg D Apr 09 '12 at 16:54
  • Again, I'm not sure what a good redesign would be. I don't think there is a lot of applicable information outside of what I've stated here. If you have any suggestions I'd love to hear them. Thanks. – Jeff Apr 09 '12 at 17:37
  • I can't answer that for you without a _lot_ of contextual information. All I can tell you is contend less often. Are you locking on a full object graph, for example, when locking on the root would suffice? Is it possible to simply mandate that clients properly handle initialization/uninitialization for performance reasons? Or is it practical to let perf-sensitive clients opt-out of the automatic environment management? This isn't an exclusive set of Q's with which I could provide an answer, these are the kinds of things (amongst _many_ others) that _you_ must consider for _your_ solution. – Greg D Apr 09 '12 at 18:17

As harold's comment notes, the answer is Interlocked:

public FooEnvironment() {
  if (Interlocked.Increment(ref _initCount) == 1) {
    Init();  // global startup
  }
}

private void Dispose(bool disposing) {
  if(_disposed)
    return;

  if (disposing) {
    if (0 == Interlocked.Decrement(ref _initCount)) {
      Term(); // global termination
    }
  }
}

Both Increment and Decrement return the new count (useful for exactly this kind of usage), hence the different checks.

But note: this will not work if anything else needs concurrency protection. Interlocked operations are themselves safe, but nothing else is (including the relative ordering of Interlocked calls across threads). In the above code, Init() can still be running after another thread has completed the constructor.

Richard
  • "In the above code Init() can still be running after another thread has completed the constructor." << And that's why lock-free code should always be left to the experts. :) – Greg D Apr 09 '12 at 14:06
  • Actually, I'm ok locking inside Init and Term. This solves the larger problem of having to lock every time. Thanks! – Jeff Apr 09 '12 at 14:15
  • 2
    @Jeff: That doesn't solve the issue. Suppose you lock inside init. The second thread is never going to see that lock because it never runs init. It will still enter the component before initialization has completed. – Greg D Apr 09 '12 at 14:17
  • I see...good point. So this doesn't completely solve the problem. – Jeff Apr 09 '12 at 14:22

You could use a static variable in the class: a static field exists once per type rather than per instance, so every FooEnvironment object would share the same count.

iefpw

I believe this will give you a safe way using Interlocked.Increment/Decrement.

Note: This is oversimplified; the code below can lead to deadlock if Init() throws an exception (waiters would block on the event forever). There is also a race condition in Dispose when the count goes to zero: the event is reset and Term() runs while the constructor may be starting up again on another thread. I don't know your program flow, so you may be better off using a cheaper lock like a SpinLock, as opposed to the Interlocked.Increment, if there is any potential of init-ing again after several dispose calls.

static int _initCount;
static ManualResetEvent _inited = new ManualResetEvent(false);
public FooEnvironment()
{
    if(Interlocked.Increment(ref _initCount) == 1)
    {
        Init();  // global startup
        _inited.Set();
    }

    _inited.WaitOne();
}

private void Dispose(bool disposing)
{
    if(_disposed)
        return;

    if(disposing)
    {
        if(Interlocked.Decrement(ref _initCount) == 0)
        {
            _inited.Reset();
            Term(); // global termination
        }
    }
}

Edit:
In thinking about this further, you may want to consider some application redesign: instead of using this class to manage Init and Term, just call Init once at application startup and Term when the app comes down. That removes the need for locking altogether. If the lock is showing up as 50% of your execution time, it seems like you are always going to want to call Init anyway, so just call it and away you go.

pstrjds
  • Note that Init() is **not** guaranteed to complete before thread 2 comes in, inspects the ref count, and starts running (on a not-yet-initialized environment). – Greg D Apr 09 '12 at 14:08
  • @Greg D - Good point. I knew it felt like I was missing something. Proof once again that multi-threading is hard. – pstrjds Apr 09 '12 at 14:11
  • @Greg D - So can I take it you're not going to fall into the trap of trying to come up with a solution? :) – Jeff Apr 09 '12 at 14:27
  • @Greg D - Well, I did suggest a still broken alternative, although depending on the program flow, my alternative may be safe (with the addition of some exception handling), but from the looks of it, the best bet is a cheaper lock. – pstrjds Apr 09 '12 at 14:30
  • @Jeff: I'm not smart enough to solve this one. Any lock free programming I do is "For Entertainment Purposes Only". :) My suggested solution is to rethink the granularity around your refcount strategy. – Greg D Apr 09 '12 at 15:18

You can make it nearly lock-free by using the following code. It would definitely lower contention, and if contention is your main problem, this may be the solution you need.

I would also suggest calling Dispose from the destructor/finalizer (just in case). I have changed your Dispose method: unmanaged resources should be freed regardless of the disposing argument. Check this for details on how to properly dispose of an object.

Hope this helps you.

public class FooEnvironment
{
    private static int _initCount;
    private static bool _initialized;
    private static object _environmentLock = new object();

    private bool _disposed;

    public FooEnvironment()
    {
        Interlocked.Increment(ref _initCount);

        if (_initCount > 0 && !_initialized)
        {
            lock (_environmentLock)
            {
                if (_initCount > 0 && !_initialized)
                {
                    Init(); // global startup
                    _initialized = true;
                }
            }
        }
    }

    private void Dispose(bool disposing)
    {
        if (_disposed)
            return;

        if (disposing)
        {
            // Dispose managed resources here
        }

        Interlocked.Decrement(ref _initCount);

        if (_initCount <= 0 && _initialized)
        {
            lock (_environmentLock)
            {
                if (_initCount <= 0 && _initialized)
                {
                    Term(); // global termination
                    _initialized = false;
                }
            }
        }

        _disposed = true;
    }

    ~FooEnvironment()
    {
        Dispose(false);
    }
}
Maciej
  • Thanks, unfortunately I think there is a small race condition between checking that _initCount == 0 in the ctor of one thread and dispose in another thread (so that we believe the environment is running when in fact it is not). – Jeff Apr 10 '12 at 13:40

Using Threading.Interlocked.Increment will be a little faster than acquiring a lock, doing an increment, and releasing the lock, but not enormously so. The expensive part of either operation on a multi-core system is enforcing the synchronization of memory caches between cores. The primary advantage of Interlocked.Increment is not speed, but rather the fact that it will complete in a bounded amount of time. By contrast, if one seeks to acquire a lock, perform an increment, and release the lock, even if the lock is used for no purpose other than guarding the counter, there is a risk that one might have to wait forever if some other thread acquires the lock and then gets waylaid.

You don't mention which version of .NET you're using, but there are some Concurrent classes that might be of use. Depending upon your patterns of allocating and freeing things, a class that might seem a little tricky but could work well is the ConcurrentBag class. It's somewhat like a queue or stack, except that there's no guarantee that things will come out in any particular order. The scheme:

  • Include in your resource wrapper a flag indicating whether it's still good, and include with the resource itself a reference to a wrapper.
  • When a resource user is created, throw a wrapper object in the bag. When the resource user is no longer needed, set the "invalid" flag.
  • The resource should remain alive as long as either there's at least one wrapper object in the bag whose "valid" flag is set, or the resource itself holds a reference to a valid wrapper.
  • If, when an item is deleted, the resource doesn't seem to hold a valid wrapper, acquire a lock and, if the resource still doesn't hold a valid wrapper, pull wrappers out of the bag until a valid one is found, then store that one with the resource (or, if none was found, destroy the resource).
  • If, when an item is deleted, the resource holds a valid wrapper but the bag seems like it might hold an excessive number of invalid items, acquire the lock, copy the bag's contents to an array, and throw the valid items back into the bag. Keep a count of how many items are thrown back, so one can judge when to do the next purge.

This approach may seem more complicated than using locks or Threading.Interlocked.Increment, and there are a lot of corner cases to worry about, but it may offer better performance because ConcurrentBag is designed to reduce resource contention. If processor 1 performs Interlocked.Increment on some location, and then processor 2 does so, processor 2 will have to instruct processor 1 to flush that location from its cache, wait until processor 1 has done so, inform all the other processors that it needs control of that location, load that location into its cache, and finally get around to incrementing it. After all that has happened, if processor 1 needs to increment the location again, the same general sequence of steps will be required. All of this is very slow. The ConcurrentBag class, by contrast, is designed so that multiple processors can add things to a list without cache collisions. Sometime between when things are added and when they're removed, they'll have to be copied to a coherent data structure, but such operations can be performed in batches in such a way as to yield good cache performance.

I haven't tried an approach like the above using ConcurrentBag, so I don't know what sort of performance it would actually yield, but depending upon the usage patterns it may be possible to give better performance than would be obtained via reference counting.

supercat

The Interlocked class approach works a little faster than the lock statement, but on a multi-core machine the speed advantage may not be very large, because Interlocked instructions must bypass the memory cache layers.

How important is it to call the Term() function when the code is not in use and/or when the program exits?

Frequently, you can just put the call to Init() once in a static constructor for the class that wraps the other APIs, and not really worry about calling Term(). For example:

static FooEnvironment() { 
    Init();  // global startup 
}

The CLR will ensure that the static constructor will get called once, before any other member functions in the enclosing class.

It's also possible to hook notification of some (but not all) application shutdown scenarios, making it possible to call Term() on clean shutdowns. See this article: http://www.codeproject.com/Articles/16164/Managed-Application-Shutdown
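One way to combine the two ideas is a sketch like the following (assuming the Init/Term functions from the question), which initializes once in the static constructor and makes a best-effort attempt at cleanup on normal exit:

```csharp
static FooEnvironment()
{
    Init();  // global startup; the CLR runs this exactly once

    // Best-effort cleanup: ProcessExit is raised on normal shutdown,
    // but not for hard crashes or Environment.FailFast.
    AppDomain.CurrentDomain.ProcessExit += (sender, e) => Term();
}
```

Since the handler never runs on abnormal termination, this only makes sense if a skipped Term() on a crash is acceptable for the native library in question.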