I'm trying to find the fastest solution for sharing state in a single-writer, single-reader scenario, where the reader just consumes the latest value of a state variable assigned by the writer. The shared state can be of any managed type (i.e. reference or value types).

Ideally the synchronized solution would be as close in speed to the naive non-synchronized solution as possible, since the method will be used for both single-threaded and multi-threaded scenarios, potentially thousands of times.

The order of reads/writes shouldn't matter, as long as the reader receives the latest value within some time frame (the reader will only ever read, never modify, so update timing doesn't matter as long as it never receives a newer value before an older one...)

Naive solution, no locking:

var memory = default(long); // DateTime.Now.Ticks is a long
var reader = Task.Run(() =>
{
    while (true)
    {
        func(memory);
    }
});

var writer = Task.Run(() =>
{
    while (true)
    {
        memory = DateTime.Now.Ticks;
    }
});

What are really the issues with the naive solution? I've come up with these so far:

  1. No guarantee reader sees the latest value (no memory barrier/volatile)
  2. Values consumed by the reader may be torn (invalid) if reads/writes of the shared variable's type are not atomic, e.g. composite value types, or long/double on a 32-bit runtime.
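
To illustrate point 2, here's a minimal sketch (names are illustrative) of how the tearing problem can be sidestepped for a 64-bit value: Interlocked.Read and Interlocked.Exchange guarantee atomic access to a long even on 32-bit runtimes. The value is built so that both 32-bit halves must agree, which would make a torn read detectable:

```csharp
using System;
using System.Threading;

class AtomicLongSketch
{
    static long memory; // unsynchronized 64-bit writes may tear on a 32-bit runtime

    static void Main()
    {
        var writer = new Thread(() =>
        {
            for (long i = 0; i < 1000000; i++)
                Interlocked.Exchange(ref memory, i | (i << 32)); // both halves hold i
        });
        writer.Start();

        while (writer.IsAlive)
        {
            long v = Interlocked.Read(ref memory); // atomic 64-bit read
            if ((uint)v != (uint)(v >> 32))        // halves disagree => torn read
                throw new Exception("torn read observed");
        }
        writer.Join();
        Console.WriteLine("ok");
    }
}
```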

The straightforward solution would be locking:

var gate = new object();
var memory = default(long);
var reader = Task.Run(() =>
{
    while (true)
    {
        long local;
        lock(gate) { local = memory; }
        func(local);
    }
});

var writer = Task.Run(() =>
{
    while (true)
    {
        lock(gate)
        {
            memory = DateTime.Now.Ticks;
        }
    }
});

This of course works, but incurs the price of taking an uncontended lock (~50 ns) in the single-threaded case, and the price of contention/context switching in the multi-threaded case.

This is totally negligible for most scenarios, but it is significant in my case since the method will be used across the board on potentially thousands of loops which need to be as timely as possible running tens of thousands of times per second.
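
As a rough sanity check of that lock overhead, a sketch like the following can compare an unsynchronized read loop against one taking an uncontended lock (numbers are machine- and runtime-dependent, so treat this as an illustration, not a rigorous benchmark):

```csharp
using System;
using System.Diagnostics;

class LockCostSketch
{
    static void Main()
    {
        var gate = new object();
        long memory = 0, sink = 0;
        const int N = 10000000;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < N; i++) sink += memory;                 // unsynchronized read
        var plain = sw.Elapsed;

        sw.Restart();
        for (int i = 0; i < N; i++) { lock (gate) sink += memory; } // read under lock
        var locked = sw.Elapsed;

        // Printing sink prevents the loops from being optimized away entirely.
        Console.WriteLine($"plain: {plain.TotalMilliseconds:F1} ms, " +
                          $"locked: {locked.TotalMilliseconds:F1} ms, sink={sink}");
    }
}
```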

The last solution I thought about is using immutable state closures to read the shared state:

Func<long> memory = () => default(long);
var reader = Task.Run(() =>
{
    while (true)
    {
        func(memory());
    }
});

var writer = Task.Run(() =>
{
    while (true)
    {
        var state = DateTime.Now.Ticks;
        memory = () => state;
    }
});

Now what would be the potential issues with this? My own performance benchmarks report ~10ns for this solution vs locking in the single-threaded case. It seems like a nice gain, but some considerations include:

  1. Still no memory barrier/volatile, so the reader is not guaranteed to see the latest closure (how common is this in practice? would be good to know...)
  2. The atomicity problem is solved: since the closure is a reference type, reads and writes of the reference are guaranteed to be atomic by the specification.
  3. Allocation costs: every write allocates a closure object and a delegate on the heap. Unclear what the cost of this churn really is, but it still seems to be faster than locking...
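
For what it's worth, consideration 1 could be addressed by publishing and reading the closure reference through the Volatile class (available since .NET 4.5). A minimal sketch, with threads elided for brevity:

```csharp
using System;
using System.Threading;

class ClosurePublish
{
    static Func<long> memory = () => 0L;

    static void Main()
    {
        long state = 42;

        // Writer side: publish a new closure with release semantics.
        Volatile.Write(ref memory, () => state);

        // Reader side: acquire the current closure, then invoke it.
        Func<long> snapshot = Volatile.Read(ref memory);
        Console.WriteLine(snapshot()); // 42
    }
}
```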

Is there anything else I'm missing? Do you usually consider this use for closures instead of locks in your programs? Would also be great to know other possible fast solutions for single-reader/single-writer shared state.

glopes
  • Only the second example is correct code. So I'd say go with that. – Peter Duniho Mar 21 '15 at 10:16
  • Would you care to elaborate on that? – glopes Mar 21 '15 at 10:17
  • "on potentially thousands of loops" -- do you mean these "thousand of loops" are running concurrently? If so, you should be focusing on fixing that first, before worrying about synchronization techniques. Context switches are expensive, so if you can set your program up so that the CPU cores are kept busy by a _few_ threads (i.e few enough to avoid context switching) which effectively accomplish their own context switching between tasks without actually interrupting the CPU pipeline, that would be much better. – Peter Duniho Mar 21 '15 at 10:33
  • Sorry for being vague, the code is actually framework code, meaning it will be expected to run for a large number of possible scenarios, including many loops running in parallel at kHz rates. The method will be a generic function composition tool that is expected to run fast in these situations with or without concurrency (basically for rapid-prototyping). Totally agree with you on avoiding context switches, that's why I wanted to see how fast I could have this work for even single-threaded scenarios, so users would not have to worry so much about picking one or the other. – glopes Mar 21 '15 at 10:45

1 Answer

Both your first and third examples, as you have already pointed out, fail to ensure the reader task sees the most recent value assigned by the writer task. A mitigating factor is that on x86 hardware, all memory access is essentially volatile, but there's nothing about your question that restricts the context to x86 hardware, and in any case that assumes the write or read wasn't optimized out by the JIT compiler.

Marc Gravell has an excellent demonstration of the hazards of unprotected write/read, in which the written value is simply never observed by the reading thread. The bottom line is that if you don't explicitly synchronize access, your code is broken.

So, go with the second example, which is the only one that's actually correct.


By the way, as far as using closures to wrap a value goes, I would say there's no point in that. You can just as effectively wrap some collection of values in an object directly instead of having the compiler generate the class for you, and use that object's reference as the shared value for the reader and writer. Using Thread.VolatileWrite() and Thread.VolatileRead() on the object reference addresses the cross-thread visibility issue (I'm assuming you're using a captured local here…of course if the shared variable is a field, you can just mark it volatile).

The values can be in a Tuple, or you can write your own custom class (which you would want to make immutable, like Tuple, to ensure against accidental bugs).
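
A minimal sketch of that pattern, here using the Volatile class (the modern equivalent of Thread.VolatileRead/VolatileWrite) to publish an immutable snapshot object; the Snapshot type and field names are illustrative:

```csharp
using System;
using System.Threading;

// Immutable snapshot: construct once, never mutate, publish only the reference.
sealed class Snapshot
{
    public readonly long Ticks;
    public readonly int Sequence;
    public Snapshot(long ticks, int sequence) { Ticks = ticks; Sequence = sequence; }
}

class ImmutablePublish
{
    static Snapshot current = new Snapshot(0, 0);

    static void Main()
    {
        var writer = new Thread(() =>
        {
            for (int i = 1; i <= 100; i++)
                Volatile.Write(ref current, new Snapshot(DateTime.Now.Ticks, i));
        });
        writer.Start();
        writer.Join(); // after the join, the final snapshot is visible

        Snapshot s = Volatile.Read(ref current);
        Console.WriteLine(s.Sequence); // 100
    }
}
```

Because the object is immutable, the reader can never observe a half-written snapshot: it either sees the old reference or the new one, each pointing at fully constructed state.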

And of course, in your first example, if you did use volatile semantics, that addresses the visibility issue for a type like int, where the write and read can be done atomically.
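
For example, with an int field the volatile keyword alone suffices, since int reads and writes are already atomic (this sketch uses Thread.Join only to make the printed result deterministic):

```csharp
using System;
using System.Threading;

class VolatileIntExample
{
    static volatile int memory; // int access is atomic; volatile adds visibility

    static void Main()
    {
        var writer = new Thread(() => memory = 123);
        writer.Start();
        writer.Join();
        Console.WriteLine(memory); // 123
    }
}
```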

Peter Duniho
  • Thanks for the answer! I think I found the Marc Gravell reference you were mentioning in this question: http://stackoverflow.com/questions/1222184/do-i-need-to-lock-or-mark-as-volatile-when-accessing-a-simple-boolean-flag-in-c – glopes Mar 21 '15 at 10:31
  • Almost...his answer there links to the actual code example he wrote up, which is at _this_ answer here: http://stackoverflow.com/a/458193/3538012 But yes, that's the one. I'll edit the post so the link's there too. – Peter Duniho Mar 21 '15 at 10:35
  • Yeah, found it now. It's crazy stuff! I recommend everyone that somehow randomly hits on this question to actually check it out. – glopes Mar 21 '15 at 10:46