
If I understand correctly, in C#, a lock block guarantees exclusive access to a set of instructions, but it also guarantees that any reads from memory reflect the latest version of that memory in any CPU cache. We think of lock blocks as protecting the variables read and modified within the block, which means:

  1. Assuming you've properly implemented locking where necessary, those variables can only be read and written to by one thread at a time, and
  2. Reads within the lock block see the latest versions of a variable and writes within the lock block become visible to all threads.

(Right?)
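For concreteness, here's the kind of pattern I have in mind (the names are made up):

private readonly object _sync = new object();
private int _count;

public void Increment()
{
    lock (_sync)
    {
        _count++; // point 1: only one thread at a time can be in here
    }
}

public int Read()
{
    lock (_sync)
    {
        return _count; // point 2: sees the latest value written under the lock
    }
}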

This second point is what interests me. Is there some magic by which only variables read and written in code protected by the lock block are guaranteed fresh, or do the memory barriers employed in the implementation of lock guarantee that all memory is now equally fresh for all threads? Pardon my mental fuzziness here about how caches work, but I've read that caches hold several multi-byte "lines" of data. I think what I'm asking is, does a memory barrier force synchronization of all "dirty" cache lines or just some, and if just some, what determines which lines get synchronized?

adv12

2 Answers


If I understand correctly, in C#, a lock block guarantees exclusive access to a set of instructions...

Right. The specification guarantees that.

but it also guarantees that any reads from memory reflect the latest version of that memory in any CPU cache.

The C# specification says nothing whatsoever about "CPU cache". You've left the realm of what is guaranteed by the specification, and entered the realm of implementation details. There is no requirement that an implementation of C# execute on a CPU that has any particular cache architecture.

Is there some magic by which only variables read and written in code protected by the lock block are guaranteed fresh, or do the memory barriers employed in the implementation of lock guarantee that all memory is now equally fresh for all threads?

Rather than try to parse your either-or question, let's say what is actually guaranteed by the language. A special effect is:

  • Any write to a variable, volatile or not
  • Any read of a volatile field
  • Any throw

The order of special effects is preserved at certain special points:

  • Reads and writes of volatile fields
  • locks
  • thread creation and termination

The runtime is required to ensure that special effects are ordered consistently with special points. So, if there is a read of a volatile field before a lock, and a write after, then the read can't be moved after the write.
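As a sketch of that last sentence (hypothetical fields, just for illustration):

private volatile bool _flag;                  // reads of this are special effects
private int _data;                            // writes to this are special effects
private readonly object _gate = new object(); // locking on this is a special point

public bool Example()
{
    bool observed = _flag; // special effect: volatile read, before the lock

    lock (_gate)
    {
        // ... protected work ...
    }

    _data = 42;            // special effect: a write, after the lock

    // The runtime may not move the volatile read of _flag to after the
    // write to _data; their order around the special point is preserved.
    return observed;
}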

So, how does the runtime achieve this? Beats the heck out of me. But the runtime is certainly not required to "guarantee that all memory is fresh for all threads". The runtime is required to ensure that certain reads, writes and throws happen in chronological order with respect to special points, and that's all.

The runtime is, in particular, not required to ensure that all threads observe the same order.

Finally, I always end these sorts of discussions by pointing you here:

http://blog.coverity.com/2014/03/26/reordering-optimizations/

After reading that, you should have an appreciation for the sorts of horrid things that can happen even on x86 when you act casual about eliding locks.

Eric Lippert
  • In fact, the C# spec says far too little about the memory model, and the CLI spec leaves it open for truly crazy situations where even local variables can't be trusted to keep their values between reads. I'm hoping we can improve things a lot, but don't look for a reasonable memory model any time in the *very* near future... – Jon Skeet Sep 20 '16 at 19:57
  • Like I said in a comment on Jon's answer, I don't completely follow this yet, but I can accept that it's correct and chew on it. Thanks for the response! – adv12 Sep 20 '16 at 20:36
  • This should help. A lock acquires two half-fences (1 at the start, 1 at the end). To keep it simple, let's just say it's a full fence. Here's what happens at the CPU level: "This serializing operation guarantees that every load and store instruction that precedes in program order the MFENCE instruction is globally visible before any load or store instruction that follows the MFENCE instruction is globally visible." http://x86.renejeschke.de/html/file_module_x86_id_170.html – IamIC Feb 27 '17 at 09:22

Reads within the lock block see the latest versions of a variable and writes within the lock block are visible to all threads.

No, that's definitely a harmful oversimplification.

When you enter the lock statement, there is a memory fence which sort of means that you'll always read "fresh" data. When you exit the lock statement, there's a memory fence which sort of means that all the data you've written is guaranteed to be written to main memory and available to other threads.
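To make that concrete, a lock statement compiles down to roughly this shape (C# 4 and later), with the entry fence supplied by Monitor.Enter and the exit fence by Monitor.Exit:

bool lockTaken = false;
try
{
    Monitor.Enter(padlock, ref lockTaken); // entry fence: reads after this see "fresh" data
    // ... protected code ...
}
finally
{
    if (lockTaken)
    {
        Monitor.Exit(padlock); // exit fence: writes before this become visible to the next thread that acquires the lock
    }
}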

The important point is that if multiple threads only ever read/write memory when they "own" a particular lock, then by definition one of them will have exited the lock before the next one enters it... so all those reads and writes will be simple and correct.

If you have code which reads and writes a variable without taking a lock, then there's no guarantee that it will "see" data written by well-behaved code (i.e. code using the lock), or that well-behaved threads will "see" the data written by that bad code.

For example:

private readonly object padlock = new object();
private int x;

public void A()
{
    lock (padlock)
    {
        // Will see changes made in A and B; may not see changes made in C
        x++;
    }
}

public void B()
{
    lock (padlock)
    {
        // Will see changes made in A and B; may not see changes made in C
        x--;
    }
}

public void C()
{
    // Might not see changes made in A, B, or C. Changes made here
    // might not be visible in other threads calling A, B or C.
    x = x + 10;
}

Now it's more subtle than that, but that's why using a common lock to protect a set of variables works.
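For example, the simplest way to make C well-behaved is to have it take the same lock:

public void C()
{
    lock (padlock)
    {
        // Now C participates in the same lock, so it sees changes made
        // in A and B, and its own change is visible to them.
        x = x + 10;
    }
}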

Jon Skeet
  • So the memory fence at the end of the lock makes the writes visible to main memory, but another thread needs a memory fence to guarantee that it reads an un-cached value? (Sorry if I'm butchering what you're saying. Still fuzzy.) – adv12 Sep 20 '16 at 20:11
  • @adv12: rather than reasoning about what is actually happening at the CPU level, think about what is guaranteed. The read of x in C is not a "special event" and therefore it can be re-ordered arbitrarily with respect to any of the other reads and writes. (Well, except for the write in C! Obviously there is a data-dependency there.) – Eric Lippert Sep 20 '16 at 20:13
  • @EricLippert, I'm still trying to understand what you've written about special effects. Working at it, but so far this apparently-oversimplified model of CPU caches I was presented years ago still makes more sense to me than more abstract definitions. – adv12 Sep 20 '16 at 20:15
  • @adv12: The business about reordering reads (and writes) is key, and also where it ends up being very confusing. Basically, unless anything guarantees otherwise, assume that any read you perform could actually have been performed much earlier in the code. I'm not an expert on this - very few people are - I just try to stick to what's guaranteed to be safe. – Jon Skeet Sep 20 '16 at 20:17
  • @adv12: I agree that the cache model makes more sense. The problem is that the cache model is probably *stronger* than the *guaranteed* model, and therefore you might be assuming that something is impossible when it is in fact only impossible on x86, but thoroughly possible on weak-model processors. A good attitude to have with this stuff is that anything not forbidden is not only possible, but over the long run, likely. – Eric Lippert Sep 20 '16 at 20:21
  • @adv12: And also, I'm with Jon. I think it is safe to say that the two of us understand C# pretty well, but my knowledge of what optimizations are permitted by what memory models is not very good. I avoid this stuff in real code like the plague; don't write multithreaded code if you can avoid it, and if you can't avoid it, take the lock. – Eric Lippert Sep 20 '16 at 20:22
  • @EricLippert: Although if you take that to the extreme, you can't guarantee very much at all: `void Foo(string x) { if (x == null) { throw new ArgumentNullException(); } int y = x.Length; }` *could* still throw due to JIT inlining, eliding the parameter and introducing an extra read of a field. That's the sort of thing I want to fix with a working group :) – Jon Skeet Sep 20 '16 at 20:23
  • @JonSkeet, Slightly off-topic, but I think this helps me understand why it's safe to pass an argument to, and return data from, a `BackgroundWorker` so long as I don't modify the argument from the GUI thread while the `BackgroundWorker` is running. It's because I'm essentially doing the same thing as a `lock` block: guaranteeing the `BackgroundWorker` thread exclusive access to the argument for the duration of its run, and the locks (or fences?) employed internally by `BackgroundWorker` ensure that the background thread sees a "fresh" argument and the GUI thread sees a "fresh" result. Yes? – adv12 Sep 22 '16 at 15:51
  • @adv12: Sort of. In my experience, very few things like BackgroundWorker - and async/await - are actually explicit about the memory fences involved, but I can only assume they're okay :) – Jon Skeet Sep 22 '16 at 15:52