
After reading so much about how to do it, I'm quite confused.

So here is what I want to do: I have a data structure/object that holds all kinds of information. I treat the data structure as if it were immutable. Whenever I need to update information, I make a deep copy and apply the changes to it. Then I swap the old object and the newly created one.
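A minimal sketch of that copy-and-swap pattern, assuming the shared reference is published with `Volatile.Write` (the `Config` class, its fields, and the `Store` wrapper are hypothetical names, not from the question):

```csharp
using System;
using System.Threading;

// Hypothetical data holder, treated as immutable once published.
class Config
{
    public int A;
    public int B;
    public Config Clone() => (Config)MemberwiseClone(); // a real deep copy in production code
}

static class Store
{
    static Config _current = new Config { A = 1, B = 1 };

    // Readers take one snapshot of the reference and use only that.
    public static Config Current => Volatile.Read(ref _current);

    public static void Update(int a, int b)
    {
        Config copy = Volatile.Read(ref _current).Clone(); // copy the old object
        copy.A = a;                                        // mutate only the private copy
        copy.B = b;
        Volatile.Write(ref _current, copy);                // swap: one atomic reference write
    }
}

class Program
{
    static void Main()
    {
        Config before = Store.Current;
        Store.Update(2, 3);
        Console.WriteLine(before.A);        // 1 (the old snapshot is untouched)
        Console.WriteLine(Store.Current.B); // 3 (the new object is published)
    }
}
```

Readers that already hold `before` keep seeing the old values; only a fresh read of `Store.Current` observes the new object.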

Now I'm not sure how to get all the details right.

Let's look at it from the side of the reader/consumer threads.

MyObj temp = dataSource;
var a = temp.a;
... // many instructions
var b = temp.b;
....

As I understand it, reading a reference is atomic. So I don't need any volatile or locking to assign the current reference of dataSource to temp.

But what about garbage collection? As I understand it, the GC has some kind of reference counter to know when to free memory. So when another thread updates dataSource at exactly the moment dataSource is assigned to temp, does the GC increase the counter on the right memory block?

The other thing is compiler/CLR optimization. I assign dataSource to temp and use temp to access data members. What does the CLR do? Does it really make a copy of the dataSource reference, or does the optimizer just use dataSource to access .a and .b? Let's assume that between temp.a and temp.b there are lots of instructions, so that the reference in temp/dataSource cannot be held in a CPU register. Is temp.b really temp.b, or is it optimized to dataSource.b because the copy to temp can be optimized away? This is especially important if another thread updates dataSource to point to a new object.

Do I really need volatile, lock, ReaderWriterLockSlim, Thread.MemoryBarrier or something else? The important thing to me is that temp.a and temp.b access the old data structure even when another thread updates dataSource to a newly created one. I never change data inside an existing structure. Updates are always done by creating a copy, updating the data and then pointing the reference at the new copy of the data structure.
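The single local snapshot is the part that delivers that guarantee. A hedged sketch (the `Pair` and `SnapshotDemo` names are made up): the reader reads the shared reference exactly once per iteration, so both fields always come from the same object, no matter how often the writer swaps in a new one.

```csharp
using System;
using System.Threading;

// Hypothetical immutable pair: both fields are set once, in the constructor.
class Pair
{
    public readonly int A;
    public readonly int B;
    public Pair(int a, int b) { A = a; B = b; }
}

class SnapshotDemo
{
    static Pair _shared = new Pair(0, 0);

    public static bool ReaderSeesTornPair()
    {
        var writer = new Thread(() =>
        {
            for (int i = 1; i <= 100_000; i++)
                Volatile.Write(ref _shared, new Pair(i, i)); // swap in a fresh object
        });
        writer.Start();

        bool torn = false;
        for (int i = 0; i < 100_000; i++)
        {
            Pair snapshot = Volatile.Read(ref _shared); // read the reference once
            if (snapshot.A != snapshot.B)               // both fields from the same object
                torn = true;
        }
        writer.Join();
        return torn;
    }

    static void Main() => Console.WriteLine(ReaderSeesTornPair()); // False
}
```

If the reader instead accessed `_shared.A` and `_shared.B` directly, a swap between the two reads could mix values from two different objects.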


One more question: if I don't use volatile, how long does it take until all cores on all CPUs see the updated reference?


When it comes to volatile please have a look here: When should the volatile keyword be used in C#?


I have written a little test program:

using System;
using System.Threading;
using System.Windows.Forms;

namespace test1 {
  public partial class Form1 : Form {
    public Form1() { InitializeComponent(); }

    Something sharedObj = new Something();

    private void button1_Click(object sender, EventArgs e) {
      Thread t = new Thread(Do);          // Kick off a new thread
      t.Start();                               // running Do()

      for (int i = 0; i < 1000; i++) {
        Something reference = sharedObj;

        int x = reference.x; // sharedObj.x;
        System.Threading.Thread.Sleep(1);
        int y = reference.y; // sharedObj.y;

        if (x != y) {
          button1.Text = x.ToString() + "/" + y.ToString();
          Update();
        }
      }
    }

    private void Do() {
      for (int i = 0; i < 1000000; i++) {
        Something someNewObject = sharedObj.Clone(); // clone from immutable
        someNewObject.Do();
        sharedObj = someNewObject; // atomic
      }
    }
  }

  class Something {
    public Something Clone() { return (Something)MemberwiseClone(); }
    public void Do() { x++; System.Threading.Thread.Sleep(0); y++; }
    public int x = 0;
    public int y = 0;
  }
}

In button1_Click there is a for loop, and inside it I access the data structure/object once directly through sharedObj and once through the temporarily created local reference. Using the local reference is enough to make sure that x and y are initialized with values from the same object.

The only thing I don't understand is: why isn't "Something reference = sharedObj;" optimized away, with "int x = reference.x;" replaced by "int x = sharedObj.x;"?

How do the compiler and the jitter know not to optimize this? Or are temporary locals never optimized away in C#?

But most important: Is my example running as intended because it is correct or is it running as intended only by accident?

bebo
  • You need some kind of synchronization for the shared variable that you mutate. Otherwise writes might not ever become visible. volatile would be appropriate. – usr Jul 30 '15 at 09:35
  • With the newly provided information I don't see any problem. When you change an object reference and access it before and after that change/assignment, you will get two different results. Volatile/MemoryBarrier is a completely different thing. If you use the local variable temp or reference or whatever, you're good. But I don't see any reason for an SO post. – Patrik Jul 31 '15 at 08:29
  • The reason is simple: I fear that my "copy" of the reference might be optimized away by the compiler. Even if it is working now in Debug and Release mode, I don't know if it will work in the future, when the jitter might get better and optimize more aggressively. Is there any documentation about what the C# compiler and the jitter will optimize and what will never be touched? If I have b = a; c = b; what hinders the compiler from making that c = a and removing b completely? That's why I'm asking; I simply don't know how the optimizers work. – bebo Jul 31 '15 at 15:21

1 Answer


As I understand reading references is atomic.

Correct. This is a very limited property though. It means reading a reference will work; you'll never get the bits of half an old reference mixed with the bits of half a new reference, resulting in a reference that doesn't work. If there's a concurrent change, it promises nothing about whether you get the old or the new reference (what would such a promise even mean?).

So I don't need any volatile or locking to assign the current reference of dataSource to temp.

Maybe, though there are cases where this can have problems.

But what about the Garbage Collection. As I understand the GC has some kind of reference counter to know when to free memory.

Incorrect. There is no reference counting in .NET garbage collection.

If there is a static reference to an object, then it is not eligible for reclamation.

If there is an active local reference to an object, then it is not eligible for reclamation.

If there is a reference to an object in a field of an object that is not eligible for reclamation, then it too is not eligible for reclamation, recursively.

There's no counting here. Either there is an active strong reference prohibiting reclamation, or there isn't.

This has a great many very important implications. Of relevance here is that there can never be any incorrect reference counting, since there is no reference counting. Strong references will not disappear under your feet.
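This can be demonstrated with a `WeakReference` (a small sketch; the `Node` name and field are made up): after the field is re-pointed at a new object, the old object survives a forced collection for as long as a local still references it, because .NET's tracing GC keeps everything reachable from a root alive.

```csharp
using System;

class Node { public int Value; }

class Program
{
    static Node _source = new Node { Value = 1 };

    public static bool OldObjectSurvivesCollection()
    {
        Node temp = _source;                 // local strong reference to the old object
        var weak = new WeakReference(temp);  // weak: does not keep the object alive by itself

        _source = new Node { Value = 2 };    // the field now points at a new object
        GC.Collect();                        // force a full collection
        GC.WaitForPendingFinalizers();

        bool alive = weak.IsAlive && temp.Value == 1; // true: `temp` still roots the old object
        GC.KeepAlive(temp);                  // keep the local live past its last read
        return alive;
    }

    static void Main() => Console.WriteLine(OldObjectSurvivesCollection()); // True
}
```

If `temp` were the only reference and it went out of scope before the collection, `weak.IsAlive` could become false; there is no counter anywhere, only reachability.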

The other thing is compiler/CLR optimization. I assign dataSource to temp and use temp to access data members. What does the CLR do? Does it really make a copy of dataSource or does the optimizer just use dataSource to access .a and .b?

That depends on what dataSource and temp are as far as whether they are local or not, and how they are used.

If dataSource and temp are both local, then it is perfectly possible that either the compiler or the jitter would optimise the assignment away. If they are both local though, they are both local to the same thread, so this isn't going to impact multi-threaded use.

If dataSource is a field (static or instance), then since temp is definitely a local in the code shown (because it's initialised in the code fragment shown), the assignment cannot be optimised away. For one thing, grabbing a local copy of a field is in itself a possible optimisation, it being faster to do several operations on a local reference than to continually access a field or static. There's not much point having a compiler or jitter "optimisation" that just makes things slower.

Consider what actually happens if you were to not use temp:

var a = dataSource.a;
... // many instructions
var b = dataSource.b;

To access dataSource.a the code must first obtain a reference to dataSource and then access a. Afterwards it obtains a reference to dataSource and then accesses b.

Optimising by not using a local makes no sense, since there's going to be an implicit local anyway.

And there is the simple fact that the fear you have is something that has been considered: after temp = dataSource there's no assumption that temp == dataSource, because there could be other threads changing dataSource, so it's not valid to make optimisations predicated on temp == dataSource.*

Really the optimisations you are concerned about are either not relevant or not valid and hence not going to happen.

There is a case that could cause you problems though. It is just about possible for a thread running on one core to not see a change to dataSource made by a thread running on another core. As such if you have:

/* Thread A */
dataSource = null;

/* Some time has passed */

/* Thread B */
var isNull = dataSource == null;

Then there's no guarantee that, just because Thread A has finished setting dataSource to null, Thread B will see this. Ever.

The memory models in use in .NET itself and in the processors .NET generally runs on (x86 and x86-64) would prevent that happening, but in terms of possible future optimisations, this is something that's possible. You need memory barriers to ensure that Thread A's publishing definitely affects Thread B's reading. lock and volatile are both ways to ensure that.
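A sketch of that publish/observe scenario made safe with `Volatile.Read`/`Volatile.Write`, which carry the same acquire/release semantics as the `volatile` keyword (the field and class names here are made up, and the event is used only to sequence the two threads):

```csharp
using System;
using System.Threading;

class Publish
{
    static object _dataSource = new object();

    public static bool WriteIsVisible()
    {
        var done = new ManualResetEventSlim();

        var threadA = new Thread(() =>
        {
            Volatile.Write(ref _dataSource, null); // release: publish the change
            done.Set();                            // signal that the write happened
        });
        threadA.Start();
        done.Wait();                               // "Thread B" waits for the signal
        threadA.Join();

        // acquire: guaranteed to observe the write made before the Set()
        return Volatile.Read(ref _dataSource) == null;
    }

    static void Main() => Console.WriteLine(WriteIsVisible()); // True
}
```

With a plain read and write and no synchronization at all, a future aggressive JIT would be within its rights to keep returning the stale non-null value.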

*One doesn't even need to be multi-threaded for this to not follow, though it is possible to prove in particular cases that there are no single-thread changes that would break that assumption. That doesn't really matter though, because the multi-threaded case still applies.

Jon Hanna
  • You state >>>And there is the simple fact that the fear you have is something considered: After temp = dataSource there's no assumption that temp == dataSource because there could be other threads changing dataSource, so it's not valid to make optimisations predicated on temp == dataSource<<< --- Why not? As far as I understand, the optimizers are allowed to do whatever they like as long as the meaning of the code doesn't change from the perspective of the current thread. The optimizers don't know anything about other threads. Otherwise memory barriers and volatile would be superfluous. – bebo Sep 16 '15 at 11:51
  • The compiler and jitter most certainly do consider other threads when it comes to optimisation. Reordering operations may ignore the concerns of other threads, and memory barriers affect that, but the sort of re-writing you are talking about just wouldn't make sense for a .NET compiler or jitter to do. – Jon Hanna Sep 16 '15 at 12:26