
I have a tricky problem that is turning up in some of my code. I have a cache manager that either returns items from the cache or calls a delegate to create them (expensively).

I'm finding that I'm having problems with the finally block of my method being run on a different thread than the rest.

Here's a cut-down version:

public IEnumerable<Tuple<string, T>> CacheGetBatchT<T>(IEnumerable<string> ids, BatchFuncT<T> factory_fn) where T : class
    {

        Dictionary<string, LockPoolItem> missing = new Dictionary<string, LockPoolItem>();

        try
        {
            foreach (string id in ids.Distinct())
            {
                LockPoolItem lk = AcquireLock(id);
                T item;

                item = (T)resCache.GetData(id); // try and get from cache
                if (item != null)
                {
                    ReleaseLock(lk);
                    yield return new Tuple<string, T>(id, item);
                }
                else
                    missing.Add(id, lk);                    
            }

            foreach (Tuple<string, T> i in factory_fn(missing.Keys.ToList()))
            {
                resCache.Add(i.Item1, i.Item2);
                yield return i;
            }

            yield break;                        // why is this needed?
        }
        finally
        {
            foreach (LockPoolItem lk in missing.Values)
            {
                ReleaseLock(lk);
            }
        }
    }

AcquireLock and ReleaseLock fill a dictionary with LockPoolItem objects that have been locked with Monitor.Enter / Monitor.Exit [I have also tried mutexes]. The problem comes when ReleaseLock is called on a different thread from the one AcquireLock was called on.
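For reference, here's a minimal standalone sketch (hypothetical names, not my actual lock pool) of the Monitor thread-affinity rule that seems to be biting me - Monitor.Exit must be called by the thread that called Monitor.Enter:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class MonitorAffinityDemo
{
    static void Main()
    {
        object gate = new object();
        Monitor.Enter(gate);                 // lock taken on the main thread

        // Attempting to release from a different thread violates
        // Monitor's thread affinity and throws.
        Task.Run(() =>
        {
            try
            {
                Monitor.Exit(gate);
            }
            catch (SynchronizationLockException)
            {
                Console.WriteLine("cross-thread Exit threw");
            }
        }).Wait();

        Monitor.Exit(gate);                  // releasing on the owning thread is fine
        Console.WriteLine("done");
    }
}
```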

The problem comes when calling this from another function that uses threads: sometimes the finally block gets called on a different thread, due to the disposal of the IEnumerator obtained from the returned iterator.

The following block is a simple example.

BlockingCollection<Tuple<Guid, int>> c = new BlockingCollection<Tuple<Guid, int>>();

using (IEnumerator<Tuple<Guid, int>> iter = global.NarrowItemResultRepository.Narrow_GetCount_Batch(userData.NarrowItems, dicId2Nar.Values).GetEnumerator()) {
    Task.Factory.StartNew(() => {

        while (iter.MoveNext()) {
            c.Add(iter.Current);
        }
        c.CompleteAdding();
    });
}

This doesn't seem to happen when I add the yield break - however, I'm finding this hard to debug as it only occurs very occasionally. It does happen, though - I've logged the thread IDs, and the finally block is getting called on a different thread...

I'm sure this can't be correct behaviour: I can't see why the Dispose method (i.e. the exit from the using block) would get called on a different thread.

Any ideas how to guard against this?

  • I'd suggest that any design where you're holding locks at the point at which you yield is a broken one - you have no idea how long it will be before your caller next calls `MoveNext` or, indeed, as you've found, `Dispose`. Without knowing more about your specific problem, it's difficult to offer concrete advice, but that's where I'd be looking - change the design so that you're not at your caller's mercy for when you release locks. – Damien_The_Unbeliever Apr 24 '15 at 14:01
  • That is a fair point, however it doesn't answer the question. What I am trying to achieve is to have the provider return items as they are retrieved from the slow store - some items may take seconds, others milliseconds, but there is no way of knowing beforehand which ones in the batch will be slow to return. I suspect I would be better having the blocking collection provided to the cache function and filling that there. However I still don't understand why dispose/finally is being called on a different thread. – dominicbeesley Apr 27 '15 at 08:50
  • No, it doesn't, hence why it's posted as a comment. If you're open to making such changes, I could put some effort in and show you what the alternative might look like. If you check the discussion I've had below supercat's answer you'll see I've been arguing for several days that there is no guarantee that an enumerable will resume on the same thread (especially if the calling code uses modern features such as `async`) – Damien_The_Unbeliever Apr 27 '15 at 08:54
  • Thanks Damien, I've got plenty of ideas of how to improve this, I just could not work out why Dispose/finally was getting called on a different thread when it was in a using block. I'm about to bin and restart my caching model at the moment. – dominicbeesley Apr 27 '15 at 10:21

3 Answers


There seems to be a race here.

It looks like your calling code creates the enumerator, then starts a task on the thread pool to enumerate through it, then disposes of the enumerator. My initial thoughts:

  • If the enumerator is disposed before enumeration starts, nothing will happen. From a brief test, this doesn't prevent enumeration after it's disposed.

  • If the enumerator is disposed while enumerating, the finally block will be called (on the calling thread) and enumeration will stop.

  • If enumeration is completed by the task action, the finally block will be called (on the thread pool thread).

To attempt to demonstrate, consider this method:

private static IEnumerable<int> Items()
{            
    try
    {
        Console.WriteLine("Before 0");

        yield return 0;

        Console.WriteLine("Before 1");

        yield return 1;

        Console.WriteLine("After 1");
    }
    finally 
    {
        Console.WriteLine("Finally");
    }
}

If you dispose before enumerating, nothing will be written to the console. This is what I suspect you will be doing most of the time, as the current thread reaches the end of the using block before the task begins:

var enumerator = Items().GetEnumerator();
enumerator.Dispose();    

If enumeration completes before Dispose, the final call to MoveNext will invoke the finally block.

var enumerator = Items().GetEnumerator();
enumerator.MoveNext();
enumerator.MoveNext();
enumerator.MoveNext();

Result:

"Before 0"
"Before 1"
"After 1"
"Finally"

If you dispose while enumerating, the call to Dispose will call the finally block:

var enumerator = Items().GetEnumerator();
enumerator.MoveNext();
enumerator.Dispose();

Result:

"Before 0"
"Finally"

I'd suggest you create, enumerate and dispose of the enumerator on the same thread.

Charles Mager
  • 25,735
  • 2
  • 35
  • 45
  • But I thought that the Dispose should be called at the end of the "using" block - on the outer thread. Are you saying that .MoveNext() can implicitly call Dispose? I was assuming that Dispose was called at the point where the closing } of the using block is i.e. on the original outer thread. – dominicbeesley Apr 27 '15 at 08:58
  • `Dispose` will be called at the end of the `using` block, the problem is the execution will probably reach that point before the code in the `Task` has even started to execute. `MoveNext` may call the `finally` block as a result of there being nothing else to do, else the `Dispose` call will if called before enumeration has finished. – Charles Mager Apr 27 '15 at 09:08
  • @dominicbeesley I've updated the answer to hopefully clarify. – Charles Mager Apr 27 '15 at 09:28
  • Thank you Charles, that now makes sense - I can't believe I hadn't got my head round it but now it has been pointed out it is the only way it could work - doh! I tried to up-vote you but my rep is too low. – dominicbeesley Apr 27 '15 at 10:23

Thanks to all the replies, I realised what was going on and why. The solution to my problem was then quite easy: I just had to ensure everything was called on the same thread.

BlockingCollection<Tuple<Guid, int>> c = new BlockingCollection<Tuple<Guid, int>>();

Task.Factory.StartNew(() => {
    using (IEnumerator<Tuple<Guid, int>> iter = global.NarrowItemResultRepository.Narrow_GetCount_Batch(userData.NarrowItems, dicId2Nar.Values).GetEnumerator()) {

        while (iter.MoveNext()) {
            c.Add(iter.Current);
        }
        c.CompleteAdding();
    }
});
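As an aside, if a lock ever does need to be released on a different thread from the one that acquired it, SemaphoreSlim (unlike Monitor) has no thread affinity. A minimal sketch, not part of the original solution:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class SemaphoreDemo
{
    static void Main()
    {
        // SemaphoreSlim(1, 1) behaves like a lock, but any thread may release it.
        var gate = new SemaphoreSlim(1, 1);

        gate.Wait();                           // acquired on the main thread
        Task.Run(() => gate.Release()).Wait(); // released on a pool thread: allowed
        gate.Wait();                           // can be acquired again afterwards
        Console.WriteLine("released cross-thread ok");
    }
}
```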

The term "finalizer" refers to a concept totally unrelated to a finally block; nothing about the threading context of finalizers is guaranteed, but I think you're actually interested in finally blocks.

A finally block surrounding a yield return will be executed by whatever thread calls Dispose on the iterator's enumerator. Enumerators are generally entitled to assume that all operations performed on them, including Dispose, will be done by the same thread that created them, and are generally under no obligation to behave in anything even remotely resembling a sensible fashion when they are not. The system does not prevent code from using enumerators on multiple threads, but if a program uses from multiple threads an enumerator that makes no promise to work that way, any consequences resulting therefrom are not the fault of the enumerator, but rather the fault of the program that used it illegitimately.

Generally, it is good for classes to include enough protection against invalid multi-threading to ensure that improper multi-threaded use will not cause security vulnerabilities, but not worry about preventing any other sort of harm or confusion.
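A short sketch (my illustration, hypothetical names) that makes the first point above visible - the finally block in an iterator runs on whichever thread calls Dispose, not on the thread that created the enumerator:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class FinallyThreadDemo
{
    static int finallyThreadId;

    static IEnumerable<int> Items()
    {
        try
        {
            yield return 1;
        }
        finally
        {
            // Record which thread actually executes the finally block.
            finallyThreadId = Thread.CurrentThread.ManagedThreadId;
        }
    }

    static void Main()
    {
        int mainThreadId = Thread.CurrentThread.ManagedThreadId;

        var e = Items().GetEnumerator();
        e.MoveNext();                        // iterator is now paused at the yield
        Task.Run(() => e.Dispose()).Wait();  // finally executes during this Dispose

        Console.WriteLine("finally ran on a different thread: "
            + (finallyThreadId != mainThreadId));
    }
}
```

In a console app this normally prints `finally ran on a different thread: True`, since Task.Run schedules the Dispose on a thread-pool thread.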

supercat
  • 77,689
  • 9
  • 166
  • 211
  • I would question your assumptions here. Do you have any specific guidelines or documentation to back them up? Why do you assert that enumerators are allowed to assume access from a single thread? – Damien_The_Unbeliever Apr 24 '15 at 18:16
  • What I'm thinking about here is, specifically, `async` methods - there's no guarantee (in general) that all code within that method will all execute on the same thread, and yet I'm not aware of *any* guidance that says "do not use `foreach` within async methods" – Damien_The_Unbeliever Apr 24 '15 at 18:28
  • @Damien_The_Unbeliever: Using a `foreach` entirely within an async method would by my understanding cause `GetEnumerator` to be called from an unknown threading context, but would cause all subsequent actions on that enumerator to be performed in that same threading context. – supercat Apr 24 '15 at 18:29
  • @Damien_The_Unbeliever: Generally, references of type `IEnumerator` aren't passed around much; `IEnumerable` is passed around instead. While it might possibly in some rare cases be useful be pass an `IEnumerator` into an async method and ensure that it's done with it before anything else tries to use it (many Framework enumerators are definitely *not* thread-safe for simultaneous access) I don't see that as being a very valuable ability; if such usage were anticipated, I don't think the language would allow `yield return` within a `lock`. – supercat Apr 24 '15 at 18:33
  • We're not talking about *simultaneous* access, we're talking about access from different threads, at separate (well defined) times. – Damien_The_Unbeliever Apr 24 '15 at 18:37
  • @Damien_The_Unbeliever: That's the scenario I described as being maybe possibly vaguely useful; a `yield return` within a `lock` would break under that scenario, and I don't think the language authors would have allowed such a thing if non-simultaneous access by separate threads was intended. – supercat Apr 24 '15 at 18:39
  • Maybe check [this answer](http://stackoverflow.com/a/7612714/15498): "it is also a "worst practice" to do a `yield return` inside a lock, for the same reason. It is legal to do so, but I wish we had made it illegal. We're not going to make the same mistake for "await"." – Damien_The_Unbeliever Apr 24 '15 at 19:03
  • @Damien_The_Unbeliever: I'm not saying a `yield return` inside a lock is a good idea; my point was that the concept is *meaningless* without an expectation that all accesses to the enumerator (not necessarily the enumerable) will be done via the same thread. Making the C# compiler allow a `yield return` within a `lock` presumably took some work; I doubt the compiler people would have expended such effort if the generated code couldn't be expected to be meaningful. – supercat Apr 25 '15 at 17:55
  • I just wrote a simple enumerable that returns the integers 1, 2 and 3, reporting which thread it's on in between each `yield`. I then wrote a `main` method that `Thread.Runs` an `async` method with the following `foreach` loop. It reported different thread IDs. Demonstrating that `foreach` within `async` doesn't do anything to maintain thread affinity. ` foreach (var x in new ExEnum()) { Console.WriteLine("Got {0}", x); await Task.Delay(10); }` – Damien_The_Unbeliever Apr 26 '15 at 09:03
  • @Damien_The_Unbeliever: If the iterator had printed out messages before and after a `Thread.Sleep(1000);`, what would have been the effect? What exactly is the system trying to overlap with what? I've never used async coding, but I would have expected that not that the system would "do something to maintain thread affinity", but rather that if the goal is to overlap the processing of an item with the fetching of the next, then the processing should be done on a different thread while the fetching of the next should say on the present thread. I'm a bit surprised the system isn't doing that. – supercat Apr 26 '15 at 23:06
  • @Damien_The_Unbeliever: Can you explain what would happen if the iterator had taken any significant amount of time to yield each item? Given that `IEnumerable` is a lot older than async methods, and that usage patterns which would guarantee non-simultaneous actions on multiple threads have historically been exceptionally rare, can you point to any evidence that would suggest that `IEnumerator` implementations were not supposed to assume a consistent threading context? – supercat Apr 27 '15 at 21:06