4

I believe that I understand what a closure is for an anonymous function and am familiar with the traditional pitfalls. Good questions covering this topic are here and here. The purpose is not to understand why or how this works in a general sense but to suss out intricacies I may be unaware of when depending on the behavior of generated closure class references. Specifically, what pitfalls exist when reporting on the behavior of an externally modified variable captured in a closure?

Example

I have a long-running, massively concurrent worker service that has exactly one error case - when it cannot retrieve work. The degree of concurrency (number of conceptual threads to use) is configurable. Note, conceptual threads are implemented as Tasks<> via the TPL. Because the service constantly loops trying to get work when multiplied by the unknown degree of concurrency this can mean thousands to tens of thousands of errors could be generated per second.

As such, I need a reporting mechanism that is time-bound rather than attempt-bound, that is isolated to its own conceptual thread, and that is cancellable. To that end, I devised a recursive Task lambda that accesses my fault counter every 5 minutes outside of the primary attempt-based looping that is trying to get work:

var faults = 1;
Action<Task> reportDelay = null;
reportDelay =
    // 300000 is 5 min
    task => Task.Delay(300000, cancellationToken).ContinueWith(
        subsequentTask =>
        {
            // `faults` is modified outside the anon method
            Logger.Error(
                $"{faults} failed attempts to get work since the last known success.");
            reportDelay(subsequentTask);
        },
        cancellationToken);

// start the report task - runs concurrently with below
reportDelay.Invoke(Task.CompletedTask);

// example get work loop for context
while (true)
{
    object work = null;
    try
    {
        work = await GetWork();
        cancellationToken.Cancel();
        return work;
    }
    catch
    {
        faults++;
    }        
}

Concerns

I understand that, in this case, the generated closure with point by reference to my faults variable (which is incremented whenever any conceptual thread attempts to get work but can't). I likewise understand that this is generally discouraged, but from what I can tell only because it leads to unexpected behaviors when coded expecting the closure to capture a value.

Here, I want and rely on the closure capturing the faults variable by reference. I want to report the value of the variable around the time the continuation is called (it does not have to be exact). I am mildly concerned about faults being prematurely GC'd but I cancel the loop before exiting that lexical scope making me think it should be safe. Is there anything else I'm not thinking of? What dangers are there when considering closure access outside of mutability of the underlying value?

Answer and Explanation

I have accepted an answer below that refactors the code to avoid the need for closure access by reifying the fault monitor into its own class. However, because this does not answer the question directly, I will include a brief explanation here for future readers of the reliable behavior:

So long as the closed-over variable remains in scope for the life of the closure, it can be relied upon to behave as a true reference variable. The dangers of accessing a variable modified in an outer scope from within a closure are:

  • You must understand that the variable will behave as a reference within the closure, mutating its value as it is modified in the outer scope. The closure variable will always contain the current runtime value of the outer scope variable, not the value at the time the closure is generated.
  • You must write your program in such a way as to garuantee that the lifetime of the exterior variable is the same or greater than the anonymous function/closure itself. If you garbage collect the outer variable then the reference will become an invalid pointer.
Community
  • 1
  • 1
Daniel King
  • 224
  • 1
  • 10
  • Why are you creating an anonymous function only to invoke it once in the first place? Just run the code without using an anonymous function. – Servy Jan 30 '17 at 14:50
  • Because a) its a loop so its "invoked" once but runs continuously until cancelled and b) without reference access to that `fault` variable the report mechanism is useless. Since the primary loop is _attempt-bound_ (that is, it loops when an attempt fails) the report loop needs to be somehow independent (otherwise it will likewise be attempt-bound) and it also requires a reference to that variable to include it in the Error. – Daniel King Jan 30 '17 at 15:08
  • @Servy, he calls that function is some sense recursively, not just one time. – Evk Jan 30 '17 at 15:08
  • @Evk Yeah, I see that now. It'd be way simpler with just a loop though. – Servy Jan 30 '17 at 15:11
  • I added an example "get work loop" for context so you can see how they interact. The idea here is that the "main" loop is _attempt-based_ and the "report" loop should run concurrently with that and be _time-based_. – Daniel King Jan 30 '17 at 15:16
  • @DanielKing So why not write the "time loop" using a similar technique as the other loop, rather than using a much more contrived mechanism to do exactly the same thing, creating exactly the problem that you're worried about in the process? – Servy Jan 30 '17 at 15:18
  • And how would I do that and run them concurrently? TPL as far as I know does not have a clear mechanism for running parallel loops that involve different looping strategies. The first _is_ a loop, it's just written anonymously to allow concurrent execution. – Daniel King Jan 30 '17 at 15:20
  • Well you *don't* need to have two different loops. You could trivially write this into a single loop, and in fact I would do so, but even if you wanted separate loops, it's as simple as calling an `async` method and not awaiting it (right away) to run them in parallel. – Servy Jan 30 '17 at 15:43

2 Answers2

1

Here is a quick alternative that avoids some of the issues you may be concerned with. Also, as @Servy mentioned just calling a sperate async function will do. The ConcurrentStack just makes it easy to add and clear, additionally more information could be logged than just the count.

public class FaultCounter {

    private ConcurrentStack<Exception> faultsSinceLastSuccess;        

    public async void RunServiceCommand() {
        faultsSinceLastSuccess = new ConcurrentStack<Exception>();
        var faultCounter = StartFaultLogging(new CancellationTokenSource());
        var worker = DoWork(new CancellationTokenSource());
        await Task.WhenAll(faultCounter, worker);
        Console.WriteLine("Done.");
    }

    public async Task StartFaultLogging(CancellationTokenSource cts) {
        while (true && !cts.IsCancellationRequested) {
            Logger.Error($"{faultsSinceLastSuccess.Count} failed attempts to get work since the last known success.");
            faultsSinceLastSuccess.Clear();
            await Task.Delay(300 * 1000);
        }
    }

    public async Task<object> DoWork(CancellationTokenSource cts) {            
        while (true) {
            object work = null;
            try {
                work = await GetWork();
                cts.Cancel();
                return work;
            }
            catch (Exception ex) {
                faultsSinceLastSuccess.Push(ex);
            }
        }
    }
} 
JSteward
  • 6,833
  • 2
  • 21
  • 30
  • I like this concept. Reifying the FaultCounter to its own class does seem to address the need for an anonymous method quite nicely though I am concerned about the memory pressure the concurrent stack of exceptions would create in my situation. Will go down this road and see if I can't make this work. – Daniel King Jan 30 '17 at 18:24
  • 1
    I accepted this as the answer because this is the avenue I ended up pursuing. It seemed better to avoid relying on modified closure access by reference simply because it behaves against intuition. However this does not truly answer the question "can I rely on modified closure access to behave as expected?" That answer, is, for all intents and purposes, yes so long as you expect it to behave as a reference and not a value. I will modifiy the original post to clarify that so you get the answer points but future readers aren't left wondering. – Daniel King Feb 07 '17 at 16:08
0

I see some issues here in your solution:

  1. You read/write the faults variable value in non-thread-safe manner, so in theory either of your threads could use it's old value. You can fix that with Interlocked class usage, especially for the incrementing.
  2. Your action doesn't looks like dealing with task parameter, so why do you need it as an Action accepting the Task? Also, in continuation you aren't checking the token's cancellation flag, so, in theory again, you may get the situation your code runs smoothly, but you still get the error emails.
  3. You start the long task without long-running flag, which is unfriedly for the task scheduler.
  4. Your recursive action could be rewritten in while loop instead, removing the unnecessary overhead in your code.

Closures in C# are implemented into a compiler generated class, so the GC shouldn't be a concern for you, as long as you're looping your retry code.

VMAtm
  • 27,943
  • 17
  • 79
  • 125
  • 1. Yes I'm aware of the interlocked issue. ATM this is limited to execution on a single thread so it isn't an issue but if I ever moved to multi-threading the retry then I would have to address that. 2. The action takes a task to enable the loop. See `reportDelay(subsequentTask)`. 3. Yes the code is rough and in my finalized implementation would use LongRunning 4. You are right. I think the solution is to reify the retry logic into its own object and write separate looping async methods. – Daniel King Jan 31 '17 at 17:45
  • > 2. The action takes a task to enable the loop. See reportDelay(subsequentTask) - it takes `subsequentTask`, not the `task` – VMAtm Jan 31 '17 at 17:49
  • I see what you mean. Yes it could simple be `Action reportDelay` and later call `reportDelay()` to initiate the tail recursion. As I said, rough code :) – Daniel King Jan 31 '17 at 18:28