13

What is the correct way to use async/await in a recursive method? Here is my method:

public string ProcessStream(string streamPosition)
{
    var stream = GetStream(streamPosition);

    if (stream.Items.count == 0)
        return stream.NextPosition;

    foreach(var item in stream.Items) {
        ProcessItem(item);
    }

    return ProcessStream(stream.NextPosition)
}

And here is the method with async/await:

public async Task<string> ProcessStream(stringstreamPosition)
{
        var stream = GetStream(streamPosition);

        if (stream.Items.count == 0)
            return stream.NextPosition;

        foreach(var item in stream.Items) {
            await ProcessItem(item); //ProcessItem() is now an async method
        }

        return await ProcessStream(stream.NextPosition);
 }
Prabhu
  • 12,995
  • 33
  • 127
  • 210
  • 1
    Are you seeing any errors with what you tried? – Stephen Cleary Jul 29 '14 at 03:21
  • 1
    @StephenCleary Nope, seems to be working fine. I was just wondering if there is any potential danger. – Prabhu Jul 29 '14 at 03:25
  • 1
    @zerkms How is my code synchronous? With the await keyword, the method should return right away to the caller right? I only started learning async/await today, so there's likely a lot of gaps in my understanding. – Prabhu Jul 29 '14 at 03:30
  • 1
    @Prabhu: yep, it's my mistake – zerkms Jul 29 '14 at 03:33
  • 1
    Looks pretty correct to me. Not sure what you're expecting here. – Marcel N. Jul 29 '14 at 04:16
  • 1
    You're missing some line terminators (`;`), and using your `stream.NextPosition` as if it was a `string` when you return, but `int` when you pass it on to `ProcessStream`. Now that we got that out of the way, this method is trivial to implement as a simple loop, avoiding recursion altogether - and that's a good idea, because in synchronous scenarios (including cases where your recursive `Task`s complete synchronously) recursion can lead to `StackOverflowException`s. *Probably* won't happen in your particular case, but why you'd even take that risk when you can use a simple loop, is beyond me. – Kirill Shlenskiy Jul 29 '14 at 05:46
  • 1
    @KirillShlenskiy thanks for the pointers. I've corrected the code. Would you mind giving me a code snippet for the non-recursive way? – Prabhu Jul 29 '14 at 06:08

2 Answers2

13

While I have to say upfront that the intention of the method is not entirely clear to me, reimplementing it with a simple loop is quite trivial:

public async Task<string> ProcessStream(string streamPosition)
{
    while (true)
    {
        var stream = GetStream(streamPosition);

        if (stream.Items.Count == 0)
            return stream.NextPosition;

        foreach (var item in stream.Items)
        {
            await ProcessItem(item); //ProcessItem() is now an async method
        }

        streamPosition = stream.NextPosition;
    }
}

Recursion is not stack-friendly and if you have the option of using a loop, it's something definitely worth looking into in simple synchronous scenarios (where poorly controlled recursion eventually leads to StackOverflowExceptions), as well as asynchronous scenarios, where, I'll be honest, I don't even know what would happen if you push things too far (my VS Test Explorer crashes whenever I try to reproduce known stack overflow scenarios with async methods).

Answers such as Recursion and the await / async Keywords suggest that StackOverflowException is less of a problem with async due to the way the async/await state machine works, but this is not something I have explored much as I tend to avoid recursion whenever possible.

Community
  • 1
  • 1
Kirill Shlenskiy
  • 9,367
  • 27
  • 39
  • @Krill awesome thanks. Another issue is that the foreach is going to process one after the other because it is awaiting ProcessItem. Is there anyway to speeden it up so that it is' not waiting for ProcessItem. I sort of need to just fire and forget, but if I take out the await, the compiler gives me a warning that not using the await in an async method is not a good idea. – Prabhu Jul 29 '14 at 07:03
  • 1
    @Prabhu, that is highly scenario-specific and depends on multiple factors, i.e. how thread-safe your code is, and how many items you expect `stream.Items` to contain. Thread-safety is paramount, and you need to ensure that multiple `ProcessItems` can run in parallel. Beyond that - if the number of items is not significant but each takes long time to process, I'd be replacing the `foreach` with `await Task.WhenAll(stream.Items.Select(ProcessItem));`. It the number of items is large and processing quick, I'd be reverting back to synchronous processing and using `Parallel.ForEach` instead. – Kirill Shlenskiy Jul 29 '14 at 07:12
  • in my case the number of items is very large. Curious to know why parallel.ForEach (which is what I have for now) will perform better in this case. – Prabhu Jul 29 '14 at 07:16
  • @Prabhu, that's because `Task`s are quite "heavy" and you really don't want one per item when dealing with *large* collections. `Parallel.ForEach` uses smart collection partitioning to achieve parallelisation without too much overhead (not counting cases where you're dealing with *really* small delegate bodies), and provides a number of optional tweaks and optimisations (i.e. local state), which you should look into in order to squeeze every last bit of performance out of it. – Kirill Shlenskiy Jul 29 '14 at 07:20
  • Thank you. Could I do a fire and forget on ProcessItem(item) somehow? – Prabhu Jul 29 '14 at 07:23
  • @Prabhu, you can, but why would you? Now you'll have all these parallel tasks running around flooding your thread pool and what not, throwing unobserved exceptions all over the place. Not good. Do you want to do this because you're trying to speed up the loop, or is it because `GetStream` can take a while and you ultimately want the producer (`GetStream`) and consumer (`foreach -> ProcessItem`) running in parallel? – Kirill Shlenskiy Jul 29 '14 at 07:31
  • Yeah I want to speed up my loop. While the for loop is running its work in parallel, the while loop is going to wait each iteration until the for loop is done doing all the ProcessItem()s, hence my concern. – Prabhu Jul 29 '14 at 07:42
  • I don't have the code for your producer OR your consumer, so this is a wild stab in the dark, but if the whole synchronous producer-consumer situation is bothering you and you think that the two can be parallelised nicely, then look into TPL Dataflow, or a simple pipeline (very basic example here: http://stackoverflow.com/questions/24467373/c-sharp-parallel-library-xmlreader-xmlwriter/24468103#24468103 , fully explained here: http://msdn.microsoft.com/en-us/library/ff963548.aspx). – Kirill Shlenskiy Jul 29 '14 at 08:44
2

When I add code to make your example more concrete, I find two possible ways for the recursion to turn out badly. Both of them assume that your data is pretty big and require specific conditions to trigger.

  1. If ProcessItem(string) returns a Task that completes before it is awaited on (or, I assume, it completes before the await finishes spinning), the continuation will execute synchronously. In my code below, I have simulated this by having ProcessItem(string) return Task.CompletedTask. When I do this, the program very quickly terminates with a StackOverflowException. This is because .net’s TPL “Releases Zalgo” by opportunistically executing continuations synchronously without regard to how much space is available in the current stack. That means that it will exacerbate the potential stack space issue that you already have by using a recursive algorithm. To see this, comment out await Task.Yield(); in my code sample below.
  2. If you use some technique to prevent TPL from continuing synchronously (below I use Task.Yield()), eventually the program will run out of memory and die with an OutOfMemoryException. If I understand correctly, this wouldn’t happen if return await were able to emulate the tail-call optimization. I imagine that what is happening here is each call generates something like a book-keeping Task<string> and keeps generating them even though they could be coalesced. To reproduce this error with the sample below, ensure you’re running the program as 32-bit, disable the Console.WriteLine() call (because consoles are really slow), and ensure the await Task.Yield() is uncommented.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Be sure to run this 32-bit to avoid making your system unstable.
class StreamProcessor
{
    Stream GetStream(string streamPosition)
    {
        var parsedStreamPosition = Convert.ToInt32(streamPosition);
        return new Stream(
            // Terminate after we reach 0.
            parsedStreamPosition > 0 ? new[] { streamPosition, } : new string[] { },
            Convert.ToString(parsedStreamPosition - 1));
    }

    Task ProcessItem(string item)
    {
        // Comment out this next line to make things go faster.
        Console.WriteLine(item);
        // Simulate the Task represented by ProcessItem finishing in
        // time to make the await continue synchronously.
        return Task.CompletedTask;
    }

    public async Task<string> ProcessStream(string streamPosition)
    {
        var stream = GetStream(streamPosition);

        if (stream.Items.Count == 0)
            return stream.NextPosition;

        foreach (var item in stream.Items)
        {
            await ProcessItem(item); //ProcessItem() is now an async method
        }

        // Without this yield (which prevents inline synchronous
        // continuations which quickly eat up the stack),
        // you get a StackOverflowException fairly quickly.
        // With it, you get an OutOfMemoryException eventually—I bet
        // that “return await” isn’t able to tail-call properly at the Task
        // level or that TPL is incapable of collapsing a chain of Tasks
        // which are all set to resolve to the value that other tasks
        // resolve to?
        await Task.Yield();

        return await ProcessStream(stream.NextPosition);
    }
}

class Program
{
    static int Main(string[] args) => new Program().Run(args).Result;
    async Task<int> Run(string[] args)
    {
        await new StreamProcessor().ProcessStream(
            Convert.ToString(int.MaxValue));
        return 0;
    }
}

class Stream
{
    public IList<string> Items { get; }
    public string NextPosition { get; }
    public Stream(
        IList<string> items,
        string nextPosition)
    {
        Items = items;
        NextPosition = nextPosition;
    }
}

So, I guess my two recommendations are:

  1. Use Task.Yield() if you aren’t certain that the stack growth of recursion will be interrupted by something else.
  2. As suggested already, avoid recursion if it doesn’t even make sense for your problem in the first place. And even if it makes a clean algorithm, avoid it if your problem size is unbounded.
binki
  • 7,754
  • 5
  • 64
  • 110