Is it not possible to enumerate an IAsyncEnumerable twice?

Once CountAsync has run, the await foreach doesn't enumerate any items. Why? There seems to be no Reset method on IAsyncEnumerator.

var count = await itemsToImport.CountAsync();

await foreach (var importEntity in itemsToImport)
{
    // won't run
}

Source of data:

private IAsyncEnumerable<TEntity> InternalImportFromStream(TextReader reader)
{
    var csvReader = new CsvReader(reader, Config);
        
    return csvReader.GetRecordsAsync<TEntity>();
}
Theodor Zoulias
CleanCoder
  • Even sync enumerators almost never actually implement `.Reset()`; `IAsyncEnumerator` just codifies the practice that you can't enumerate more than once. As for `IAsyncEnumerable`, the same applies as for `IEnumerable`: whether enumerating more than once is possible is not defined in the interface, but for many sources you can't because there'd be a hidden performance penalty or inconsistent results (like executing a DB query twice). You have to deal with that explicitly, meaning you must either materialize the result (`.ToList()` and related) or redo the operation yourself. – Jeroen Mostert Mar 17 '20 at 15:32
  • If you don't need the count up front, you can increment a counter while iterating over the items. That way you don't have to bring everything into memory, and you still have the count once you've processed the collection. – Kieran Devlin Mar 17 '20 at 15:39
  • I need the count before the enumeration. IEnumerator.Reset exists; IAsyncEnumerator has nothing comparable. So how do I get the count out of it? I want to display the progress as "processing record x of count" instead of "processing record x". – CleanCoder Mar 17 '20 at 15:40
  • This is turning into an XY problem. We need more detail on where the enumeration comes from before we can say how you would achieve your goal, i.e., what the data source is and what client you are using to access it. (Update the question.) – Kieran Devlin Mar 17 '20 at 15:42
  • This is a common problem with IAsyncEnumerable. It is not a complicated case. – CleanCoder Mar 17 '20 at 15:44
  • For some sources, you just can't get the count "before" the enumeration (streaming rows from a DB, for example). `.Reset()`, even if it existed, requires that results are buffered somewhere or else that everything is redone. If you want to buffer/redo them yourself, you can, but you can't expect the source to do it for you in the off chance you wanted to wait for a count. Collections implement their own `.Count` properties you can use to get the count "directly"; a similar mechanism could be used for other sources, depending on their nature (like a separate `Count` call on a web API). – Jeroen Mostert Mar 17 '20 at 15:44
  • It would be enough if I could enumerate it twice, the first time just counting without any processing. This interface comes from the data loaded by the CsvHelper library. – CleanCoder Mar 17 '20 at 15:47
  • If the data is actually already loaded (as in, resident in memory) enumerating it asynchronously adds nothing anyway, and you might as well use the sync enumeration (which presumably can be redone). On the other hand, if it is not resident and the async enumeration is used to encapsulate async (file) I/O, which would be much more common, you have to make the conscious choice to read the file twice, and the interface forces you to do that explicitly. – Jeroen Mostert Mar 17 '20 at 15:49
  • Thanks for that. I will put it into a list once and then enumerate it synchronously after I have used the count value. – CleanCoder Mar 17 '20 at 15:52
  • As David Browne mentioned, it is not because of the `IAsyncEnumerable` but the underlying type that implements it. I'd recommend reading this article. There is an example that works no matter how many times it is iterated: https://learn.microsoft.com/en-us/archive/msdn-magazine/2019/november/csharp-iterating-with-async-enumerables-in-csharp-8 – Fabjan Mar 17 '20 at 15:52
  • Adding to all the comments, an IAsyncEnumerable represents an *active results stream*. It can go on forever. To enumerate this twice, the *publisher* of the data would have to execute twice - you'd have to call the same HTTP API sequence again, or execute the same query again, or read the same IO stream again. – Panagiotis Kanavos Mar 18 '20 at 14:07

2 Answers

This has nothing to do with resetting an IAsyncEnumerator. The code requests a second IAsyncEnumerator, which, just as with IEnumerable.GetEnumerator(), is only possible for some kinds of collections. If the enumerable (async or not) is an abstraction over some sort of forward-only data structure, then the second call to GetEnumerator/GetAsyncEnumerator will fail or produce an empty sequence.

And even when it doesn't fail, it's sometimes expensive. For instance, it might run a database query or hit a remote API each time it's enumerated. This is why IEnumerable/IAsyncEnumerable make poor public return types from functions: they fail to describe the capabilities of the returned collection, and almost the only thing you can safely do with the value is materialize it with .ToList/ToListAsync.

For example, this works fine, because an async iterator method re-runs its body for every enumeration:

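using System;
using System.Collections.Generic;
using System.Linq;              // CountAsync is an extension method, e.g. from the System.Linq.Async package
using System.Threading.Tasks;

class Program
{
    // Async iterator: every enumeration re-runs this body from the start.
    // (It contains no await, so the compiler emits warning CS1998; that is harmless here.)
    static async IAsyncEnumerable<int> Col()
    {
        for (int i = 1; i <= 10; i++)
        {
            yield return i;
        }
    }

    static async Task Main(string[] args)
    {
        var col = Col();

        var count = await col.CountAsync();   // first enumeration
        await foreach (var dataPoint in col)  // second enumeration, starts over at 1
        {
            Console.WriteLine(dataPoint);
        }
    }
}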
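
By contrast, here is a minimal sketch of what probably happens in the question's case, assuming the enumerable wraps a single reader that is consumed as it goes. `ReadLines` is a hypothetical stand-in for `csvReader.GetRecordsAsync<TEntity>()`, and `CountItemsAsync` is a hand-rolled helper so the sketch needs no extra package; neither name comes from CsvHelper.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class ForwardOnlyDemo
{
    // Hypothetical stand-in for GetRecordsAsync: streams lines from ONE shared reader.
    public static async IAsyncEnumerable<string> ReadLines(TextReader reader)
    {
        string? line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            yield return line;
        }
    }

    // Hand-rolled count (assumption: used instead of CountAsync to stay dependency-free).
    public static async Task<int> CountItemsAsync<T>(IAsyncEnumerable<T> source)
    {
        var n = 0;
        await foreach (var _ in source) n++;
        return n;
    }

    public static async Task Main()
    {
        var reader = new StringReader("a\nb\nc");
        var lines = ReadLines(reader);

        Console.WriteLine(await CountItemsAsync(lines)); // 3: the first pass consumes the reader
        Console.WriteLine(await CountItemsAsync(lines)); // 0: the reader is already at its end
    }
}
```

The second enumeration does start a new enumerator, but that enumerator reads from the same exhausted reader, so it yields nothing.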
David Browne - Microsoft
The interface itself offers no way to reset an IAsyncEnumerable, because IAsyncEnumerator has no Reset method.

In this specific example the second enumeration doesn't work because the IAsyncEnumerable reads from a stream. Once the stream has been read, its position is at the end. If you have control over the stream or a reference to it (which I don't), you could set the position back to 0 and enumerate it again.

I tend to use ToListAsync, then read the count from the list's Count property and iterate the items synchronously, since they are already loaded.
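
A minimal sketch of that approach, under some assumptions: `Source` is a hypothetical single-use source standing in for the CSV import, and `ToListSketchAsync` is a hand-rolled equivalent of `ToListAsync` so the example needs no extra package. It materializes the sequence once and then reports progress as "record x of count".

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class MaterializeDemo
{
    // Hypothetical source standing in for the imported records.
    public static async IAsyncEnumerable<string> Source()
    {
        await Task.Yield(); // simulate asynchronous work
        yield return "x";
        yield return "y";
    }

    // Hand-rolled stand-in for ToListAsync: enumerates the source exactly once.
    public static async Task<List<T>> ToListSketchAsync<T>(IAsyncEnumerable<T> source)
    {
        var list = new List<T>();
        await foreach (var item in source) list.Add(item);
        return list;
    }

    public static async Task Main()
    {
        var items = await ToListSketchAsync(Source());
        var count = items.Count; // known before processing starts

        for (var i = 0; i < count; i++)
        {
            Console.WriteLine($"processing record {i + 1} of {count}: {items[i]}");
        }
    }
}
```

The trade-off is memory: every record is held in the list at once, which is fine for moderately sized files but not for unbounded streams.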

CleanCoder
  • The `IAsyncEnumerator` is intended to be a disposable single use object. This doesn't mean that the `IAsyncEnumerable` can only be enumerated once, because an enumerable is a factory that can create an infinite number of enumerators. You get a brand new enumerator every time you invoke the [`GetAsyncEnumerator`](https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.iasyncenumerable-1.getasyncenumerator) method. There is no guarantee that all these enumerators will produce the same sequence though. – Theodor Zoulias Mar 18 '20 at 00:17
  • As I said: for a stream it wouldn't work, as the stream's position is at its end and needs to be set to 0 again before attempting to iterate it again. – CleanCoder May 28 '21 at 07:54
  • Sven, this depends on the implementation. One can easily implement an `IAsyncEnumerable` that creates a new `CsvReader` every time it is enumerated and starts reading from position 0. To do it you just need to add the `async` modifier to `InternalImportFromStream`, then loop over and `yield` the elements returned by the `GetRecordsAsync` method. – Theodor Zoulias May 28 '21 at 09:05
  • ...like this: [Pass-through for IAsyncEnumerable?](https://stackoverflow.com/questions/59876417/pass-through-for-iasyncenumerable) – Theodor Zoulias May 28 '21 at 09:10
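
A rough sketch of the idea in the last two comments, not the linked answer's code. It assumes the caller can supply a way to reopen the input: `openReader` is a hypothetical factory delegate, and plain line reading stands in for `CsvReader.GetRecordsAsync`. Because the method is an async iterator, its body (including the reader creation) runs again for each enumeration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class ReEnumerableDemo
{
    // Each enumeration calls openReader, so every pass starts with a fresh reader.
    public static async IAsyncEnumerable<string> ReadLines(Func<TextReader> openReader)
    {
        using var reader = openReader();
        string? line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            yield return line;
        }
    }

    public static async Task Main()
    {
        // Hypothetical input; a real import might use () => new StreamReader(path).
        var lines = ReadLines(() => new StringReader("a\nb\nc"));

        var count = 0;
        await foreach (var _ in lines) count++;   // first pass: count only

        var index = 0;
        await foreach (var line in lines)         // second pass: reads the input again
        {
            Console.WriteLine($"record {++index} of {count}: {line}");
        }
    }
}
```

This keeps memory use flat but reads the input twice, which is the explicit choice the comments under the question describe.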