2

I've been using a method for splitting collections into batches form this answer - https://stackoverflow.com/a/17598878/1012739:

public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size) {
    using (IEnumerator<T> enumerator = source.GetEnumerator())
        while (enumerator.MoveNext())
            yield return TakeIEnumerator(enumerator, size);
}

private static IEnumerable<T> TakeIEnumerator<T>(IEnumerator<T> source, int size) {
    int i = 0;
    do yield return source.Current;
    while (++i < size && source.MoveNext());
}

When iterating over the results of Batch<T> one gets the expected number of collections, but when calling Count or ToList the outer collection length is reported:

var collection = new int[10];
var count = 0;
foreach(var batch in collection.Batch(2))
    ++count;
Assert.AreEqual(5, count); // Passes
// But
Assert.AreEqual(5, collection.Batch(2).Count());        // Fails
Assert.AreEqual(5, collection.Batch(2).ToList().Count); // Fails

How does this work and is the a way to fix it?

zzandy
  • 2,263
  • 1
  • 23
  • 43
  • Try saving the `collection.Batch(2)` in a variable and count that, instead of batching the same collection again. – devcrp Jun 02 '20 at 06:31
  • Batch the same collection 2 or " time in different variable and see if there is something about `GetEnumerator` returning an already completed one. I don't see why it does not work with naked eye compilation. So I just throw the most random thing I could think of and start from there. – Drag and Drop Jun 02 '20 at 06:32
  • 1
    Batching the same collection multiple time I got `Unhandled exception. System.InvalidOperationException: Enumeration already finished.` https://dotnetfiddle.net/1TaDXI – Drag and Drop Jun 02 '20 at 06:41
  • It might help if you initialized the new int array to something more revealing (1 to 10 would be good), then printed out the contents of those batches. As it stands, you might be getting 10 batches of 10, or 10 batches of 1 enumerable, you would never know. – Charlie Jun 02 '20 at 06:46
  • @Charlie, @devcrp `collection` is not the issue here, the code in the question is just for demonstration. As Mark Gravell points our in his answer, the issue is with iterating over outer enumerables first. – zzandy Jun 02 '20 at 07:13

1 Answers1

3

Your TakeIEnumerator<T> method is dependent upon the position of the enumerator (source), and thus is timing dependent... on itself. If the results are iterated by collating the "outer" results first, i.e.

var batches = source.Batch(24).ToList();
// then iterate in any way

then by definition, source is exhausted, and you'll get N items in batches, where N is the number from source, and all the batches will be empty because there is no more data. If, however, the results are iterated depth first, i.e.

foreach (var batch in source) {
    foreach (var item in batch) {...}
}

then you are looking at the open cursor. Ultimately, this approach is inherently brittle and dangerous. IMO your batch method should create buffers of computed data, perhaps a List<T> or similar. This will allocate, but: it'll be reliable. For example:

private static IEnumerable<T> TakeIEnumerator<T>(IEnumerator<T> source, int size) {
    var buffer = new List<T>(size);
    int i = 0;
    do buffer.Add(source.Current);
    while (++i < size && source.MoveNext())
    return buffer;
}
Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900