
I'm building a proof of concept that splits a List of strings into batches and processes each batch asynchronously. When I run the program, it always takes the first set of items (3, as per the batch size) and never advances. How can I move on to the next set of items? Take is an extension method that I wrote myself, and I tried to use the async/await pattern for the processing.

Thanks in advance

public class Program
{
    public static async Task Main(string[] args)
    {
        var obj = new Class1();
        List<string> fruits = new()
            {
                "1",
                "2",
                "3",
                "4",
                "5",
                "6",
                "7",
                "8",
                "9",
                "10"
            };
        
        await Class1.Start(fruits);
        Console.ReadLine();
    }
}

public class Class1
{
    private const int batchSize = 3;
    public static async Task Start(List<string> fruits)
    {
        if (fruits == null)
            return;

        var e = fruits.GetEnumerator();
        while (true)
        {    
            var batch = e.Take(3); // always taking the first 3 items and not moving to the next items of the list
            if (batch.Count == 0)
            {
                break;
            }
            await StartProcessing(batch);
        }
    }

    public static async Task StartProcessing(List<string> batch)
    {
        await Parallel.ForEachAsync(batch, async (item, CancellationToken) =>
        {
            var list = new List<string>();
            await Task.Delay(1000);
            Console.WriteLine($"Fruit Name: {item}");
            list.Add(item);
        });
    }
}

Extension.cs

public static class Extensions
{
    public static List<T> Take<T>(this IEnumerator<T> e, int num)
    {
        List<T> list = new List<T>(num);
        int taken = 0;
        while (taken < num && e.MoveNext())
        {
            list.Add(e.Current);
            taken++;
        }

        return list;
    }
}
Sigmarod
Vijay Vj
  • You need a combination of `.Skip()` and `.Take()`. No need to use `async/await`. I hope you will get some clue from this link -> https://stackoverflow.com/a/63682395/6299857 – Prasad Telkikar Apr 19 '23 at 06:41
  • I'd suggest a Unit Test for the extension method to verify it actually does what you expect it to do. – Fildor Apr 19 '23 at 06:51
  • Is there any reason that you don't use the [`Chunk`](https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.chunk) LINQ operator? – Theodor Zoulias Apr 19 '23 at 06:51
  • Does this answer your question? [Split a List into smaller lists of N size](https://stackoverflow.com/questions/11463734/split-a-list-into-smaller-lists-of-n-size) – Charlieface Apr 19 '23 at 12:57
  • @Heinzi This is a case where OP doesn't understand what they are actually trying to do (XY problem). OP says *"split a List of strings into batches and process each batch asynchronously. But when ran the program, it always takes first set of items(that's 3 as per the batch size). So could anyone please help me how to move to the next set items"* which means `Chunk` is the right answer. You even say so yourself in your answer. – Charlieface Apr 19 '23 at 13:19
  • @Charlieface: I see your point, but I prefer to take the question in the title at face value and assume that OP wants to learn *why* their code is broken (rather than how to replace it with something completely else). Maybe I'm too idealistic, but I like the thought that people come here to *learn*, not only to *solve problems*, and I try to treat them that way. – Heinzi Apr 19 '23 at 13:20

1 Answer


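List<T>.Enumerator is a struct. Your Take extension method takes an IEnumerator<T> parameter, so every call boxes a fresh copy of e and advances only that copy; e itself never moves past its initial position. Here is a simpler example using your extension method (fiddle):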

using System;
using System.Collections.Generic;

public class Program
{
    public static void Main()
    {
        List<string> fruits = new() { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };
        
        var e = fruits.GetEnumerator();
        var firstThree = e.Take(3);
        var nextThree = e.Take(3);
        
        // prints 1, 2, 3
        foreach (var x in firstThree)
            Console.WriteLine(x);

        // also prints 1, 2, 3
        foreach (var x in nextThree)
            Console.WriteLine(x);
    }
}

public static class Extensions
{
    public static List<T> Take<T>(this IEnumerator<T> e, int num)
    {
        List<T> list = new List<T>(num);
        int taken = 0;
        while (taken < num && e.MoveNext())
        {
            list.Add(e.Current);
            taken++;
        }

        return list;
    }
}

You can fix this by making sure that e contains a boxed enumerator by replacing

var e = fruits.GetEnumerator();

with

IEnumerator<string> e = fruits.GetEnumerator();

(fiddle)
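
As a rough sketch of how that one-line change plays out in the batching loop from the question: the extension method below repeats the question's Take logic, while the names EnumeratorExtensions, BoxedEnumeratorDemo and Batches are made up for this illustration. Because e is declared as the interface type, the struct is boxed once and every Take call advances the same object.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorExtensions
{
    // Same logic as the Take extension method from the question.
    public static List<T> Take<T>(this IEnumerator<T> e, int num)
    {
        var list = new List<T>(num);
        while (list.Count < num && e.MoveNext())
            list.Add(e.Current);
        return list;
    }
}

// Hypothetical helper, not part of the original code.
public static class BoxedEnumeratorDemo
{
    public static List<List<string>> Batches(List<string> items, int size)
    {
        var result = new List<List<string>>();

        // Declaring the variable as IEnumerator<string> boxes the struct once,
        // so each Take call below mutates the same boxed enumerator.
        using IEnumerator<string> e = items.GetEnumerator();
        while (true)
        {
            var batch = e.Take(size);
            if (batch.Count == 0)
                break; // enumerator exhausted
            result.Add(batch);
        }
        return result;
    }
}
```

With the ten items from the question and a batch size of 3, the loop terminates and the last batch holds the single remaining item.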


Alternatively, C# 7.2 and later allow you to use ref extension methods, which would enable you to do something like this (fiddle):

var e = fruits.GetEnumerator();
    
// The type arguments must be spelled out: T appears only in the constraint
// on TEnum, and C# type inference does not take constraints into account
var firstThree = e.Take<string, List<string>.Enumerator>(3);
var nextThree = e.Take<string, List<string>.Enumerator>(3);

...

public static class Extensions
{
    public static List<T> Take<T, TEnum>(ref this TEnum e, int num)
        where TEnum : struct, IEnumerator<T>
    {
        ...
    }
}
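
For completeness, here is one way the elided method body could be filled in. This is a sketch that reuses the loop from the original Take method, not necessarily what the linked fiddle contains. The names TakeRef, RefEnumeratorExtensions, RefEnumeratorDemo and Batches are invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class RefEnumeratorExtensions
{
    // "ref this" passes the struct by reference, so MoveNext advances
    // the caller's enumerator instead of a copy. TakeRef is a made-up name.
    public static List<T> TakeRef<T, TEnum>(ref this TEnum e, int num)
        where TEnum : struct, IEnumerator<T>
    {
        var list = new List<T>(num);
        while (list.Count < num && e.MoveNext())
            list.Add(e.Current);
        return list;
    }
}

// Hypothetical helper, not part of the original code.
public static class RefEnumeratorDemo
{
    public static List<List<string>> Batches(List<string> items, int size)
    {
        var result = new List<List<string>>();

        // e is a List<string>.Enumerator (a struct) and must be a writable
        // local variable so that it can be passed by reference.
        var e = items.GetEnumerator();
        while (true)
        {
            var batch = e.TakeRef<string, List<string>.Enumerator>(size);
            if (batch.Count == 0)
                break; // enumerator exhausted
            result.Add(batch);
        }
        return result;
    }
}
```

Note that e cannot be declared with a using declaration here, because such variables are read-only and cannot be passed by ref.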

But, honestly, the real reason why your code does not work is that enumerators aren't meant to be used like this. The built-in Enumerable.Take method works on enumerables (IEnumerable<T>), not on enumerators (IEnumerator<T>), and that's the idiomatic way to do those things in .NET.
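
A minimal sketch of that idiomatic style, combining the built-in Enumerable.Skip and Enumerable.Take on the list itself (the approach also suggested in the comments). The names SkipTakeBatching and Batches are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original code.
public static class SkipTakeBatching
{
    public static List<List<T>> Batches<T>(IReadOnlyList<T> items, int size)
    {
        if (size <= 0)
            throw new ArgumentOutOfRangeException(nameof(size));

        var result = new List<List<T>>();

        // Each iteration skips the items already handled and takes the next batch.
        for (int offset = 0; offset < items.Count; offset += size)
            result.Add(items.Skip(offset).Take(size).ToList());

        return result;
    }
}
```

The sketch takes an IReadOnlyList<T> so that Count is available up front. For a general sequence, each Skip call may have to enumerate from the start again, which is one reason a single-pass method can be preferable.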

For your use case, Enumerable.Chunk is the most appropriate built-in method. If you want to see how it could be implemented from scratch for educational purposes, have a look at these related questions:
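
As a minimal sketch of the Chunk approach applied to the code in the question, assuming .NET 6 or later (where both Enumerable.Chunk and Parallel.ForEachAsync are available): the names ChunkBatching and StartAsync are hypothetical, the shortened delay stands in for real work, and the returned list of batch sizes exists only to make the behaviour easy to check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical rewrite of Class1 from the question.
public static class ChunkBatching
{
    private const int BatchSize = 3;

    // Processes the items batch by batch; the items inside one batch
    // are handled concurrently by Parallel.ForEachAsync.
    public static async Task<List<int>> StartAsync(IEnumerable<string> items)
    {
        var batchSizes = new List<int>();

        // Chunk yields arrays of at most BatchSize items, in source order.
        foreach (string[] batch in items.Chunk(BatchSize))
        {
            await Parallel.ForEachAsync(batch, async (item, cancellationToken) =>
            {
                await Task.Delay(100, cancellationToken); // stand-in for real work
                Console.WriteLine($"Fruit Name: {item}");
            });
            batchSizes.Add(batch.Length);
        }
        return batchSizes;
    }
}
```

Calling `await ChunkBatching.StartAsync(fruits)` from Main would replace the manual enumerator loop entirely. The order of output lines within one batch is not guaranteed, since those items run in parallel.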

Heinzi