
I have two loops that use SemaphoreSlim and an array of strings, "Contents".

A foreach loop:

        var allTasks = new List<Task>();
        var throttle = new SemaphoreSlim(10,10);
        foreach (string s in Contents)
        {
            await throttle.WaitAsync();
            allTasks.Add(
                Task.Run(async () =>
                {
                    try
                    {
                        rootResponse.Add(await POSTAsync(s, siteurl, src, target));
                    }
                    finally
                    {
                        throttle.Release();
                    }
                }));
        }
        await Task.WhenAll(allTasks);

A for loop:

        var allTasks = new List<Task>();
        var throttle = new SemaphoreSlim(10,10);
        for(int s=0;s<Contents.Count;s++)
        {
            await throttle.WaitAsync();
            allTasks.Add(
                Task.Run(async () =>
                {
                    try
                    {
                        rootResponse[s] = await POSTAsync(Contents[s], siteurl, src, target);
                    }
                    finally
                    {
                        throttle.Release();
                    }
                }));
        }
        await Task.WhenAll(allTasks);

The first loop (foreach) runs well, but in the for loop version, Task.WhenAll(allTasks) throws an OutOfRangeException. I want the Contents[] index and the List index to match.

Can I fix the for loop, or is there a better approach?

H.Matthew
  • The problem is that variable capture behaves differently in foreach loops and for loops. An easy fix is to declare a new variable `int sTemp = s;` inside the loop body and capture that in the anonymous function. – Mike Zboray Oct 21 '18 at 03:22
  • Is there a web page explaining that difference? I'm not that familiar with foreach loops and I want to learn the detailed reason behind it. – H.Matthew Oct 21 '18 at 03:28
  • I suppose it is complicated by the fact that C# 5 changed the behavior of foreach loop variables to be more natural, but that was a breaking change and it introduced a conceptual difference between for and foreach loops. There are many questions on the site related to [this](https://stackoverflow.com/q/271440/517852). The linked question has links to other articles discussing the issue. – Mike Zboray Oct 21 '18 at 03:41
  • So I declared an int inside the function but still got the exception. Is there another way to add to the List at the same index as in Contents[] with the foreach loop? – H.Matthew Oct 21 '18 at 03:58
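
The capture difference discussed in the comments above can be illustrated with a minimal sketch. The class and method names here (`CaptureDemo`, `CaptureLoopVariable`, `CaptureCopy`, `CaptureForeach`) are hypothetical and only for illustration. The sketch assumes C# 5 or later, where each foreach iteration gets its own variable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CaptureDemo
{
    // Every lambda captures the single shared for-loop variable i.
    public static List<int> CaptureLoopVariable(int count)
    {
        var actions = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            actions.Add(() => i);
        }
        // The loop has finished, so each delegate now sees i == count.
        return actions.Select(a => a()).ToList();
    }

    // Each lambda captures a fresh copy declared inside the loop body.
    public static List<int> CaptureCopy(int count)
    {
        var actions = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            int copy = i; // new variable per iteration
            actions.Add(() => copy);
        }
        return actions.Select(a => a()).ToList();
    }

    // In C# 5 and later, foreach declares a new variable per iteration.
    public static List<int> CaptureForeach(int[] items)
    {
        var actions = new List<Func<int>>();
        foreach (int item in items)
        {
            actions.Add(() => item);
        }
        return actions.Select(a => a()).ToList();
    }
}
```

With `count` set to 3, `CaptureLoopVariable` yields 3, 3, 3, while `CaptureCopy` yields 0, 1, 2. In the for loop from the question, a task that reads `s` after the loop has ended sees `Contents.Count`, which is one past the last valid index.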

1 Answer

This would fix your current problem:

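for (int s = 0; s < Contents.Count; s++)
{
   // Copy per iteration: the lambda would otherwise capture the shared loop
   // variable s, which may already equal Contents.Count when the task runs.
   var index = s;
   var content = Contents[index];

   allTasks.Add(
      Task.Run(async () =>
                  {
                     await throttle.WaitAsync();
                     try
                     {
                        // rootResponse must already hold Contents.Count elements,
                        // otherwise the indexer itself throws.
                        rootResponse[index] = await POSTAsync(content, siteurl, src, target);
                     }
                     finally
                     {
                        throttle.Release();
                     }
                  }));
}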
await Task.WhenAll(allTasks);

However, this is a fairly messy piece of code. The following looks a bit neater:

public static async Task DoStuffAsync(Content[] contents, string siteurl, string src, string target)
{
   var throttle = new SemaphoreSlim(10, 10);

   // local method
   async Task<(Content, SomeResponse)> PostAsyncWrapper(Content content)
   {
      await throttle.WaitAsync();
      try
      {
         // return a content and result pair
         return (content, await PostAsync(content, siteurl, src, target));
      }
      finally
      {
         throttle.Release();
      }   
   }

   // Task.WhenAll returns the results in the same order as the input tasks
   var results = await Task.WhenAll(contents.Select(PostAsyncWrapper));

   // do stuff with your results pairs here
}
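
Since the question asks for the result index to match the Contents index, here is a minimal, self-contained sketch of the same throttled `Task.WhenAll` idea. `FakePostAsync` is a hypothetical stand-in for the real `POSTAsync`, and `OrderedThrottleDemo` and `PostAllAsync` are illustrative names, not part of the original code. The sketch relies on `Task.WhenAll` returning results in the order of the supplied tasks, regardless of which one finishes first.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class OrderedThrottleDemo
{
    // Hypothetical stand-in for POSTAsync: completes after the given delay.
    private static async Task<string> FakePostAsync(string content, int delayMs)
    {
        await Task.Delay(delayMs);
        return "response:" + content;
    }

    public static async Task<string[]> PostAllAsync(string[] contents, int maxConcurrency)
    {
        using (var throttle = new SemaphoreSlim(maxConcurrency, maxConcurrency))
        {
            // The lambda parameters are fresh per element, so nothing is shared.
            var tasks = contents.Select(async (content, index) =>
            {
                await throttle.WaitAsync();
                try
                {
                    // Earlier items get longer delays, so they tend to finish last.
                    return await FakePostAsync(content, 10 * (contents.Length - index));
                }
                finally
                {
                    throttle.Release();
                }
            });

            // results[i] corresponds to contents[i], whatever the completion order.
            return await Task.WhenAll(tasks);
        }
    }
}
```

Calling `await OrderedThrottleDemo.PostAllAsync(contents, 10)` gives an array where each element lines up with the input at the same index, so no shared list or manual indexing is needed.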

There are many other ways you could do this: PLINQ, Parallel.For, Parallel.ForEach, or just tidying up the captures in your loops as above.

However, since you have an IO-bound workload and async methods to run it, the most appropriate solution is the async/await pattern, which neither Parallel.For nor Parallel.ForEach caters for optimally.

Another way is the TPL Dataflow library, which can be found in the System.Threading.Tasks.Dataflow NuGet package.

Code

public static async Task DoStuffAsync(Content[] contents, string siteurl, string src, string target)
{

   async Task<(Content, SomeResponse)> PostAsyncWrapper(Content content)
   {
      return (content, await PostAsync(content, siteurl, src, target));
   }

   var bufferblock = new BufferBlock<(Content, SomeResponse)>();
   var actionBlock = new TransformBlock<Content, (Content, SomeResponse)>(
      content => PostAsyncWrapper(content),
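      new ExecutionDataflowBlockOptions
         {
            // results may come out in completion order rather than input order
            EnsureOrdered = false,
            // upper bound on concurrent posts; use 10 to match SemaphoreSlim(10, 10)
            MaxDegreeOfParallelism = 100,
            // valid here because only this one foreach loop posts to the block
            SingleProducerConstrained = true
         });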
   actionBlock.LinkTo(bufferblock);

   foreach (var content in contents)
      actionBlock.Post(content);

   actionBlock.Complete();
   await actionBlock.Completion;

   if (bufferblock.TryReceiveAll(out var result))
   {
      // do stuff with your results pairs here   
   }

}

Basically, this creates a BufferBlock and a TransformBlock. You pump your workload into the TransformBlock, which has a degree of parallelism set in its options and pushes its output into the BufferBlock. You then await completion and get your results.

Why Dataflow? Because it deals with async/await, it has MaxDegreeOfParallelism, it is designed for IO-bound or CPU-bound workloads, and it's extremely simple to use. Additionally, as most data is generally processed in many ways (in a pipeline), you can then use it to pipe and manipulate streams of data in sequence, in parallel, or in any way you choose down the line.
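
If the output order needs to match the input order, as the question asks, a variation with `EnsureOrdered = true` may be closer to what is wanted. The sketch below is a self-contained illustration under stated assumptions: `FakePostAsync` is a hypothetical stand-in for the real post method, `DataflowOrderDemo` is an illustrative name, and the parallelism of 10 is chosen to mirror the semaphore in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowOrderDemo
{
    // Hypothetical stand-in for the real post method.
    private static async Task<string> FakePostAsync(string content)
    {
        await Task.Delay(10);
        return "response:" + content;
    }

    public static async Task<IList<(string Content, string Response)>> PostAllAsync(
        IEnumerable<string> contents)
    {
        var transform = new TransformBlock<string, (string Content, string Response)>(
            async content => (content, await FakePostAsync(content)),
            new ExecutionDataflowBlockOptions
            {
                EnsureOrdered = true,        // emit outputs in input order
                MaxDegreeOfParallelism = 10  // at most 10 posts in flight
            });

        var buffer = new BufferBlock<(string Content, string Response)>();
        transform.LinkTo(buffer);

        foreach (var content in contents)
            transform.Post(content);

        // Signal that no more input is coming, then wait until every
        // output has been handed over to the buffer.
        transform.Complete();
        await transform.Completion;

        // TryReceiveAll returns false when the buffer is empty.
        return buffer.TryReceiveAll(out var items)
            ? items
            : new List<(string Content, string Response)>();
    }
}
```

Each returned pair still carries its content, so the pairing holds even if ordering is later switched off for throughput.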

Anyway good luck

TheGeneral
  • Thank you, but when I checked out TPL Dataflow, it did not contain EnsureOrdered by default. How can I add this? – H.Matthew Oct 21 '18 at 06:51
  • @H.Matthew you added the old Dataflow NuGet package; you need to add the new one, System.Threading.Tasks.Dataflow, and not Microsoft.Tpl.Dataflow. – TheGeneral Oct 21 '18 at 06:58