
I am using an HTTP trigger function in .NET Core where I receive the HTTP request data in JSON format. I need to insert these values into a Google Merchant Center account. There are almost 9000 rows (dynamic data each time) that need to be inserted. How can I implement Parallel.For logic so that it executes faster? Currently I am using a loop like the one below, but it is taking a lot of time. Below is the code.

string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
dynamic body = JsonConvert.DeserializeObject(requestBody);
for (int i = 0; i < body.Count; i++)
{
    Product newProduct = InsertProduct(merchantId, websiteUrl, body[i]);
}
}
shashank shekhar
  • Have you tried replacing the `for (int i = 0; i < body.Count; i++) { ... }` with `Parallel.For(0, body.Count, i => { ... });`? – Theodor Zoulias May 23 '21 at 06:52
  • Did you mean I should write it like this: `Parallel.For(0, body.Count, i => { Product newProduct = InsertProduct(merchantId, websiteUrl, body[i]); });`? Will it automatically be an asynchronous call, or do I have to define async? How and where would I define the max degree of parallelism? – shashank shekhar May 23 '21 at 07:11
  • Are you familiar with [asynchronous programming](https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/) to begin with? If not, you'd better forget about async/await and try to do the work synchronously. Async/await has some traps, and if you are not familiar with it it's more than likely that you'll fall into at least one of them. To configure the degree of parallelism you create a `ParallelOptions` object, you set its `MaxDegreeOfParallelism` property, and then pass the object to the `Parallel.For` method as an argument. – Theodor Zoulias May 23 '21 at 07:33
  • If you want to know what are the traps I'm talking about, you can look here: [Parallel.ForEach and async-await](https://stackoverflow.com/questions/23137393/parallel-foreach-and-async-await), or any of the [many other similar questions](https://stackoverflow.com/search?q=Parallel.ForEach+async). – Theodor Zoulias May 23 '21 at 07:52
  • `Parallel.XYZ` was designed for CPU-bound operations. I assume `InsertProduct` performs a database operation, which is I/O-bound. If you want to perform multiple I/O-bound operations concurrently, then you should consider using the async I/O of the related database driver and issuing the async operations with `Task.WhenAll`. You might also need to consider throttling so you do not flood the database if `body` contains a lot of products (a sketch of this approach follows these comments). – Peter Csala May 25 '21 at 08:30
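
Following the last comment, here is a minimal sketch of the throttled `Task.WhenAll` approach. It assumes the `merchantId`, `websiteUrl` and `body` variables from the question, plus a hypothetical `InsertProductAsync` method (an async counterpart of `InsertProduct`); the cap of 10 concurrent inserts is an arbitrary example value, not something from the original post.

var throttler = new SemaphoreSlim(10); // arbitrary cap on concurrent inserts
var tasks = new List<Task>();

for (int i = 0; i < (int)body.Count; i++)
{
    // Capture the current element so the closure below uses the right item.
    dynamic item = body[i];
    tasks.Add(Task.Run(async () =>
    {
        await throttler.WaitAsync();
        try
        {
            // InsertProductAsync is a hypothetical async counterpart of InsertProduct.
            Product newProduct = await InsertProductAsync(merchantId, websiteUrl, item);
        }
        finally
        {
            throttler.Release();
        }
    }));
}

await Task.WhenAll(tasks);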

2 Answers


I created a small example; maybe there you can find the approach that fits your case best.

dotnet fiddle Example

There are 3 options:

In sequence

As the title says, every item is processed in sequence. A very safe method, but not the fastest one for processing 9000 items :)

var list = GenerateItems();
var count = list.Count();
for (var i = 0; i < count; i++)
{
    InsertInDatabaseAsync($"{i}", list.ElementAt(i)).GetAwaiter().GetResult();
}

With Parallel.For

As said in the comments, it is good for CPU-bound processing but has some shortcomings with async methods (here)

var list = GenerateItems();
var count = list.Count();
var options = new ParallelOptions{MaxDegreeOfParallelism = MAX_DEGREE_OF_PARALLELISM};
Parallel.For(0, count, options, (i) => 
{
    InsertInDatabaseAsync($"{i}", list.ElementAt(i)).GetAwaiter().GetResult();
});

With Async-Await

I think this fits your case best. Every item is processed in parallel: the processing starts directly and spins up a Task. (Copied the async extension from here.)

var list = GenerateItems();

// Extension method: see the updated implementation below
ForEachAsync(list, async (item, index) => 
{
    await InsertInDatabaseAsync($"{index}", item);
}).GetAwaiter().GetResult();

...Updated

Thanks for the comments. I have updated the async-await implementation to a simpler one:

private static async Task ForEachAsync<T>(IEnumerable<T> enumerable, Func<T, int, Task> asyncFunc)
{
    var itemsCount = enumerable.Count();
    var tasks = new Task[itemsCount];
    int i = 0;
    foreach (var t in enumerable)
    {
        tasks[i] = asyncFunc(t, i);
        i++;
    }
    await Task.WhenAll(tasks);
}

I also added MAX_DEGREE_OF_PARALLELISM, set to 1. This has a huge impact on the parallel processing, as described in the comments.


Martin
    The `Parallel.For` measurement in your example shows the effects of a saturated `ThreadPool`. There are no `ThreadPool` threads available to execute the continuations of the `Task.Delay` tasks, so no progress can be made until new threads are algorithmically injected in the pool. Which happens every 500 msec or so. That's why you should [always](https://stackoverflow.com/questions/66261010/multiple-parallel-foreach-loops-in-net/66263583#66263583) specify the `MaxDegreeOfParallelism` when using the `Parallel.For` and `Parallel.ForEach` methods. – Theodor Zoulias May 25 '21 at 12:05
    Also the `ForEachAsync` implementation in your example is buggy. You should initialize the `_database` before each measurement, and validate it after the end of the measurement, to ensure that all elements have been processed. – Theodor Zoulias May 25 '21 at 12:08
    @TheodorZoulias Thanks for your comments I have updated the answer and of course the buggy async-await implementation (This was just a copy from another SO-question). – Martin May 25 '21 at 23:18

Do this.

string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
dynamic body = JsonConvert.DeserializeObject(requestBody);
Parallel.For(0, (int)body.Count, i => {
    Product newProduct = InsertProduct(merchantId, websiteUrl, body[i]);
});
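
As the comments on the question point out, `Parallel.For` targets CPU-bound work and you should cap its degree of parallelism. A minimal sketch of the same call with a `ParallelOptions` argument; the limit of 10 is an assumed example value, not something from the original post:

string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
dynamic body = JsonConvert.DeserializeObject(requestBody);

// Cap the number of concurrent iterations; 10 is an arbitrary example value.
var options = new ParallelOptions { MaxDegreeOfParallelism = 10 };

// body.Count is cast to int so the call binds statically despite the dynamic variable.
Parallel.For(0, (int)body.Count, options, i =>
{
    Product newProduct = InsertProduct(merchantId, websiteUrl, body[i]);
});

Keep in mind that `InsertProduct` is a synchronous call, so each worker thread blocks while its request is in flight; the throttled `Task.WhenAll` sketch under the question's comments avoids that by using async I/O.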