
I'm writing an application which interacts with Azure Cosmos DB. I need to commit 30,000 records to Cosmos DB in a session. Because I'm using .NET Core, I can't use the BulkInsert DLL, so I insert into Cosmos DB with a foreach loop. But this issues too many requests per second and exceeds the RU limit of Cosmos DB.

foreach (var item in listNeedInsert)
{
    await RequestInsertToCosmosDB(item);
}

I want to pause the foreach loop when the number of requests reaches 100. After those 100 requests are done, the foreach should continue.

TuanDPH
  • please see https://stackoverflow.com/questions/23921210/grouping-lists-into-groups-of-x-items-per-group – Yehor Androsov May 15 '19 at 14:02
  • `await` awaits already executing tasks. It doesn't start them or control their execution. – Panagiotis Kanavos May 15 '19 at 14:02
  • If you want to limit the number of inserts to N per second, add a delay in the loop that ensures no more than N calls can be made per second with `await Task.Delay(...);`. – Panagiotis Kanavos May 15 '19 at 14:03
  • Can you know programmatically when the inserts are done? If so, what you should do is keep a counter, and when it reaches 100 requests, await the inserts to finish and then continue. – nalnpir May 15 '19 at 14:11
  • Can you define a constraint - either I must only start *n* requests per second, and/or only *n* requests may be executing at a given time? Whatever the constraint is that's causing you to do this, I'd code this to fit that constraint specifically. If you aim for an arbitrary number like 100, what happens if they still execute too fast? Will it be predictable and consistent? If the need is to limit requests per second, do that specifically. Don't do something else and hope that it happens to work. – Scott Hannen May 15 '19 at 14:28
  • Another way of phrasing this - you're asking two different questions. The title says that you want to limit requests per second, but then you're saying that you want to pause after every 100. That will mean *fewer* requests per second, but that's not the same as *limiting* requests per second. Perhaps what you want is this: https://stackoverflow.com/questions/10806951/how-to-limit-the-amount-of-concurrent-async-i-o-operations – Scott Hannen May 15 '19 at 14:34
  • The .NET SDK for ComsosDB will automatically back off and retry when it uses too much RU. Is there some reason this behavior isn't working for you? – Stephen Cleary May 15 '19 at 14:39
  • On top of what Stephen said, manual throttling will only work if you have only one instance of your app running, in which case I don't know if Cosmos DB is the tech for you, since its selling point is its scalability and global distribution. – Nick Chapsas May 15 '19 at 15:19
  • I found solution for my problem. Thanks all of you. – TuanDPH May 16 '19 at 03:15

4 Answers

4

You can partition the list and await the results:

var tasks = new List<Task>();

foreach (var item in listNeedInsert)
{
    var task = RequestInsertToCosmosDB(item);
    tasks.Add(task);

    if(tasks.Count == 100)
    {
        await Task.WhenAll(tasks);
        tasks.Clear();
    }
}

// Wait for anything left to finish
await Task.WhenAll(tasks);

Every time you've got 100 tasks in flight, the code waits for them all to finish before starting the next batch.

Sean
  • 100 concurrent requests are way too many for the database to handle efficiently. I'd try something in the single digit range. Try experimenting with different numbers to see what works best. If you want even more efficiency, use SemaphoreSlim and its built-in throttling mechanism. – Slothario May 22 '19 at 15:02
0

You could add a delay on every hundredth iteration:

int i = 1;
foreach (var item in listNeedInsert)
{
    await RequestInsertToCosmosDB(item);
    if (i % 100 == 0)
    {
        i = 0;
        await Task.Delay(100); // milliseconds
    }
    i++;
}
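A variant of this idea, along the lines of the `Task.Delay` suggestion in the comments, limits throughput to a fixed number of requests per second rather than pausing a fixed amount of time. This is only a minimal sketch: `RequestInsertToCosmosDB` is stubbed out here, and `maxPerSecond = 100` is an illustrative value, not a recommendation.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

class RateLimited
{
    // Stub standing in for the real Cosmos DB insert call.
    static Task RequestInsertToCosmosDB(int item) => Task.CompletedTask;

    static async Task Main()
    {
        var listNeedInsert = Enumerable.Range(0, 250).ToList();
        const int maxPerSecond = 100; // illustrative limit

        var sent = 0;
        var window = Stopwatch.StartNew();
        foreach (var item in listNeedInsert)
        {
            await RequestInsertToCosmosDB(item);

            if (++sent == maxPerSecond)
            {
                // If the batch finished in under a second, wait out the remainder
                // so we never exceed maxPerSecond requests per wall-clock second.
                var remaining = TimeSpan.FromSeconds(1) - window.Elapsed;
                if (remaining > TimeSpan.Zero)
                    await Task.Delay(remaining);
                sent = 0;
                window.Restart();
            }
        }
        Console.WriteLine("done");
    }
}
```

Unlike a fixed `Task.Delay(100)` every hundred items, this measures how long each batch actually took, so fast batches are slowed down and slow batches aren't penalized further.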
0

If you really want to maximize efficiency and can't do bulk updates, look into using SemaphoreSlim as in this post:

Throttling asynchronous tasks

Hammering a medium-sized database with 100 concurrent requests at a time isn't a great idea because it's not equipped to handle that kind of throughput. You could try playing with a different throttling number and seeing what's optimal, but I would guess it's in the single digit range.

If you want to do something quick and dirty, you could probably use Sean's solution. But I'd set the Task count to 5 starting out, not 100.
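A minimal sketch of that SemaphoreSlim approach, with the Cosmos DB insert stubbed out and the concurrency limit set to 5 as suggested above (both are illustrative stand-ins, not the linked post's exact code):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class Throttled
{
    // Stub standing in for the real Cosmos DB insert call.
    static Task RequestInsertToCosmosDB(int item) => Task.Delay(10);

    static async Task Main()
    {
        var listNeedInsert = Enumerable.Range(0, 50).ToList();

        // Allow at most 5 inserts in flight at any moment.
        using var throttle = new SemaphoreSlim(5);

        var tasks = listNeedInsert.Select(async item =>
        {
            await throttle.WaitAsync();
            try
            {
                await RequestInsertToCosmosDB(item);
            }
            finally
            {
                throttle.Release();
            }
        });

        await Task.WhenAll(tasks);
        Console.WriteLine("done");
    }
}
```

Compared with batching in groups of 100, this keeps exactly 5 requests running at all times: as soon as one insert finishes, the next one starts, so a single slow request doesn't stall the whole batch.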

Slothario
0

I've written a library to help with this sort of logic:

https://github.com/thomhurst/EnumerableAsyncProcessor

Usage would be:

await AsyncProcessorBuilder.WithItems(listNeedInsert) // Or Extension Method: listNeedInsert.ToAsyncProcessorBuilder()
        .ForEachAsync(item => RequestInsertToCosmosDB(item), CancellationToken.None)
        .ProcessInBatches(batchSize: 100);
TomH