The problem is that it currently runs fine with a limited number of reports (around 10,000), but on environments with many more reports it fires a large number of simultaneous requests at the server, causing heavy load:

private void PerformSearch(SearchReportModel item)
{
    var tasks = new List<Task>();

    foreach (var term in item.SearchTerms)
    {
        var model = GetBaseQueryModel(item.Site);
        tasks.Add(Task.Factory.StartNew(() => CheckSearchTerm(model, term, item.Site, item.Language)));
    }
    try
    {
        Task.WaitAll(tasks.ToArray());
    }            
    catch (Exception ex)
    {
        Log4NetLogger.LogError(ex, ex.ToString());
        throw;
    }
}
  • What does your profiler say? – Dai May 18 '22 at 11:27
  • What does `GetBaseQueryModel` and `CheckSearchTerm` do? – Dai May 18 '22 at 11:27
  • Why aren't you using `async`/`await` with `Task.WhenAll`? – Dai May 18 '22 at 11:27
  • a.) Simply check something you consider an optimization b.) If you want throttling the mentioned Scheduler won't help. It may have helped if you want to lock the search when someone else is updating the data to be search and you want no search to occur then. So when you want to use the scheduler as some kind of reader/writer lock. But with just concurrent readers we get to see here it does not help. – Ralf May 18 '22 at 11:31
  • @Dai the problem is that the methods can't be "partitioned" in terms of making them async so they can be used concurrently. Or I might be mistaken... The profiler on my side works fine, but there is a little issue that we have double verifying requests for each item, and therefore heavy load happens on prod – red guardgen May 18 '22 at 11:33
  • @Ralf I see - tried some samples but all of them failed due to thread-safety problems. What would you suggest in terms of artificially limiting the tasks being generated? – red guardgen May 18 '22 at 11:35
  • For just limiting the concurrent requests, using a Semaphore as in the answer below is already a solution. There is a plethora of locking mechanisms in .NET. But limiting means that presumably some requests can't be handled, just like now. How to tell a caller "you need to try later" as early as possible, before running into timeouts and spending server resources on it, may be harder. – Ralf May 18 '22 at 11:45

1 Answer

Assuming that CheckSearchTerm performs the API calls in question, I would suggest using SemaphoreSlim to throttle the load:

Represents a lightweight alternative to Semaphore that limits the number of threads that can access a resource or pool of resources concurrently.

var tasks = new List<Task>();
var maximumRequests = 100; // maximum simultaneous invocations 
var limiter = new SemaphoreSlim(maximumRequests); // possibly make a global one via static variable or move inside of CheckSearchTerm
foreach (var term in item.SearchTerms)
{
    var model = GetBaseQueryModel(item.Site);
    tasks.Add(Task.Run(async () =>
    {
        await limiter.WaitAsync();
        try
        {
            CheckSearchTerm(model, term, item.Site, item.Language);
        }
        finally
        {
            limiter.Release();
        }
    }));
}

Another option is using Parallel.ForEach/PLINQ with ParallelOptions.MaxDegreeOfParallelism/WithDegreeOfParallelism set to required value.
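For reference, a minimal sketch of the `Parallel.ForEach` approach, assuming the same `SearchReportModel`, `GetBaseQueryModel`, and `CheckSearchTerm` signatures as in the question:

```csharp
using System.Threading.Tasks;

private void PerformSearch(SearchReportModel item)
{
    var options = new ParallelOptions
    {
        // Don't rely on the default (-1, i.e. unbounded);
        // pick an explicit cap suited to your server.
        MaxDegreeOfParallelism = 10
    };

    // Blocks until all terms are processed; exceptions are
    // aggregated into a single AggregateException.
    Parallel.ForEach(item.SearchTerms, options, term =>
    {
        var model = GetBaseQueryModel(item.Site);
        CheckSearchTerm(model, term, item.Site, item.Language);
    });
}
```

Note that `Parallel.ForEach` is designed for CPU-bound, synchronous work; if `CheckSearchTerm` is (or becomes) truly asynchronous I/O, the SemaphoreSlim version above is the better fit.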

Guru Stron
  • I've tried this one - is there any way to make the Semaphore applicable not only to asynchronous methods? Because currently it works just too fast, but all the records are corrupt – red guardgen May 19 '22 at 05:18
  • @redguardgen sorry, forgot that `Task.Factory.StartNew` is not async-aware. Since you are not passing any options, use `Task.Run` instead (see [this](https://stackoverflow.com/a/50912971/2501279) also). Please see the updated code. – Guru Stron May 19 '22 at 12:44
  • The `Parallel.ForEach` is preferable IMHO. It has the advantage that in case of an error the parallel operation will complete promptly. On the contrary, the `Task.WaitAll(tasks.ToArray())` will wait invariably for all the tasks to complete. Just make sure to configure the `Parallel.ForEach` with an appropriate `MaxDegreeOfParallelism`, and don't rely on the default configuration (which is -1, meaning unbounded). – Theodor Zoulias May 19 '22 at 13:03