I have a cron job that queries a database table and gets about half a million records back. I need to loop through all of that data and send a POST request to a third-party API for each record. This works, but the processing takes far too long (about 10 hours), and I need a way to speed it up. I've been trying to use a list of `Task` with `SemaphoreSlim`, but I'm running into issues (the compiler doesn't like that my API call returns a `Task`). Does anyone have a solution that won't destroy the VM's memory?

Current code looks something like:

foreach(var data in dataList)
{
  try 
  {
    var response = await _apiService.PostData(data);
    _logger.Trace(response.Message);
  }
  catch
  {
    // error handling omitted
  }
}

But I'm trying to do this and getting the syntax wrong:

var tasks = new List<Task<DataObj>>();
var throttler = new SemaphoreSlim(10);
foreach(var data in dataList)
{
  await throttler.WaitAsync();
  tasks.Add(Task.Run(async () => {
    try
    {
      var response = await _apiService.PostData(data);
      _logger.Trace(response.Message);
    }
    finally
    {
      throttler.Release();
    }
  }));
}
  • _"I have a cron job which calls a database table and gets about half a million records returned"_ - in a single call or paged? –  Feb 10 '22 at 01:51
  • It's a single call to a stored procedure that I don't control. – SlappingTheBass12 Feb 10 '22 at 01:52
  • You don't need `Task.Run` for I/O-bound work: the call is already asynchronous, so wrapping it only adds an extra thread-pool hop. Just use the `Task` you are already getting back from `_apiService.PostData`. –  Feb 10 '22 at 01:53
  • _"It's a single call to a stored procedure that I don't control"_ - OK. Historically, have there been any issues making this large call, apart from the fact that it takes a long time to complete? –  Feb 10 '22 at 01:54
  • Can you post example syntax? – SlappingTheBass12 Feb 10 '22 at 01:54
  • The data call isn't the issue, it's sending all that data to the third party in the loop which takes hours. – SlappingTheBass12 Feb 10 '22 at 01:55
  • Perhaps something like `tasks.Add(_apiService.PostData(data));` inside the loop, followed by `await Task.WhenAll(tasks);`. Iterate through your `tasks` to get the results. Alternatively, take a look at _TPL Dataflow_. –  Feb 10 '22 at 01:59
  • _"The data call isn't the issue"_ - you didn't answer my question, and I very much doubt that any **network connection** is going to like a call that takes **10 hours**, let alone a database that has to keep a client connection open that long. –  Feb 10 '22 at 02:01
  • `tasks.Add(_apiService.PostData(data));` gives an error because the return type of the `_apiService` call doesn't match the `Task` type of the list. – SlappingTheBass12 Feb 10 '22 at 02:03
  • _"OK. Historically, have there been any issues making this large call, apart from the fact that it takes a long time to complete?"_ - The stored procedure is new and was only just set up, so we don't have much history. The cron job runs in a service on a VM: it calls the stored proc, loops through the data, and sends each record to the 3rd party API. I've run this for 3 days in a row and haven't encountered any errors or problems, except that the loop takes forever to make all those API calls. – SlappingTheBass12 Feb 10 '22 at 02:05
  • Thank you for the update. This helps us help you. Sometimes a poster asks about one thing, but through discussion we find something else that may be a) the _real_ problem or b) an _additional_ problem. Not saying that's necessarily the case here. :) –  Feb 10 '22 at 02:42
  • If you are targeting .NET 6 or later, you could look at the [`Parallel.ForEachAsync`](https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel.foreachasync) method. If you are targeting an older platform, there are many custom alternatives available (for example [this](https://stackoverflow.com/questions/11564506/nesting-await-in-parallel-foreach/65251949#65251949)). – Theodor Zoulias Feb 10 '22 at 06:17
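
The `Parallel.ForEachAsync` suggestion in the last comment could look roughly like the sketch below. It assumes .NET 6 or later with top-level statements. `PostDataAsync` is a hypothetical stub standing in for `_apiService.PostData`, and the integer list and the limit of 10 are placeholders for the real records and a limit the third-party API can tolerate.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the records returned by the stored procedure.
var dataList = Enumerable.Range(1, 100).ToList();

// At most 10 bodies run at once (assumed limit; tune it for the real API).
var options = new ParallelOptions { MaxDegreeOfParallelism = 10 };

// Thread-safe collection, because bodies may complete on different threads.
var responses = new ConcurrentBag<string>();

await Parallel.ForEachAsync(dataList, options, async (data, cancellationToken) =>
{
    var response = await PostDataAsync(data, cancellationToken);
    responses.Add(response);
});

Console.WriteLine($"Posted {responses.Count} records.");

// Hypothetical stub: simulates an I/O-bound API call with a short delay.
static async Task<string> PostDataAsync(int data, CancellationToken cancellationToken)
{
    await Task.Delay(10, cancellationToken);
    return $"OK {data}";
}
```

`Parallel.ForEachAsync` pulls items from the source as workers become free, so it does not create one pending task per record up front. An exception thrown by the body stops the loop, so a per-item `try`/`catch` inside the lambda is needed to keep going after a failed call, as the original `foreach` does.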

1 Answer

Your list is a `List<Task<DataObj>>`, but your async lambda doesn't return anything, so `Task.Run` gives you a plain `Task` rather than a `Task<DataObj>`. To fix the syntax, just return the value:

var response = await _apiService.PostData(data);
_logger.Trace(response.Message);
return response;

As others have noted in the comments, I also recommend not using Task.Run here. A local async method would work fine:

var tasks = new List<Task<DataObj>>();
var throttler = new SemaphoreSlim(10);
foreach(var data in dataList)
{
  tasks.Add(ThrottledPostData(data));
}
var results = await Task.WhenAll(tasks);

async Task<DataObj> ThrottledPostData(Data data)
{
  await throttler.WaitAsync();
  try
  {
    var response = await _apiService.PostData(data);
    _logger.Trace(response.Message);
    return response;
  }
  finally
  {
    throttler.Release();
  }
}
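
To try the pattern above in isolation, here is a minimal self-contained sketch, assuming .NET 6 or later with top-level statements. `FakePostData` is a hypothetical stub standing in for `_apiService.PostData`; the `int` and `string` types replace `Data` and `DataObj`, and the in-flight counters exist only to show that the semaphore caps concurrency.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the real records.
var dataList = Enumerable.Range(1, 50).ToList();
var throttler = new SemaphoreSlim(10);

// Bookkeeping used only to observe how many calls overlap.
var gate = new object();
int inFlight = 0, maxInFlight = 0;

var tasks = new List<Task<string>>();
foreach (var data in dataList)
{
    tasks.Add(ThrottledPostData(data));
}
string[] results = await Task.WhenAll(tasks);

Console.WriteLine($"{results.Length} responses, peak concurrency {maxInFlight}");

async Task<string> ThrottledPostData(int data)
{
    // Wait for one of the 10 slots before starting the call.
    await throttler.WaitAsync();
    try
    {
        return await FakePostData(data);
    }
    finally
    {
        // Always free the slot, even if the call throws.
        throttler.Release();
    }
}

// Hypothetical stub: simulates an I/O-bound API call and tracks overlap.
async Task<string> FakePostData(int data)
{
    lock (gate)
    {
        inFlight++;
        maxInFlight = Math.Max(maxInFlight, inFlight);
    }
    await Task.Delay(20);
    lock (gate)
    {
        inFlight--;
    }
    return $"OK {data}";
}
```

Note that this pattern creates one task per record immediately, and most of them sit waiting on the semaphore. With half a million records that means half a million pending tasks in memory at once, which may or may not matter on a given VM.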
Stephen Cleary
  • I have a similar scenario, but can I control the returned results in this structure? According to the returned results, I need to update the database. – Cenk Apr 08 '23 at 13:47
  • @Cenk: it's not really clear what your scenario is. Could you post your own question? – Stephen Cleary Apr 08 '23 at 14:31
  • https://stackoverflow.com/questions/75445185/how-to-handle-parallel-foreachasync-completed-tasks-if-there-is-an-error – Cenk Apr 08 '23 at 14:40