I am sending a list of about 100,000 JSON objects to an API that can only accept them one at a time, and I am sending them asynchronously. I know that internally the API puts each received object on a queue, which seems to be choking on all of these requests; after quite a few of them I start getting "Gateway Timeout" errors.

I tried breaking the list into batches of different sizes and putting the thread to sleep after each batch is sent. What ends up happening is that it fails with the same error after roughly one batch's worth of requests. I've tried batches of 3000, 2500 and 1000 with the same result, and the thread never seems to go to sleep.

Here's the code in question:

public async Task TransferData(IEnumerable<MyData> data)  
{  
     var pages = Math.Ceil(data.Count() / 3000m);  

     for (var page = 0; page < pages; page++)  
     {  
         await TransferPage(data.Skip(page * 3000).Take(3000));
         Thread.Sleep(10000);  
     }  
}

private async Task TransferPage(IEnumerable<MyData> data)  
{  
     await Task.WhenAll(data.Select(p => webConnection.PostDataAsync(JsonConvert.SerializeObject(p, Formatting.None))));  
}

Note: webConnection is just a class that has a HttpClient already instantiated and does a PostAsync for the data to the intended URL.

The call to TransferData is done in a Console Application like so:

try  
{  
   ...    
   dataManager.TransferData(data).Wait();
}
catch(AggregateException ex)
{
   ...
}
catch(Exception ex)
{
   ...
}

Thank you for any guidance.

UPDATE: To clarify some of the confusion that arose in the comments: the external API receives the objects one by one. If you look at the private method TransferPage, the IEnumerable passed to WhenAll comes from a Select that calls the method which internally does the actual HttpClient PostAsync, once per object. So the objects ARE grouped in batches, and within each batch every object is sent in its own request. I hope this makes it a little clearer.

Sergio Romero
  • If it can only handle them in sequence, why are you sending them in parallel? – SLaks Apr 17 '19 at 15:36
  • you need to await `Task.Delay`. Do not mix Thread.Sleep with async-await. You also do not mix async-await with blocking calls like `.Wait()` as that can lead to deadlocks – Nkosi Apr 17 '19 at 15:38
  • As @slaks said code and text in the post do not align - there is significant difference between “one by one” and batches of 3000 as code shows. – Alexei Levenkov Apr 17 '19 at 15:45
  • So you must use await Task.Delay(); also, you have a missing ')' before Thread.Sleep. But the problem is that you try to send a lot of data simultaneously, not one by one. – Azzy Elvul Apr 17 '19 at 15:46
  • @nkosi OP said it is console app - sleep will have the same behavior as delay, as well as Wait will not cause deadlock (asynchronous main is cleaner, but not much different) – Alexei Levenkov Apr 17 '19 at 15:48
  • @AlexeiLevenkov noted. missed that part – Nkosi Apr 17 '19 at 15:49
  • The case for `SemaphoreSlim`. See [link](https://stackoverflow.com/questions/22492383/throttling-asynchronous-tasks) . – Serg Apr 17 '19 at 15:52
  • Even though your volume of API calls is causing the failure, in a sense the problem is still on the other end and out of your control. What happens if you finally find the right level of throttling, it works, and then someone else overwhelms the API with requests while you're making yours? Yours could still fail, or you could cause theirs to fail. It seems like there's a mismatch between how you're using the API and its intended use. If you contacted the provider of the API would they help you or tell you to stop making so many calls? – Scott Hannen Apr 17 '19 at 19:30

1 Answer

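What's likely happening is that one or more of the PostDataAsync tasks is throwing the timeout error, resulting in a failed task. Task.WhenAll does not complete until every task in the list has completed, and only then surfaces the failures (as an AggregateException when you block with .Wait()), which is why you only see an exception at the end of a batch. It would also explain why the thread never seems to sleep: the exception leaves the loop before Thread.Sleep is reached.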
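
To make the Task.WhenAll behavior concrete, here is a minimal sketch. FakePostAsync is a hypothetical stand-in for PostDataAsync (it is not part of your code) that fails for every third item. The combined task finishes only after all six calls have finished, and both failures are then available on its Exception property.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Hypothetical stand-in for PostDataAsync: every third call fails.
    private static async Task FakePostAsync(int id)
    {
        await Task.Delay(10 * id);
        if (id % 3 == 0)
            throw new TimeoutException($"Gateway Timeout for item {id}");
    }

    // Returns how many of the six calls failed.
    public static async Task<int> RunAsync()
    {
        Task all = Task.WhenAll(Enumerable.Range(1, 6).Select(id => FakePostAsync(id)));
        try
        {
            // Nothing is thrown here until all six tasks have completed.
            await all;
        }
        catch (TimeoutException)
        {
            // 'await' rethrows only the first failure;
            // the complete set is in all.Exception.InnerExceptions.
        }
        return all.Exception?.InnerExceptions.Count ?? 0;
    }
}
```

Blocking with .Wait() instead of awaiting throws the AggregateException itself, which matches the first catch block in the question.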

You are likely overwhelming the service, despite your attempt to throttle. You should probably do a couple of things:

  • Improve the exception handling and retry situation. You could do this inside PostDataAsync and/or outside of it. Even if you aren't overwhelming the service, you are going to need to handle transient exceptions anyway to deal with network hiccups and the like.
  • Replace your batching logic with a proper throttling implementation. The answers to the question that Serg linked in the comments are a good start - SemaphoreSlim or TPL Dataflow are common solutions.
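
The two points above can be combined in one helper. What follows is a rough sketch rather than a drop-in implementation: ThrottledSender, SendAllAsync, and the default values for maxConcurrency and maxAttempts are names and numbers made up for illustration, and the retry deliberately catches every exception, whereas real code would probably retry only transient failures such as the 504.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledSender
{
    // Runs 'send' for every item with at most 'maxConcurrency' calls in flight.
    // A failed call is retried with a growing delay, up to 'maxAttempts' tries in total.
    public static async Task SendAllAsync<T>(
        IEnumerable<T> items,
        Func<T, Task> send,
        int maxConcurrency = 10,
        int maxAttempts = 3)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                // Wait for a free slot without blocking a thread.
                await gate.WaitAsync();
                try
                {
                    for (var attempt = 1; ; attempt++)
                    {
                        try
                        {
                            await send(item);
                            return;
                        }
                        catch (Exception) when (attempt < maxAttempts)
                        {
                            // Assumption: any failure is worth retrying. Back off first.
                            await Task.Delay(TimeSpan.FromMilliseconds(200 * attempt));
                        }
                    }
                }
                finally
                {
                    // Free the slot whether the call succeeded or finally failed.
                    gate.Release();
                }
            }).ToList();

            // Items that still failed after the last attempt surface here.
            await Task.WhenAll(tasks);
        }
    }
}
```

With this in place, TransferData could shrink to a single call such as `await ThrottledSender.SendAllAsync(data, p => webConnection.PostDataAsync(JsonConvert.SerializeObject(p, Formatting.None)));`, with maxConcurrency tuned to whatever the API's queue can actually absorb.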
nlawalker