-2

I have to process gigabytes of files and send them over the network using a library that exposes async methods for this kind of operation.

If I do the following, I think I will end up with an unbounded number of threads working at the same time if the calls take a while to complete:

void ProcessFiles()
{
    string[] files = /* a lot of files */;

    foreach (string file in files)
    {
        // Fire and forget: the loop does not wait for each call to finish.
        MyAsyncMethod(file);
    }
}

async void MyAsyncMethod(string file)
{
    byte[] content = File.ReadAllBytes(file);
    await MyLibrary.Async(content);
}

Of course there must be a limit on the number of threads running concurrently, but many threads will still be created and performance will get worse. What happens when that limit is reached? Will new threads be created after the current ones finish? Will the calls be ignored? Will the program throw an exception?

How can I manage this "infinite" number of async calls? Should I convert them to sync calls (Task.Start() + Task.Wait()) and manage them with my own thread pool?

Thank you

dhalfageme
  • You really should have run the code first to actually see what it does... and then you should go read a few tutorials on what `await` does, as you are misunderstanding what it does. – Servy Oct 24 '16 at 15:47
  • The whole point of `Task`s and `async`/`await` is for you to **not** have to do thread management. You shouldn't need to worry. – EvilTak Oct 24 '16 at 15:50
  • You seem to have some misunderstanding of how async/await works. That should only handle one file at a time, there's no parallel reading of files as `await` causes the current loop to wait (without blocking) until `MyLibrary.Async` has finished before moving on to the next file. Similar to the `yield return` statement when writing an enumerable method. – Chris Chilvers Oct 24 '16 at 15:51
  • Your code, as it stands, is operating on the files sequentially. As soon as the "await" is hit, the loop stops and control leaves this method until MyLibrary.Async () returns. At that point, one more loop iteration happens, etc. – PMV Oct 24 '16 at 15:54
  • I've edited my code; the updated code shows what I'm trying to do, but I'm worried about the number of threads that could be created, since I need to send thousands of different files over several days – dhalfageme Oct 24 '16 at 15:56
  • I ran a similar sample program with an asynchronous method. I see all the threads (some dozens) reach the first line of my async method, and after some time all of them reach the end of the method, but I can't see all of them in the Threads window in Visual Studio, so I'm missing something – dhalfageme Oct 24 '16 at 16:04
  • `yourarray.ToList().AsParallel().ForAll(file => { MyAsyncMethod(file) });` or simpler syntax `yourarray.ToList().AsParallel().ForAll(MyAsyncMethod);` though I'm not sure if this is efficient that's why I posted this as comment. – M.kazem Akhgary Oct 24 '16 at 16:12
  • I simplified the code to post the question. In fact, I don't have an array at the start of the loop; I have to read and process the files before sending them over the network. This processing should take less time than the async call, but I'm not sure whether I should put my processing code inside the async method – dhalfageme Oct 24 '16 at 16:16
  • See also e.g. https://stackoverflow.com/questions/32047064/how-to-throttle-multiple-asynchronous-tasks, https://stackoverflow.com/questions/34315589/queue-of-async-tasks-with-throttling-which-supports-muti-threading, https://stackoverflow.com/questions/17621026/semaphore-thread-throttling-with-async-await, or https://stackoverflow.com/questions/35023685/throttle-async-tasks, to name a few of the many other duplicate or similar questions. – Peter Duniho Oct 24 '16 at 16:52

3 Answers

0

You don't have to worry about threads; .NET will manage them for you.

By the way, the code you have provided is asynchronous, but runs sequentially:

foreach (string file in files)
{
    byte[] content = File.ReadAllBytes(file);
    await MyLibrary.AsyncCall(content); // The next iteration does not start until this call completes (no thread is blocked).
}

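If you want the calls to run concurrently, consider using Task.WhenAll or Task.WaitAll. Task.WhenAll returns a task you can await; Task.WaitAll blocks the calling thread until every task has finished.

In your specific case, it seems that MyLibrary.AsyncCall does not have a result, so you can use Task.WaitAll: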

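// ToArray() forces the query: each file is read and its call started without awaiting earlier calls.
Task[] tasks = files.Select(File.ReadAllBytes).Select(c => MyLibrary.AsyncCall(c)).ToArray();
Task.WaitAll(tasks); // Blocks until every call has completed; the calls run concurrently.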

As for the question "does Task.WaitAll create a new thread for each Task?", the answer is no. (See: Does the use of async/await create a new thread?)
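If the real concern is how many requests are in flight at once rather than how many threads exist, one common approach is to gate the calls with a SemaphoreSlim. The following is only a minimal sketch under assumptions: `ThrottledSender`, `ProcessFilesAsync`, `sendAsync` and `maxConcurrency` are hypothetical names, and `sendAsync` stands in for the library call (`MyLibrary.AsyncCall` above), whose real signature is not known here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledSender
{
    // sendAsync is a hypothetical stand-in for the library's async call.
    public static async Task ProcessFilesAsync(
        IEnumerable<string> files, Func<byte[], Task> sendAsync, int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = new List<Task>();
            foreach (string file in files)
            {
                // Waits (without blocking a thread) while maxConcurrency sends are in flight.
                await gate.WaitAsync();
                tasks.Add(SendOneAsync(file, sendAsync, gate));
            }
            // Wait for the sends that are still running.
            await Task.WhenAll(tasks);
        }
    }

    private static async Task SendOneAsync(
        string file, Func<byte[], Task> sendAsync, SemaphoreSlim gate)
    {
        try
        {
            byte[] content = File.ReadAllBytes(file); // synchronous read, as in the original code
            await sendAsync(content);
        }
        finally
        {
            gate.Release(); // free a slot so the loop can start the next file
        }
    }
}
```

A call such as `await ThrottledSender.ProcessFilesAsync(files, c => MyLibrary.AsyncCall(c), 10);` would then keep at most 10 sends in flight, starting a new one each time an earlier one finishes.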

Matias Cicero
  • Why the downvote? – Matias Cicero Oct 24 '16 at 15:54
  • You've just written code to cause all of the problems that the OP is trying to avoid, when they weren't there before. – Servy Oct 24 '16 at 15:54
  • @Servy Please be more specific. What is the OP trying to avoid? – Matias Cicero Oct 24 '16 at 15:56
  • Feel free to read the question to find out, although in the future I'd suggest doing so *before* you post an answer, rather than *after*. – Servy Oct 24 '16 at 15:57
  • @Servy I have read the question. As far as I understand, the OP thinks he is sending the files concurrently but is concerned about the number of parallel threads. – Matias Cicero Oct 24 '16 at 15:59
  • Yes, he's wondering how to avoid having all of the requests happen concurrently, as that would likely cause lots of problems. You've made the requests all happen concurrently, which is going to cause lots of problems. He thought that he had a problem that he didn't have, you've re-written the code to have the problems he was concerned about. – Servy Oct 24 '16 at 16:01
  • @Servy That's not what I understand. He doesn't want to avoid sending all the requests in parallel. He wants to avoid creating too many threads. I have posted the use of `Task.WaitAll`, which has proper thread management he does not need to worry about. – Matias Cicero Oct 24 '16 at 16:03
  • Hi Matias, if I run your code with hundreds or thousands of files at the same time, won't performance get worse? I want something like a thread pool: say 10 tasks running at the same time, and when one of those 10 tasks ends another one starts – dhalfageme Oct 24 '16 at 16:07
  • @Doctor Take a look at the related question at the end of my answer. Asynchronous code does not mean multithreading. `Task.WaitAll` will not always create a new thread. If it does, .NET will manage the threads' lifetime. It will reuse them and try to use as few as possible. This should not be something you need to worry about. – Matias Cicero Oct 24 '16 at 16:10
  • Thanks Matias, in that case I could use "batches of files" (since I can't submit ALL my files and wait for all of them; there are thousands of files that I need to pre-process before sending them over the network) or something similar... – dhalfageme Oct 24 '16 at 16:21
  • @Doctor You can also add *on-demand handlers* that will process each task as it finishes (in case you don't want to wait for all to finish to start processing them). See [my question](http://stackoverflow.com/questions/39624386/is-there-a-callback-for-when-a-task-is-completed-in-task-whenall) about it. – Matias Cicero Oct 24 '16 at 16:24
  • In that case I would have two kinds of async methods: the first one pre-processing the file contents and the second one sending the computed data using the library, right? Or do you mean processing the next batch or file when I'm notified that one of these async calls to the library has ended? – dhalfageme Oct 24 '16 at 16:33
  • @Doctor With LINQ and `Select` you can add as many processing filters as you want. For instance, `.Select(c => PreProcess(c)).Select(async r => AnotherPreProcess(await r)).Select(async r => SendAsync(await r)).Select(async r => PostProcessing(await r))`, will execute two pre-processing functions before sending the web request, and then it will execute a post processing method. This will all be done on demand, as you get each response – Matias Cicero Oct 24 '16 at 16:36
  • I will take a look at this and I will let you know, thank you! – dhalfageme Oct 24 '16 at 16:38
0

The above code will use at most one thread more than the one you're already using. The "await" asks the system to schedule the task, then execution returns to the loop once the task has completed. Whether the task actually runs on the same thread or another one is not something you can count on.

If you want to perform the calls in parallel, thus leveraging many threads (many CPU cores), you should use Parallel.For. Bear in mind that you shouldn't mix threads and async code without knowing exactly how to do it.
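As a rough illustration of Parallel.For with a cap on the number of worker threads, the sketch below runs a CPU-bound step over in-memory buffers. The class `ParallelPreprocess`, the method `SumAll` and the byte-summing work are hypothetical placeholders for whatever pre-processing the files actually need; no async calls are involved.

```csharp
using System.Threading.Tasks;

public static class ParallelPreprocess
{
    // Hypothetical CPU-bound step: sums the bytes of each buffer.
    public static long[] SumAll(byte[][] buffers, int maxThreads)
    {
        var results = new long[buffers.Length];
        // MaxDegreeOfParallelism caps how many iterations run at the same time.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };

        Parallel.For(0, buffers.Length, options, i =>
        {
            long sum = 0;
            foreach (byte b in buffers[i])
            {
                sum += b;
            }
            results[i] = sum; // each iteration writes only its own slot, so no lock is needed
        });

        return results;
    }
}
```

Parallel.For returns only after all iterations have finished, so it suits CPU-bound work; for the network calls themselves the awaitable approach discussed in the other answers is the better fit.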

Mario Vernari
0
  1. If the call actually does need a thread, the system will manage a thread pool for you, not spawn endless threads.

  2. In many cases, there is no thread. Asynchronous is not synonymous with multithreaded. Multithreading is one technique for doing tasks asynchronously, but it is not the only way.
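The second point can be seen in a small sketch: the code below starts many timer-based waits at once and awaits them together. Task.Delay is backed by a timer rather than a dedicated thread, so the pending operations do not each occupy a thread while waiting. `NoThreadPerTask` and `RunManyAsync` are hypothetical names used only for this illustration.

```csharp
using System.Linq;
using System.Threading.Tasks;

public static class NoThreadPerTask
{
    // Starts `count` timer-based waits at once and returns how many completed successfully.
    public static async Task<int> RunManyAsync(int count, int delayMs)
    {
        Task[] pending = Enumerable.Range(0, count)
            .Select(_ => Task.Delay(delayMs)) // no thread is held while the timer runs
            .ToArray();

        await Task.WhenAll(pending); // resumes once every delay has elapsed

        return pending.Count(t => t.Status == TaskStatus.RanToCompletion);
    }
}
```

Calling `await NoThreadPerTask.RunManyAsync(1000, 100)` takes roughly the length of one delay rather than a thousand of them, because the waits overlap instead of queuing behind one another.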

PMV
  • OK, does this mean the sample code I posted is valid? If so, how can I wait until all the MyAsyncMethod() calls have finished if I'm not awaiting them in the for loop? – dhalfageme Oct 24 '16 at 16:13