0

I have the following function, which runs individually for thousands of files. When it runs, the UI thread locks up due to the synchronous I/O operation. But this source says that using async for many operations is inefficient, so how can I prevent the UI from locking up then?

public string CopyFile(string sourceFile, string fileName, bool forceCopy)
{
    fileName = GetSafePathname(GetSafeFilename(fileName));
    string destinationFile = Path.Combine(DestinationFolder, fileName);

    if (File.Exists(destinationFile) && !forceCopy)
    {
        return null;
    }
    else if (!File.Exists(destinationFile)) // copy the file if it does not exist at the destination
    {
        File.Copy(sourceFile, destinationFile);
        return destinationFile;
    }
    else if (forceCopy) // if forceCopy, delete the destination file and copy the new one in its place
    {
        File.Delete(destinationFile);
        File.Copy(sourceFile, destinationFile);
        return destinationFile;
    }
    else
    {
        throw new GenericException();
    }
}
The Bic Pen
  • 773
  • 6
  • 21
  • This could be answered a couple of ways, but yes, run the copy on its own thread, outside the thread that controls the UI. You likely want one, or at most a few, threads working through a List, and certainly not thousands of threads copying one file each – Austin T French Oct 16 '18 at 22:05
  • _”says that using async for many operations is inefficient”_ - that is not what it says at all –  Oct 16 '18 at 23:05
  • There will be a point at which too many concurrent/parallel file copies leads to diminishing returns, especially if all are on the same drive. By all means use a **Task** to prevent the UI locking up but consider only copying a file or two at a time by queueing them for processing by the task –  Oct 16 '18 at 23:28
  • Well that's how I interpreted this _There is no need to use a new synchronization context (async call) for each file. If you want to process the files in the background, it is better processing all files in one single Task than using one task for each file. Remember that each context switch produces a little overhead._ when I first read it @mickyd – The Bic Pen Oct 16 '18 at 23:29
  • _"...If you want to use an async API that way, consider using ConfigureAwait(true) to avoid context switches..."_ –  Oct 16 '18 at 23:31

1 Answer

0

To figure out whether multitasking is helpful, you first need to understand where the bottleneck is. For a file-system or file operation, the bottleneck will without a doubt be the disk.

That said, a minimum of multitasking is necessary to keep the GUI responsive, even if it is just one alternate thread, or moving the loop into an async function.
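A minimal sketch of that idea: wrap the whole copy loop in a single `Task.Run`, so a caller such as a button click handler can simply `await` it while the UI thread stays free. The class and method names here are invented, and the inner copy stands in for the poster's `CopyFile`:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

static class SingleTaskCopy
{
    // One background task processes the entire list; the caller only
    // awaits, so the UI thread is never blocked by the synchronous I/O.
    public static Task<List<string>> CopyAllAsync(IEnumerable<string> sources, string destinationFolder)
    {
        return Task.Run(() =>
        {
            var copied = new List<string>();
            foreach (string source in sources)
            {
                string dest = Path.Combine(destinationFolder, Path.GetFileName(source));
                File.Copy(source, dest, overwrite: true); // synchronous, but off the UI thread
                copied.Add(dest);
            }
            return copied;
        });
    }
}
```

In a WinForms/WPF handler this would be `var copied = await SingleTaskCopy.CopyAllAsync(files, folder);`, after which execution resumes on the UI thread.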

To some degree, multitasking might even benefit throughput: while one task is taxing the CPU with a bit of pre- or post-work, another can be writing.

But managing many operations also costs resources, and sooner or later the overhead of managing them will consume all the gains. Parallel slowdown sets in: https://en.wikipedia.org/wiki/Parallel_slowdown

Only a few operations are pleasingly/embarrassingly parallel. For those, parallel slowdown sets in very late, or never: https://en.wikipedia.org/wiki/Embarrassingly_parallel Your case is without a doubt not one of them; indeed, you can assume parallel slowdown will set in very quickly.

Edit:

To give a math example, let us assume that each file operation spends 10 ms on CPU work and 200 ms on read/write work.

If you ran them sequentially in a single thread, without multitasking, over 200 files, that would be (200 + 10) * 200 ms, or 42 seconds (I swear I did not plan that).

With async, it is possible for one or several operations to run their 10 ms of CPU work during the write work of another operation, so for all but the first and last file the CPU time can be ignored. Suddenly it is (200 * 200) + 10 ms, or 40.01 seconds. Almost 2 seconds saved.

Now, starting too many operations basically increases the CPU time spent on each of them, mostly overhead from figuring out which operation should get CPU time right now. By the time that overhead adds up to close to 1,990 ms in total, you are all the way back to 42 seconds. And if you add more after that, the overhead will actually result in more time being spent on the CPU work than on the actual write work.
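The arithmetic above can be double-checked in a few lines; 10 ms and 200 ms are the made-up per-file figures from the example:

```csharp
using System;

static class TimingMath
{
    // Sequential: every file pays both its CPU time and its disk time.
    public static int SequentialMs(int files, int cpuMs, int diskMs)
        => files * (cpuMs + diskMs);

    // Overlapped: each file's CPU work hides behind another file's disk
    // work, leaving only one unhidden slice of CPU time.
    public static int OverlappedMs(int files, int cpuMs, int diskMs)
        => files * diskMs + cpuMs;
}
```

`SequentialMs(200, 10, 200)` gives 42,000 ms and `OverlappedMs(200, 10, 200)` gives 40,010 ms, matching the 42 and 40.01 seconds above.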

Christopher
  • 9,634
  • 2
  • 17
  • 31
  • Thanks for the advice. How can I put the function in a separate thread? It requires new arguments each time, so how can I use a thread for that? – The Bic Pen Oct 16 '18 at 22:50
  • Though good, OP isn't asking about parallel, rather about asynchronous operations. The two are not the same. https://stackoverflow.com/a/4844774/585968 –  Oct 16 '18 at 23:24
  • Async, Threads, Threadpools, Tasks: all merely ways to go about Multitasking/Parallel Operations. And async is just a re-invention of cooperative multitasking, where every part has to yield. Just this time it is the compiler/runtime making certain there **will** be yielding, so it actually works :) – Christopher Oct 16 '18 at 23:29
  • @TheBicPen: If you have a CPU-bound operation, multithreading is a viable way to implement multitasking. In all other cases it is way more hassle to use than it is worth. Disk access is as far from a CPU-bound operation as you can get. – Christopher Oct 16 '18 at 23:32
  • `async/await` when used with I/O completion ports can perform an I/O background operation with no threads at all. https://blog.stephencleary.com/2013/11/there-is-no-thread.html –  Oct 17 '18 at 01:02
  • @MickyD: That is exactly what I (tried) to say. The specific way you implement multitasking does not matter for anything I said, which is why I talk about multitasking, which AFAIK is the top-tier umbrella term. – Christopher Oct 17 '18 at 01:04
  • Ok. Your statement about "disk access" made it appear that "multitasking" is only for CPU-bound operations, which is incorrect –  Oct 17 '18 at 01:25