
I need to write a GUI application that processes batches of files with external command-line tools. I want to run one external process per file in parallel and throttle them to the number of CPU threads, to maximize CPU usage and throughput. Here is what I have tried and researched so far:

Parallel.ForEach

When I first asked this question on Stack Overflow, someone advised me to use Parallel.ForEach. It does work, but it blocks some of the threads, which then do nothing except wait for the external processes to exit. And if an external process runs for a long time, that reduces the number of threads left for parallelism. So I gave up on it and looked for other solutions.

SemaphoreSlim

I simply use

SemaphoreSlim sem = new SemaphoreSlim(Environment.ProcessorCount);

to throttle the number of external processes, and

await Task.WhenAll(tasks);

to wait for all of the processes without blocking my GUI.

Now I am using this. It works very well.
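
A minimal sketch of the setup described above, assuming each external tool can be started with `Process` and awaited through its `Exited` event. The class name `ProcessThrottle` and both method names are hypothetical and only serve the illustration. Note that the sketch acquires the semaphore with `WaitAsync`, which returns a task instead of blocking the calling thread.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class, not part of any library.
public static class ProcessThrottle
{
    // Starts an external tool and returns a task that completes with its
    // exit code once the process has exited. No thread is blocked meanwhile.
    public static Task<int> RunProcessAsync(string fileName, string arguments)
    {
        var tcs = new TaskCompletionSource<int>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        var process = new Process
        {
            StartInfo = new ProcessStartInfo(fileName, arguments)
            {
                UseShellExecute = false,
                CreateNoWindow = true
            },
            EnableRaisingEvents = true // required for the Exited event
        };
        process.Exited += (sender, e) =>
        {
            tcs.TrySetResult(process.ExitCode);
            process.Dispose();
        };
        try
        {
            process.Start();
        }
        catch
        {
            process.Dispose();
            throw;
        }
        return tcs.Task;
    }

    // Runs 'work' for every item, with at most 'maxConcurrency' running at once.
    public static async Task RunThrottledAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync(); // asynchronous wait for a free slot
                try { await work(item); }
                finally { gate.Release(); }
            }).ToList();

            await Task.WhenAll(tasks);
        }
    }
}
```

From a GUI event handler this could be called as `await ProcessThrottle.RunThrottledAsync(files, Environment.ProcessorCount, f => ProcessThrottle.RunProcessAsync("mytool.exe", "\"" + f + "\""));`, where `mytool.exe` is a placeholder for the real command-line tool.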

There is just one problem: MSDN says that SemaphoreSlim is designed for use within a single process when wait times are expected to be very short. My external processes, however, often run for a long time (the processing time depends on the input file's type and size), so I believe the SpinWait wastes CPU in my case. I have been looking for a way to avoid this spin-waiting, but so far I have not found one. Some may suggest the traditional Semaphore. I have tried it, but Semaphore is not awaitable, so it blocked my GUI, and if I use

await Task.Run()

with it, then it does not perform any better than SemaphoreSlim.

TPL Dataflow

The other solution I found is the TPL Dataflow library. It performs slightly better than SemaphoreSlim, but some of my specific use cases cannot be implemented with it.

For example, I have a set of archives. I need to decompress each one, process the files inside it, and then re-compress it. In TPL Dataflow, I planned to split this into a "decompress" block (parallelism: 1), a "file process" block (parallelism: 12), and a "compress" block (parallelism: 1). But I don't know how to wait for only a subset of the tasks in TPL Dataflow. If I understand correctly, TPL Dataflow can only wait for a whole block to complete. In my case, when all the files of the first archive have been processed, the compress block has no way of knowing it; it has to wait until the files of every archive have been processed.

But with SemaphoreSlim, I can use

await Task.WhenAll(someOfTasks);

in each foreach iteration over the archives to wait for just that archive's files. That gives me higher throughput, so I finally gave up on TPL Dataflow.
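
A rough sketch of that per-archive loop, assuming the three steps are wrapped as asynchronous delegates. The names `ArchiveBatch`, `decompress`, `processFile`, and `compress` are hypothetical stand-ins for the real external tools; only the file-processing step is throttled here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class, not part of any library.
public static class ArchiveBatch
{
    public static async Task RunAsync(
        IEnumerable<string> archives,
        int maxConcurrency,
        Func<string, Task<IReadOnlyList<string>>> decompress, // returns extracted files
        Func<string, Task> processFile,
        Func<string, Task> compress)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            foreach (var archive in archives)
            {
                IReadOnlyList<string> files = await decompress(archive);

                // Start one throttled task per file of this archive only.
                var someOfTasks = files.Select(async file =>
                {
                    await gate.WaitAsync();
                    try { await processFile(file); }
                    finally { gate.Release(); }
                }).ToList();

                // Wait for this archive's files, not for the whole batch.
                await Task.WhenAll(someOfTasks);

                await compress(archive);
            }
        }
    }
}
```

Because the `await Task.WhenAll(someOfTasks)` sits inside the loop, each archive is re-compressed as soon as its own files are done.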

Conclusion

So after my research, I am still using SemaphoreSlim. It works very well, but I am concerned that its spin-waiting wastes CPU. Is there a better approach for throttling external processes in C#?

Richardissimo
Syun
  • There's very little point in worrying about the overhead of your waiting mechanism, since the cost of creating and destroying processes dwarfs all of that. Busy waiting would still be something to worry about, but `SemaphoreSlim` only uses spinwaits for very short intervals before switching to a "proper" wait. The confusion stems from thinking `SemaphoreSlim` is *only* appropriate for short waits -- that's not true. It's just that when you have a short wait, it's less expensive than `Semaphore`. When you have a long wait, the difference is negligible. – Jeroen Mostert Nov 19 '18 at 13:25
  • You should have a look at [this implementation](https://stackoverflow.com/a/11565317/542251) using `ActionBlock`, I think it does what you want – Liam Nov 19 '18 at 13:27
  • Confusingly stated, it is not what they meant. Note that the article compares Semaphore against SemaphoreSlim. "very short" wait times gives SemaphoreSlim an edge over Semaphore. If they are not short then it is pretty likely that the semaphore needs to block and then you can't see the difference anymore. – Hans Passant Nov 19 '18 at 18:44

0 Answers