11

So I've been looking into overlapped IO for sockets for a server application I'm building, and I keep seeing comments of people saying "never use hEvent" or "IO completion ports will be faster", etc, but no one ever says WHY not to use hEvent and no one ever provides any real-world data or numbers on completion ports being faster, or how much faster. hEvent with WaitForMultipleObjects() fits better into my application, so if the speed difference is marginal I'm inclined to use that, but I don't want to commit to that without some real data telling me how big of a sacrifice I'm making there. I've googled and googled and googled and can't find any benchmarks or articles or ANYTHING comparing the two strategies aside from a few StackOverflow answers saying "don't use this one" without giving a reason.

Can anyone provide me with some real information or numbers here on the practical, real world difference between using hEvent and completion ports?

ShadauxCat
  • It's about reducing the number of threads. If you use a single `WaitForMultipleObjects` to wait for all of your I/O, then that's basically the same as an I/O completion port. What you don't want is for every I/O operation to come with its own `WaitForSingleObject`, because that means that each pending I/O operation requires its own thread, which [doesn't scale](https://en.wikipedia.org/wiki/C10k_problem). – Raymond Chen Jun 20 '17 at 23:52
  • Right, I can definitely understand how `WaitForSingleObject` absolutely would not scale. I'm imagining `WaitForMultipleObjects` taking a similar role to `epoll` in this case, although I do realize there are big differences. I can also understand how completion ports could have a slight edge over `WaitForMultipleObjects`, but if the edge is small, I'd prefer the `WaitForMultipleObjects` approach on merits of having better (perceived) control of the threads in my application. – ShadauxCat Jun 21 '17 at 00:10
  • 5
    Keep in mind that using WaitForMultipleObjects limits you to 64 clients per thread. How many simultaneous connections are you expecting? – Harry Johnston Jun 21 '17 at 00:30
  • The people who say "never use hEvent" need to give a reason. Otherwise the advice is useless. – Raymond Chen Jun 21 '17 at 00:48
  • 2
    @HarryJohnston Really? That's what `MAXIMUM_WAIT_OBJECTS` is? That is a very good reason not to use it - I hadn't actually started writing the code yet, just looked at documentation, and had figured `MAXIMUM_WAIT_OBJECTS` would be... something reasonable. Thank you for finally giving a good reason! (And I just made it obvious I'm not primarily a windows developer. :) ) – ShadauxCat Jun 21 '17 at 01:40

3 Answers

10

This answer originates from Harry Johnston as a comment on the question, and with a bit of searching I found some more details that make WaitForMultipleObjects a terrifying thing.

The maximum number of objects you can wait for is 64. That alone makes scalability of the WFMO approach pretty much non-existent. But looking further, I found this thread: https://groups.google.com/forum/#!topic/comp.os.ms-windows.programmer.win32/okwnsYetF6g

> In NT terms, to enter the wait, a wait block has to be allocated for every object, and each waitblock is queued to the object you're waiting for and then cross-linked to the thread. When any of those objects are signalled all those wait blocks have to be dequeued, unlinked, and deallocated back to pool. All of that happens at DISPATCH_LEVEL and all except the pool allocation and free happens with the dispatcher spinlock held.
>
> (WFMO with fAll == TRUE is even MORE expensive. Every time ANY of the objects is signalled, all the others have to be checked. This all happens, you guessed it, at DISPATCH_LEVEL with the dispatcher spinlock held.)

That spinlock at the dispatcher level prevents preemption and timeslicing of threads across the whole system, even with multiple cores. That's terrifying and a good reason to never use WFMO for anything ever if you're waiting for more than 3 objects (the thread has 3 wait blocks pre-allocated and can avoid a lot of that if you're waiting for 3 or fewer).
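To put rough numbers on the 64-handle limit, here is a minimal sketch. It assumes one waiter thread per batch of handles and that every wakeup re-enters the wait on the whole batch. `kMaxWaitObjects` mirrors the Win32 `MAXIMUM_WAIT_OBJECTS` value (64) so the snippet compiles without `<windows.h>`; both helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors the Win32 constant MAXIMUM_WAIT_OBJECTS (64). Redefined here so the
// sketch builds on any platform.
constexpr std::size_t kMaxWaitObjects = 64;

// Hypothetical helper: how many threads are needed if each thread blocks in
// one WaitForMultipleObjects call covering at most `per_wait` handles.
inline std::size_t waiter_threads_needed(std::size_t handles,
                                         std::size_t per_wait = kMaxWaitObjects) {
    assert(per_wait > 0);
    return (handles + per_wait - 1) / per_wait;  // ceiling division
}

// Hypothetical helper: rough count of wait-block link/unlink pairs, assuming
// each wakeup tears down and re-establishes one wait block per handle.
inline std::size_t wait_block_links(std::size_t handles_in_wait,
                                    std::size_t wakeups) {
    return handles_in_wait * wakeups;
}
```

Under these assumptions, 10,000 sockets need `waiter_threads_needed(10000) == 157` threads, and a thread that wakes 1,000 times on a full batch goes through `wait_block_links(64, 1000) == 64000` link/unlink pairs.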

ShadauxCat
4

For maximum performance, you should use I/O completion ports, which place no fixed limit on the number of sockets. `select()` is capped by `FD_SETSIZE` (commonly 1024 on Linux, 64 by default in Winsock), and the performance of select-like APIs drops rapidly as the socket count grows, along with higher CPU usage than necessary.

https://msdn.microsoft.com/en-us/library/windows/desktop/aa365198(v=vs.85).aspx

You can also check out this great presentation on asynchronous I/O, which I think is a must-watch for anyone contemplating writing medium to large scale client-server apps.

History of Time: Asynchronous C++ - Steven Simpson [ACCU 2017] https://www.youtube.com/watch?v=Z8tbjyZFAVQ

In this presentation you will find a complete description and comparison of the available technologies, along with benchmark results. Well worth the time.

The 64-handle limit of WaitForMultipleObjects() makes it impractical for handling anything involving more than a handful of I/O streams.

Michaël Roy
  • Sure, select-like APIs have issues, but I didn't ask about them. :) I'm already aware of the problems with select(), poll(), etc. This question was specifically about two ways of interacting with overlapped IO (both of which are asynchronous, non-blocking, and do not use select-like APIs): IO Completion Ports vs. setting the `hEvent` object on the `OVERLAPPED` struct. I appreciate the information you provided -- but you didn't actually talk about `WaitForMultipleObjects` here at all... – ShadauxCat Jun 21 '17 at 04:13
  • WaitForMultipleObjects is not a viable way to do I/O. It's worse than select. – Michaël Roy Jun 21 '17 at 16:55
  • I've seen other sites say otherwise, but no one ever gives any actual reasons or any actual data. I've seen some sites say "Overlapped IO is the best (except for completion ports, which are even better)". And then I've seen others say "Don't use `WFMO`". If you read the content of my post, my question was not "which should I use", it was "can someone give me some real numbers and reasons why one is better than the other?" As Raymond Chen said, saying 'don't use this' without giving reasons is not particularly helpful advice. (That said, if you'll look at my own answer, I found the reasons.) – ShadauxCat Jun 21 '17 at 17:24
  • 2
    In other words, this question wasn't asking for advice, but for understanding. I know which one people suggest to use, but I wanted to know why, and I wanted to know how big the difference was. Comparisons between `IOCP` and `select()` are abundant, but there are NO comparisons between `IOCP` and the other methods of working with overlapped IO. I wanted the actual data to be able to make my own informed conclusions rather than assuming the internet is correct. – ShadauxCat Jun 21 '17 at 17:26
  • The video linked to above does exactly that. In detail and in a c++ context. – Michaël Roy Jun 21 '17 at 17:55
  • And the 64-handle limit of WaitForMultipleObjects() makes it impractical for handling anything involving more than a handful of I/O streams. – Michaël Roy Jun 21 '17 at 18:00
  • The video discusses `select()`, threads, and `epoll` (and mentions IOCP as similar to epoll). I'm already very familiar with all the strategies the video discussed. The video did not mention the two strategies I asked about. The 64-handle limit is what I needed to know about (the msdn documentation doesn't explicitly mention this number and I'm not a windows programmer, I'm porting a linux implementation using epoll), and if you'd read the answer I posted, you'd see that is exactly the conclusion I came to once I learned of that limit. – ShadauxCat Jun 21 '17 at 18:24
  • 1
    It doesn't discuss WaitForMultipleObjects _because_ it's not even considered a viable solution, as everyone is telling you. The performance of epoll and I/O completion ports is comparable, since these two APIs access the same underlying technology, albeit for different OSes. – Michaël Roy Jun 21 '17 at 18:30
  • That's fine, I understand that and I'm not arguing with it. I just wanted more than someone's word for it, I wanted to understand why. Is that really such a bad question to ask? Knowing why equips me to make correct decisions for myself without having to ask the next time, and I now have good information that applies to other situations that would use WFMO as well. – ShadauxCat Jun 21 '17 at 18:59
0

There's actually a third way to notify completion of overlapped I/O, and that's using an overlapped completion routine which gets queued to the initiating thread's APC queue.

All of these methods are useful enough.

The biggest drawback with event-based notification is that it doesn't scale (you can only wait on 64 events at a time). Win32 events aren't exactly optimal ways to get a notification in the first place. This is suitable for multithreaded applications that do very little I/O but don't want to wait for completion on the initiating thread or want to do some limited multiplexing.

The main drawback of overlapped completion routines is that you can't choose which thread gets the notification: the routine always runs on the thread that initiated the I/O, and only once that thread enters an alertable wait state. This is suitable for traditional single-threaded UI applications (use `MsgWaitForMultipleObjectsEx` with `MWMO_ALERTABLE` to wait for messages and put the main thread into an alertable wait state simultaneously), but not for a modern high-quality video game or internet service that does a lot of I/O and might initiate it from any of multiple threads.

IOCP covers the case where you have many threads that may initiate I/O, do a lot of I/O, and/or want to get notified of completion on some arbitrary thread (or, in essence, any possible thread). The only application type it's ill-suited to relative to the other options is a single threaded UI application.
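To make the "any thread posts, any thread picks up" shape concrete, here is a minimal portable sketch. It is a model of the pattern built on standard C++ primitives, not the Win32 API: `Completion`, `CompletionQueue`, `post`, and `wait` are hypothetical names, and the FIFO packet order is an assumption of this sketch. In real code the blocking dequeue would roughly correspond to `GetQueuedCompletionStatus`, and a manual post to `PostQueuedCompletionStatus`.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical completion packet: which connection finished and how many
// bytes were transferred.
struct Completion {
    std::size_t key;
    std::size_t bytes;
};

// Portable model of the completion-port pattern (NOT the Win32 API).
// There is one queue for all operations, so no per-operation event handle
// and no 64-object ceiling on what a worker can wait for.
class CompletionQueue {
public:
    // May be called from any thread when an operation finishes.
    void post(Completion c) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            packets_.push_back(c);
        }
        ready_.notify_one();  // wake one idle worker, if any
    }

    // Called by worker threads: blocks until a packet is available.
    Completion wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !packets_.empty(); });
        Completion c = packets_.front();
        packets_.pop_front();
        return c;
    }

    // Number of packets not yet picked up by a worker.
    std::size_t pending() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return packets_.size();
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<Completion> packets_;
};
```

A worker loop would simply call `wait()` repeatedly and dispatch on `key`, which is why the thread that started an operation does not have to be the one that handles its completion.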

I haven't compared the three methods in terms of performance, I simply see them as adapted to different patterns. I never use the event notifications myself, but have used both completion routines and IOCP in various projects, although I haven't actually used the completion routines in like a decade if I'm being honest.

nfries88