59

Wikipedia says

unlike the older system calls, which operate at O(n), epoll operates in O(1) [2]).

http://en.wikipedia.org/wiki/Epoll

However, the source code at fs/eventpoll.c on Linux-2.6.38, seems it is implemented with an RB tree for searching, which has O(logN)

/*
 * Search the file inside the eventpoll tree. The RB tree operations
 * are protected by the "mtx" mutex, and ep_find() must be called with
 * "mtx" held.
 */
static struct epitem *ep_find(struct eventpoll *ep, struct file *file, int fd)
{

In fact, I couldn't see any man page saying the complexity of epoll() is O(1). Why is it known as O(1)?

ddoman
  • 1,051
  • 1
  • 10
  • 12
  • 1
    I would vote this up because you really did your homework, but I'm maxed out on daily votes :P –  Jun 24 '11 at 23:13
  • 7
    @ddoman: What Keoki Zee meant was that you did prior research on your own before asking this question. A lot of new (and old!) users of Stack Overflow can't even be bothered to do that. – In silico Jun 24 '11 at 23:15
  • Research, for doing research :) –  Jun 24 '11 at 23:15
  • Well, google seems to [agree](http://www.google.com/search?q=%22epoll+operates+in+O%281%29%22&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a#sclient=psy&hl=en&client=firefox-a&hs=RCb&rls=org.mozilla:en-US%3Aofficial&source=hp&q=%22epoll+operates+in+O%281%29%22&aq=f&aqi=&aql=&oq=&pbx=1&bav=on.2,or.r_gc.r_pw.&fp=27b93cf21cbb40b9&biw=1339&bih=795) – Caffeinated Jun 24 '11 at 23:17
  • There's no way `epoll` executes in `O(1)`, as the number of file descriptors it needs to watch isn't constant. There has to be some kind of repetition construct. – In silico Jun 24 '11 at 23:21
  • Would someone explain this `epoll()` function people keep talking about? I can't find any function with that name. `epoll` is a whole family of functions, not all of which have the same complexity. – Ben Voigt Jun 24 '11 at 23:24
  • @Insilico, you're making the assumption that `epoll()` must _scan_ the file descriptors to find ready ones. `epoll()` can also _wait_ on the descriptors to become ready... – sarnold Jun 24 '11 at 23:25
  • @sarnold: I made no assumption that `epoll` is scanning anything. What I meant was that if you're waiting on events that can happen at anytime then `O(1)` operation is not guaranteed by definition. Either that or everyone talking about the `epoll` API meant something else by `O(1)`. – In silico Jun 24 '11 at 23:40
  • @Insilico, hehe, good point. `O(asleep)`. :) – sarnold Jun 24 '11 at 23:42
  • 13
    It is somewhat inaccurate to say that `epoll` operates at O(1). `epoll` operates at O(log(n)) in respect to _adding and removing descriptors_, at O(n) in respect of _ready descriptors_, and at O(1) in respect of _monitored descriptors_ -- which is the whole point of `epoll`. Adding/removing descriptors is a rare thing, waiting for readiness is frequent, and you usually have _many more_ descriptors in your set than are actually ready. Insofar, while `epoll` does not really operate at O(1), it does so where it matters. The common case is fast, the uncommon case is not. – Damon Jun 25 '11 at 12:55
  • What does n in `O(n) for ready descriptors` signify? I thought it would be different from the input set. O(m) where m is the number of ready events. – piepi Aug 10 '23 at 05:37

2 Answers2

27

This makes sense once you look for ep_find. I only spent a few minutes with it and I see ep_find is only called by epoll_ctl.

So indeed, when you add the descriptors (EPOLL_CTL_ADD) that costly operation is performed. BUT when doing the real work (epoll_wait) it isn't. You only add the descriptors in the beginning.

In conclusion, it's not enough to ask the complexity of epoll, since there is no epoll system call. You want the individual complexities of epoll_ctl, epoll_wait etc.

Other stuff

There are other reasons to avoid select and use epoll. When using select, you don't know how many descriptors need attention. So you must keep track of the biggest and loop to it.

rc = select(...);
/* check rc */
for (s = 0; s <= maxfd; s++) {
    if (FD_ISSET(s)) {
        /* ... */
    }
}

Now with epoll it's a lot cleaner:

nfds = epoll_wait(epollfd, events, MAX_EVENTS, -1);
/* check nfds */
for (n = 0; n < nfds; ++n) {
    /* events[n].data.fd needs attention */
}
Community
  • 1
  • 1
cnicutar
  • 178,505
  • 25
  • 365
  • 392
  • select returns the number of ready fd's too, in the worst case you need to loop through them all though (if `maxfd` has an event). – nos Jun 25 '11 at 10:05
  • @nos And how would you use that number to replace the loop ? – cnicutar Jun 25 '11 at 10:06
  • Somewhere you need to have a list of all your file descripors. I often use a linked list if I'm using select. `int handled = 0; while(client) { if(FD_ISSET(client->fd,read_set) {handle(client); handled++; } if(handled == rc) break; client = clients->next;}` , no need to check from `0 to rc`, only check the FDs you actually are monitoring, and no need to check more fds when you handled all the events select reported. – nos Jun 25 '11 at 10:10
  • 1
    Yes, you're right, you can *shorten* the loop. But, as you correctly pointed in your comment, you still have to process all your list (in the unfortunate case a descriptor near "the end" needs attention). And if that list is large and the descriptors are the first and the last.. – cnicutar Jun 25 '11 at 10:13
1

I think epoll wait is O(1) with epollet if you ask for 1 event. And upd and add could be amortized O(1) if they used a descent hashtbl implementation.

This needs checking and man pages should mention complexity!