74

From epoll's man page:

epoll is a variant of poll(2) that can be used either as an edge-triggered
or a level-triggered interface

When would one use the edge-triggered option? The man page gives an example that uses it, but I don't see why it is necessary in that example.

jww
Dan

3 Answers

123

When an FD becomes read or write ready, you might not necessarily want to read (or write) all the data immediately.

Level-triggered epoll will keep nagging you as long as the FD remains ready, whereas edge-triggered won't bother you again until the FD's readiness changes, which in practice means you should read or write until you get an EAGAIN before waiting again (so it's more complicated to code around, but can be more efficient depending on what you need to do).
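To make the "nagging" difference concrete, here is a minimal sketch (Linux only). The helper name `count_ready_reports` is made up for this illustration: it leaves one unread byte in a pipe and counts how many non-blocking `epoll_wait` calls report the read end as ready.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns how many of `rounds` non-blocking
 * epoll_wait() calls report the read end of a pipe as ready while the
 * pending byte is never consumed. Returns -1 on setup failure.
 * Pass 0 for level-triggered, EPOLLET for edge-triggered. */
int count_ready_reports(uint32_t extra_flags, int rounds)
{
    int p[2];
    assert(rounds >= 0);
    if (pipe(p) == -1)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN | extra_flags, .data.fd = p[0] };
    int reports = -1;

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1) {
        reports = 0;
        for (int i = 0; i < rounds; i++) {
            struct epoll_event out;
            /* timeout 0: poll the ready list without blocking */
            if (epoll_wait(epfd, &out, 1, 0) == 1)
                reports++;
        }
    }

    close(epfd);
    close(p[0]);
    close(p[1]);
    return reports;
}
```

With the byte left unread, `count_ready_reports(0, 3)` reports the FD on every call, while `count_ready_reports(EPOLLET, 3)` reports it only once.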

Say you're writing from a resource to an FD. If you register your interest for that FD becoming write ready as level-triggered, you'll get constant notification that the FD is still ready for writing. If the resource isn't yet available, that's a waste of a wake-up, because you can't write any more anyway.

If you were to add it as edge-triggered instead, you'd get notification that the FD was write ready once, then when the other resource becomes ready you write as much as you can. Then if write(2) returns EAGAIN, you stop writing and wait for the next notification.
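A rough sketch of the "write as much as you can, stop on EAGAIN" step might look like this. `write_until_blocked` is a name invented for the example, and it assumes the FD has already been put into non-blocking mode:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: writes until everything is sent or the FD would
 * block. Returns the number of bytes actually written, or -1 on a real
 * error. When it returns less than `len`, keep the remainder and wait
 * for the next edge-triggered EPOLLOUT notification. */
ssize_t write_until_blocked(int fd, const char *data, size_t len)
{
    /* Guard: on a blocking FD this loop would stall instead of stopping. */
    assert(fcntl(fd, F_GETFL) & O_NONBLOCK);

    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, data + off, len - off);
        if (n > 0) {
            off += (size_t)n;
            continue;
        }
        if (n == 0)
            break;                      /* nothing accepted, try later */
        if (errno == EINTR)
            continue;                   /* interrupted, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                      /* kernel buffer is full */
        return -1;                      /* real error */
    }
    return (ssize_t)off;
}
```

The caller keeps its own offset into the pending data; after a short return it simply goes back to `epoll_wait` and resumes from that offset on the next write-ready event.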

The same applies for reading, because you might not want to pull all the data into user-space before you're ready to do whatever you want to do with it (thus having to buffer it, etc.). With edge-triggered epoll you get told when it's ready to read, and then can remember that and do the actual reading "as and when".
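When you do eventually get around to reading, the usual edge-triggered pattern is to keep reading until EAGAIN, so that the next notification corresponds to genuinely new data. A minimal sketch, assuming a non-blocking FD; `set_nonblocking` and `drain_fd` are illustrative names, not part of any API:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: switch an FD to non-blocking mode.
 * Returns 0 on success, -1 on failure. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: read until the FD would block or hits EOF.
 * Returns the number of bytes consumed, or -1 on a real error.
 * *eof is set to 1 if the peer closed, 0 otherwise. A real program
 * would process or buffer the data instead of discarding it. */
ssize_t drain_fd(int fd, int *eof)
{
    char buf[4096];
    ssize_t total = 0;

    assert(eof != NULL);
    *eof = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0) {
            *eof = 1;                   /* peer closed the connection */
            return total;
        }
        if (errno == EINTR)
            continue;                   /* interrupted, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;               /* drained: safe to wait again */
        return -1;                      /* real error */
    }
}
```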

Pang
James M
  • 5
    Is this edge-triggered behavior safe against race conditions, e.g. if data becomes available after `read` fails with `EAGAIN` but before `epoll` is called? – R.. GitHub STOP HELPING ICE Feb 06 '12 at 16:45
  • 1
    Sure. `epoll` simply returns immediately if the FD is already ready and you haven't yet been notified. – James M Feb 06 '12 at 17:10
  • 32
    ET is also particularly nice with a multithreaded server on a multicore machine. You can run one thread per core and have all of them call epoll_wait on the same epfd. When data comes in on an fd, exactly one thread will be woken to handle it. – Chris Dodd Feb 06 '12 at 19:51
  • 1
    @ChrisDodd - Does this not work with level triggered epoll as well? Why not? – Alex Dec 07 '12 at 16:51
  • 5
    @windfinder Correct me if I'm wrong, but in LT mode multiple threads _might_ be woken up on the same FD/SD in parallel, as long as data is there. With ET, _only one_ notification is raised for the FD/SD when data arrives, so only one thread gets it; other threads might get a notification for the same FD/SD only after the original thread has read/written _all_ the data for that notification. As you can imagine, it is _a lot_ easier to write MT epoll processes with ET. Hope this helps. – Emanuele Aug 11 '13 at 10:36
  • 10
    @Emanuele - Confirmed, ET guarantees that only one thread wakes up. – Alex Aug 12 '13 at 17:35
  • 1
    @Emanuele You're probably WRONG. `man 7 epoll`: Since even with edge-triggered `epoll`, multiple events can be generated upon receipt of multiple chunks of data, the caller has the option to specify the `EPOLLONESHOT` flag, to tell `epoll` to disable the associated file descriptor after the receipt of an event with `epoll_wait(2)`. When the `EPOLLONESHOT` flag is specified, it is the caller's responsibility to rearm the file descriptor using `epoll_ctl(2)` with `EPOLL_CTL_MOD`. – zeekvfu May 17 '14 at 05:24
  • I've found that rearming the fd with EPOLLONESHOT in ET mode will cause epoll_wait() to return immediately with EPOLLIN if the read buffer wasn't read completely since the last EPOLLIN. Not sure if this is intended behavior, but it could be used to prevent starvation if so. – pauluss86 Nov 04 '14 at 12:02
  • @JamesMcLaughlin Does Windows IOCP have concepts similar to `edge-trigger` and `level-trigger`? – Adams.H Jul 16 '16 at 14:17
  • @zeekvfu "If multiple threads (or processes, if child processes have inherited the epoll file descriptor across fork(2)) are blocked in epoll_wait(2) waiting on the same epoll file descriptor and a file descriptor in the interest list that is marked for edge-triggered (EPOLLET) notification becomes ready, just one of the threads (or processes) is awoken from epoll_wait(2). This provides a useful optimization for avoiding "thundering herd" wake-ups in some scenarios." – JiaHao Xu May 23 '20 at 08:27
  • @zeekvfu According to the manpage I referenced above, if you are using the same epoll instance in threads/child process, then only one should wake up. – JiaHao Xu May 23 '20 at 08:36
  • @JiaHaoXu this paragraph is present in an online epoll man page, but is missing in Ubuntu 18.04.3 LTS `man 7 epoll` page. It seems to depend on the epoll (glib?) version. – AlexStepanov Jun 11 '20 at 20:46
10

In my experiments, ET doesn't guarantee that only one thread wakes up, although it often wakes up only one. The EPOLLONESHOT flag is for this purpose.
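For reference, here is a small single-threaded sketch (Linux only) of what EPOLLONESHOT does: after one event is delivered, the FD stays silent until it is re-armed with `EPOLL_CTL_MOD`, even if more data arrives. The function name `oneshot_demo` and the `results` layout are made up for this example.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo. Fills results[] with epoll_wait() return values:
 *   results[0]: first wait after data arrives
 *   results[1]: wait after more data arrives, without re-arming
 *   results[2]: wait after re-arming with EPOLL_CTL_MOD
 * Returns 0 on success, -1 if any setup step fails. */
int oneshot_demo(int results[3])
{
    int p[2];
    assert(results != NULL);
    if (pipe(p) == -1)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    struct epoll_event ev = {
        .events = EPOLLIN | EPOLLET | EPOLLONESHOT,
        .data.fd = p[0],
    };
    struct epoll_event out;

    int ok = epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
             write(p[1], "a", 1) == 1;
    if (ok) {
        results[0] = epoll_wait(epfd, &out, 1, 0);
        ok = write(p[1], "b", 1) == 1;  /* new data while disarmed */
    }
    if (ok) {
        results[1] = epoll_wait(epfd, &out, 1, 0);
        /* Re-arm: the worker that handled the event does this when done. */
        ok = epoll_ctl(epfd, EPOLL_CTL_MOD, p[0], &ev) == 0;
    }
    if (ok)
        results[2] = epoll_wait(epfd, &out, 1, 0);

    close(epfd);
    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```

In a multithreaded server, the thread that received the event would re-arm the FD only after it has finished handling it, so no other thread can be woken for the same FD in the meantime.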

cpq
    `man 7 epoll`: Since even with edge-triggered `epoll`, multiple events can be generated upon receipt of multiple chunks of data, the caller has the option to specify the `EPOLLONESHOT` flag, to tell `epoll` to disable the associated file descriptor after the receipt of an event with `epoll_wait(2)`. When the `EPOLLONESHOT` flag is specified, it is the caller's responsibility to rearm the file descriptor using `epoll_ctl(2)` with `EPOLL_CTL_MOD`. – zeekvfu May 17 '14 at 05:27
  • 1
    Exactly, you get notified once per rising edge. If you add stdin to an epoll set as EPOLLET, each press of the enter key will generate an event. This is why EPOLLONESHOT is needed. – Guido Jun 11 '14 at 23:12
  • Did you have different epoll FDs or just one shared between threads? My understanding is that all epoll FDs should wake up but maybe only one thread for a shared FD. The new `EPOLLEXCLUSIVE` fixes the thundering herd problem for multiple epoll FDs. – Goswin von Brederlow Dec 11 '18 at 11:38
  • I mean multiple threads were waiting for a single FD. Sometimes several threads woke up if the EPOLLONESHOT flag was not set, while only one woke up if the flag was set. – cpq Dec 17 '18 at 14:41
2
  • Level triggered

    Use level trigger mode when you can't consume all the data in the FD and want epoll to keep triggering while data is available.

    For example, if you are receiving a large file from an FD and cannot consume all of its data at one time, level-triggered mode keeps notifying you until the remaining data has been consumed, which makes it a good fit for this case.

    • Disadvantages

      • Thundering herd
        • The EPOLLEXCLUSIVE flag is meant to prevent the thundering herd phenomenon.
      • Lower efficiency
        • When a read/write event occurs on a monitored file descriptor, epoll_wait() notifies the handler to read or write. If you don't read or write all the data at once (e.g., the read/write buffer is too small), the next call to epoll_wait() reports the same file descriptor again so you can continue; if you never read or write, it keeps notifying you.
        • If the system has a large number of ready file descriptors that you don't need to read or write, and they are returned on every call, this can greatly reduce the efficiency with which the handler retrieves the ready file descriptors it actually cares about.
    • Use cases

      • Redis: since the I/O thread of Redis is single-threaded, level-triggered mode is used.
  • Edge triggered

    Use edge-triggered mode when you can make sure that all available data is buffered and will be handled eventually.

    As Chris Dodd mentioned in the comments:

    ET is also particularly nice with a multithreaded server on a multicore machine. You can run one thread per core and have all of them call epoll_wait on the same epfd. When data comes in on an fd, exactly one thread will be woken to handle it.

zangw