199

I am referring to the POSIX-standard select and poll system calls in the C API.

Steven Lu

3 Answers

258

The select() call has you create three bitmasks to mark which sockets and file descriptors you want to watch for reading, writing, and errors, and then the operating system marks which ones in fact have had some kind of activity; poll() has you create a list of descriptor IDs, and the operating system marks each of them with the kind of event that occurred.

The select() method is rather clunky and inefficient.

  1. There are typically more than a thousand potential file descriptors available to a process. If a long-running process has only a few descriptors open, but at least one of them has been assigned a high number, then the bitmask passed to select() has to be large enough to accommodate that highest descriptor, so the operating system has to loop across whole ranges of hundreds of unset bits on every select() call just to discover that they are unset.

  2. Once select() returns, the caller has to loop over all three bitmasks to determine what events took place. In very many typical applications only one or two file descriptors will get new traffic at any given moment, yet all three bitmasks must be read all the way to the end to discover which descriptors those are.

  3. Because the operating system signals you about activity by rewriting the bitmasks, they are ruined and are no longer marked with the list of file descriptors you want to listen to. You either have to rebuild the whole bitmask from some other list that you keep in memory, or you have to keep a duplicate copy of each bitmask and memcpy() the block of data over on top of the ruined bitmasks after each select() call.
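Point 3 can be sketched in C as follows. This is only a minimal illustration, assuming read-readiness is all you care about; the helper names `select_readable` and `first_ready_fd` are made up for this example and are not part of any API.

```c
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: waits until a descriptor in `master` is readable.
 * select() overwrites the set it is given, so `master` is copied into
 * `ready` first and only the copy is handed to the kernel.
 * Returns whatever select() returns. */
int select_readable(const fd_set *master, int max_fd, fd_set *ready,
                    int timeout_ms)
{
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    memcpy(ready, master, sizeof(*ready));  /* restore the watch list */
    return select(max_fd + 1, ready, NULL, NULL, &tv);
}

/* Hypothetical helper: scans the result set from 0 up to max_fd (the
 * linear scan described in point 2). Returns the first ready descriptor,
 * or -1 if none is set. */
int first_ready_fd(fd_set *ready, int max_fd)
{
    for (int fd = 0; fd <= max_fd; fd++) {
        if (FD_ISSET(fd, ready))
            return fd;
    }
    return -1;
}
```

A caller would build `master` once with FD_ZERO() and FD_SET(), then call `select_readable()` in its event loop; `master` itself is never modified.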

So the poll() approach works much better because you can keep re-using the same data structure.
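Here is a rough sketch of that re-use, assuming you only want to read from whichever descriptor becomes readable first. `watch_for_read` and `poll_and_read` are hypothetical names chosen for this example.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: registers `fd` for read-readiness in one slot. */
void watch_for_read(struct pollfd *slot, int fd)
{
    slot->fd = fd;
    slot->events = POLLIN;  /* input: what we want to hear about */
    slot->revents = 0;      /* output: filled in by poll() */
}

/* Hypothetical helper: waits up to timeout_ms, then reads from the first
 * readable descriptor into buf and stores its slot index in *which.
 * Returns the number of bytes read, 0 on timeout or end of file, and -1
 * on error. poll() only writes to `revents`, so the same array can be
 * passed again on the next call without rebuilding it. */
ssize_t poll_and_read(struct pollfd *fds, nfds_t count, int timeout_ms,
                      char *buf, size_t len, int *which)
{
    *which = -1;
    int n = poll(fds, count, timeout_ms);
    if (n <= 0)
        return n;

    for (nfds_t i = 0; i < count; i++) {
        if (fds[i].revents & POLLIN) {
            *which = (int)i;
            return read(fds[i].fd, buf, len);
        }
    }
    return 0;  /* only non-read events (e.g. POLLHUP) were reported */
}
```

Note that the loop over the array is still linear in the number of watched descriptors; what poll() saves you is the rebuild or copy step before each call.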

In fact, poll() has inspired yet another mechanism in modern Linux kernels: epoll (the epoll_create(), epoll_ctl(), and epoll_wait() calls), which improves even more upon the mechanism to allow yet another leap in scalability, as today's servers often want to handle tens of thousands of connections at once. This is a good introduction to the effort:

http://scotdoyle.com/python-epoll-howto.html

This link has some nice graphs showing the benefits of epoll() (you will note that select() is by this point considered so inefficient and old-fashioned that it does not even get a line on these graphs!):

http://lse.sourceforge.net/epoll/index.html
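For comparison with select() and poll(), this is roughly what the epoll calls look like in C. It is a Linux-only sketch using the default level-triggered mode, and the helper names `epoll_watch_read` and `epoll_next_ready` are invented for illustration.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: registers `fd` with the epoll instance `epfd`.
 * Registration happens once; the kernel remembers the watch list, so
 * nothing has to be rebuilt or re-passed on every wait. */
int epoll_watch_read(int epfd, int fd)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = fd;  /* handed back to us by epoll_wait() */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: waits up to timeout_ms for one event. Returns the
 * descriptor that has activity (readable, hung up, or in error), or -1
 * on timeout or error. Only descriptors with activity are reported, so
 * there is no scan over the idle ones. */
int epoll_next_ready(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    if (n <= 0)
        return -1;
    return ev.data.fd;
}
```

The instance itself would come from `epoll_create1(0)` and be released with `close()` when no longer needed.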


Update: Here is another Stack Overflow question, whose answer gives even more detail about the differences:

Caveats of select/poll vs. epoll reactors in Twisted

Brandon Rhodes
  • 1
    And +1 for linking to an example of using epoll in python - looks like there are some interesting examples there, and I'm going to have to try them out... – Allen George Jun 16 '11 at 18:44
  • This answer makes it sound like epoll is always preferable – user3467349 Jan 23 '15 at 17:16
  • 1
    You're making it seem as if the bitscan was the bottleneck. It isn't. Scanning a 1024 bit array (1024 is the FD_SETSIZE on Linux) takes about 20ns. Syscalls take hundreds if not thousands. Sparse gaps can be skipped rather efficiently (much more efficiently than with `struct pollfd` arrays where you need gaps too or the maintenance of the nongapness is going to be much more expensive than fd_set manipulation). – Petr Skocik Jun 15 '21 at 11:33
  • According to this benchmark I found, they seem about equally (in)efficient with large fd sets: https://monkey.org/~provos/libevent/libevent-benchmark2.jpg, with poll getting only a small edge at high fd counts, probably because select modifies its argument sets more than poll does. – Petr Skocik Jun 15 '21 at 11:33
114

I think that this answers your question:

From Richard Stevens (rstevens@noao.edu):

The basic difference is that select()'s fd_set is a bit mask and therefore has some fixed size. It would be possible for the kernel to not limit this size when the kernel is compiled, allowing the application to define FD_SETSIZE to whatever it wants (as the comments in the system header imply today) but it takes more work. 4.4BSD's kernel and the Solaris library function both have this limit. But I see that BSD/OS 2.1 has now been coded to avoid this limit, so it's doable, just a small matter of programming. :-) Someone should file a Solaris bug report on this, and see if it ever gets fixed.

With poll(), however, the user must allocate an array of pollfd structures, and pass the number of entries in this array, so there's no fundamental limit. As Casper notes, fewer systems have poll() than select, so the latter is more portable. Also, with original implementations (SVR3) you could not set the descriptor to -1 to tell the kernel to ignore an entry in the pollfd structure, which made it hard to remove entries from the array; SVR4 gets around this. Personally, I always use select() and rarely poll(), because I port my code to BSD environments too. Someone could write an implementation of poll() that uses select(), for these environments, but I've never seen one. Both select() and poll() are being standardized by POSIX 1003.1g.
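The -1 convention Stevens mentions can be illustrated like this. It is a small sketch that relies on the POSIX rule that poll() ignores entries whose fd is negative and sets their revents to 0; `close_and_ignore` is a hypothetical helper name.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: closes `fd` and marks its slot as ignored by
 * setting the slot's fd to -1, so the array does not have to be
 * compacted. Returns the slot index, or -1 if `fd` is not in the array. */
int close_and_ignore(struct pollfd *fds, nfds_t count, int fd)
{
    if (fd < 0)
        return -1;  /* never match slots that are already ignored */

    for (nfds_t i = 0; i < count; i++) {
        if (fds[i].fd == fd) {
            close(fd);
            fds[i].fd = -1;      /* poll() skips negative descriptors */
            fds[i].revents = 0;
            return (int)i;
        }
    }
    return -1;
}
```

An ignored slot can later be re-used for a new descriptor by assigning its `fd` field again.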

October 2017 Update:

The email referenced above is at least as old as 2001; the poll() call is now (2017) supported across all modern operating systems - including BSD. In fact, some people believe that select() should be deprecated. Opinions aside, portability issues around poll() are no longer a concern on modern systems. Furthermore, epoll() has since been developed (you can read the man page), and continues to rise in popularity.

For modern development you probably don't want to use select(), although there's nothing explicitly wrong with it. poll(), and its more modern evolution epoll(), provide the same features (and more) as select() without suffering from the limitations therein.

cegfault
akappa
  • 17
    When was Stevens's answer written? Does the comment about poll() not being available on BSD still apply? MacOS X (which is partly based on BSD) has poll(), and the POSIX standard (POSIX 2008) requires it. – Jonathan Leffler Mar 10 '10 at 20:49
  • 15
    Rich Stevens passed away in September 1999, so the answer has to be older than that. He mentions seeing a new change in BSD/OS 2.1, which was released in January 1996, so probably around then. – alanc Jun 05 '10 at 15:13
  • 2
    I don't believe it. Answer posted 5 years ago, I stumble upon it, leave it open in the browser. The very next day, the author edits and improves the answer. SO notifies me with update on page using AJAX/websocket. this is why SO is great – Steven Lu May 14 '14 at 16:21
  • 12
    @StevenLu Yes, but unfortunately no word on whether AJAX/websocket is using `select` or `poll` :( – Christopher Schultz Jul 05 '16 at 22:07
  • > Someone could write an implementation of poll() that uses select(), for these environments, but I've never seen one. Java does so ;-) – Sergey Mashkov Apr 18 '17 at 13:05
  • Regarding October 2017 update: note about `poll` portability is good, but note about `epoll` is, IMHO, not good. `epoll` is not portable at all - it's Linux specific, and this question is tagged broadly as `unix`. One should at least note portability aspect of `epoll`, and list alternatives (`kqueue` on BSD, `/dev/poll` on some proprietary unix flavors). – el.pescado - нет войне Oct 18 '17 at 07:22
  • @el.pescado yes, epoll has portability issues (ironically similar to poll's portability issues 20 years ago). However, this question is still a top result on google for people searching for poll/select, and many of the people coming from search engines don't know about epoll. That is, although the OP was more generic (unix), many of the visitors to this question will benefit from knowing that epoll exists. It updates, not revises, the answer. IMO someone should probably add another answer to describe modern techniques like kqueue and /dev/poll – cegfault Oct 18 '17 at 20:52
3

Both of them are slow and behave mostly the same, but they differ in capacity and in some features.

When you call select in a loop, you need to copy the descriptor set before every call, because select overwrites it. poll fixes this problem, which makes for cleaner code. Another difference is that poll can handle more than 1024 file descriptors (FDs) by default. poll also reports the different event types through per-descriptor flags, which makes the program more readable than juggling several separate sets. Both poll and select take time linear in the number of descriptors watched, so both become slow when there are many of them.
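The 1024 figure is the usual value of FD_SETSIZE (on Linux, for example). Calling FD_SET() with a descriptor outside the range 0 to FD_SETSIZE - 1 is undefined behavior, so code that sticks with select might guard it along these lines. This is only a sketch, and `fd_set_checked` is a made-up name.

```c
#include <sys/select.h>

/* Hypothetical helper: adds `fd` to `set` only if it fits in an fd_set.
 * Returns 0 on success, or -1 if `fd` is negative or >= FD_SETSIZE,
 * in which case the descriptor cannot be watched with select() at all. */
int fd_set_checked(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;
    FD_SET(fd, set);
    return 0;
}
```

With poll there is no such ceiling, because the caller chooses the length of the pollfd array.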

Amir Fo