I'm trying to learn how to use epoll() for a TCP server application, because I'm expecting many connections. I've checked samples and tutorials, and they always recommend setting the sockets that are added to epoll() to NON-BLOCKING mode. Why?
-
You can’t do multiple blocking reads at the same time on one thread. – Ry- Oct 09 '14 at 02:44
-
Did you read the man page? http://man7.org/linux/man-pages/man7/epoll.7.html – Kijewski Oct 09 '14 at 02:45
-
Possible duplicate of [Why having to use non-blocking fd in a edge triggered epoll function?](http://stackoverflow.com/questions/14643249/why-having-to-use-non-blocking-fd-in-a-edge-triggered-epoll-function) – Michael Petch Oct 09 '14 at 02:47
-
With blocking I/O, all it takes is one misbehaving client to cause a denial of service to all clients. For example, if someone connects with a client that sends half of a command but never sends the second half (but keeps the TCP connection open indefinitely), and the server blocks inside `recv()` waiting for the second half of the command that never arrives, then the server is hung for an indefinite amount of time and no other clients will get their expected responses. – Jeremy Friesner Nov 16 '22 at 15:35
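Below is a minimal sketch of that failure mode and its fix. It assumes a newline-terminated protocol and uses a local `socketpair()` in place of the TCP connection. `struct line_buf`, `read_partial_line`, and `set_nonblocking` are hypothetical names. With `O_NONBLOCK` set, `recv()` returns the half command that has arrived and then fails with `EAGAIN`. The server can keep that partial data in per-client state and go back to serving everyone else.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Set O_NONBLOCK while keeping the fd's other status flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical per-client state that survives between readiness events,
 * so a half-received command is kept instead of waited for. */
struct line_buf {
    char data[256];
    size_t used;
};

/* Pull whatever is readable right now from a NONBLOCKING fd into lb.
 * Returns 1 once a complete '\n'-terminated line is buffered,
 * 0 if the line is still incomplete (recv hit EAGAIN: try again later),
 * -1 on error, EOF, or a line longer than the buffer. Never blocks. */
int read_partial_line(int fd, struct line_buf *lb)
{
    for (;;) {
        if (memchr(lb->data, '\n', lb->used))
            return 1;
        if (lb->used == sizeof(lb->data))
            return -1;                      /* line too long */
        ssize_t n = recv(fd, lb->data + lb->used,
                         sizeof(lb->data) - lb->used, 0);
        if (n > 0)
            lb->used += (size_t)n;
        else if (n == 0)
            return -1;                      /* peer closed the connection */
        else if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;                       /* nothing more for now */
        else if (errno != EINTR)
            return -1;
    }
}
```

Call it each time the event loop reports the client readable. A return value of 0 only means "come back later", and the other clients are never held up.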
2 Answers
For level-triggered epoll, nonblocking sockets can help to minimize epoll_wait() calls; it's an optimization issue.
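Here is a sketch of that optimization, assuming Linux. A `socketpair()` stands in for a client connection, and `drain_once` and `set_nonblocking` are hypothetical helpers. After one level-triggered `epoll_wait()` wakeup, the nonblocking socket is read in 1024-byte chunks until `EAGAIN`. For example, 3000 pending bytes are consumed with a single `epoll_wait()` call rather than one wakeup per `read()`.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Set O_NONBLOCK while keeping the fd's other status flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Wait once on a (level-triggered) epoll set, then read the ready
 * NONBLOCKING fd in 1024-byte chunks until EAGAIN.
 * Returns the total number of bytes consumed, or -1 on error. */
long drain_once(int epfd)
{
    struct epoll_event ev;
    if (epoll_wait(epfd, &ev, 1, 1000) != 1)
        return -1;
    char buf[1024];
    long total = 0;
    for (;;) {
        ssize_t n = read(ev.data.fd, buf, sizeof(buf));
        if (n > 0)
            total += n;
        else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;              /* drained: back to epoll_wait() */
        else if (n < 0 && errno == EINTR)
            continue;
        else
            return n == 0 ? total : -1; /* EOF, or a real error */
    }
}
```

With a blocking socket this loop would be unsafe. The `read()` after the last byte would block instead of returning `EAGAIN`.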
For edge-triggered epoll, you MUST use nonblocking sockets AND call read() or write() until they fail with EAGAIN/EWOULDBLOCK. If you don't, you can miss kernel notifications.
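The missed notification can be observed directly. This sketch assumes Linux and uses a `socketpair()` in place of a TCP connection. `set_nonblocking`, `watch_fd`, and `pending_events` are illustrative helpers, not a library API.

In the scenario, 10 bytes arrive, the edge-triggered wait reports them once, and only 5 are read. A second edge-triggered `epoll_wait()` then reports nothing, even though 5 bytes are still queued, because no new data arrived to create a new edge. A level-triggered watch on the same fd still reports it.

The usual remedy is to keep reading until no data is left. With a blocking socket, the final `read()` of that loop would hang.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set O_NONBLOCK while keeping the fd's other status flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Create a fresh epoll instance watching fd for EPOLLIN, adding EPOLLET
 * when edge is nonzero. Returns the epoll fd, or -1 on error. */
int watch_fd(int fd, int edge)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    struct epoll_event ev;
    ev.events = EPOLLIN;
    if (edge)
        ev.events |= EPOLLET;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}

/* Number of events a non-waiting epoll_wait() reports right now. */
int pending_events(int epfd)
{
    struct epoll_event ev;
    return epoll_wait(epfd, &ev, 1, 0);
}
```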
You can find a detailed answer here: https://eklitzke.org/blocking-io-nonblocking-io-and-epoll

It's a good question and not a duplicate. Recently I also found a tutorial that uses nonblocking sockets with select (which is level-triggered only), and that got me thinking.
The question is:
Why use nonblocking I/O, i.e. set an fd to nonblocking, with level-triggered epoll, select, or other similar interfaces?
There are in fact very solid reasons for this case.
Quoting from the book The Linux Programming Interface:
63.1.2 Employing Nonblocking I/O with Alternative I/O Models
Nonblocking I/O (the O_NONBLOCK flag) is often used in conjunction with the I/O models described in this chapter. Some examples of why this can be useful are the following:
- As explained in the previous section, nonblocking I/O is usually employed in conjunction with I/O models that provide edge-triggered notification of I/O events.
- If multiple processes (or threads) are performing I/O on the same open file descriptions, then, from a particular process’s point of view, a descriptor’s readiness may change between the time the descriptor was notified as being ready and the time of the subsequent I/O call. Consequently, a blocking I/O call could block, thus preventing the process from monitoring other file descriptors. (This can occur for all of the I/O models that we describe in this chapter, regardless of whether they employ level-triggered or edge-triggered notification.)
- Even after a level-triggered API such as select() or poll() informs us that a file descriptor for a stream socket is ready for writing, if we write a large enough block of data in a single write() or send(), then the call will nevertheless block.
- In rare cases, level-triggered APIs such as select() and poll() can return spurious readiness notifications—they can falsely inform us that a file descriptor is ready. This could be caused by a kernel bug or be expected behavior in an uncommon scenario.
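Point #2 can be simulated in a single thread. Point #4 needs the same handling. In this sketch:

- `is_readable` and `try_recv` are hypothetical helpers.
- The "other thread" is simulated by a plain `recv()` in between.
- A `socketpair()` stands in for a TCP connection.

`poll()` reports the descriptor readable, but the data is consumed before our own `recv()` runs. Because the socket is nonblocking, our `recv()` fails with `EAGAIN` instead of blocking the whole event loop.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Set O_NONBLOCK while keeping the fd's other status flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* 1 if poll() currently reports fd readable, 0 if not, -1 on error. */
int is_readable(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int r = poll(&p, 1, 0);
    return r < 0 ? -1 : (r == 1 && (p.revents & POLLIN));
}

/* One read attempt after a readiness notification. Returns the number of
 * bytes read, 0 if the readiness turned out to be stale (EAGAIN/EINTR:
 * return to the event loop), or -1 on a real error or EOF. */
ssize_t try_recv(int fd, char *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);
    if (n > 0)
        return n;
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
        return 0;
    return -1;
}
```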
First, let's check case #2: "If multiple processes (or threads) are performing I/O on the same open file descriptions...".
Consider this code from the libevent introduction (http://www.wangafu.net/~nickm/libevent-book/01_intro.html):
/* For sockaddr_in */
#include <netinet/in.h>
/* For socket functions */
#include <sys/socket.h>
/* For fcntl */
#include <fcntl.h>
/* for select */
#include <sys/select.h>
#include <assert.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#define MAX_LINE 16384
char
rot13_char(char c)
{
    /* We don't want to use isalpha here; setting the locale would change
     * which characters are considered alphabetical. */
    if ((c >= 'a' && c <= 'm') || (c >= 'A' && c <= 'M'))
        return c + 13;
    else if ((c >= 'n' && c <= 'z') || (c >= 'N' && c <= 'Z'))
        return c - 13;
    else
        return c;
}
struct fd_state {
    char buffer[MAX_LINE];
    size_t buffer_used;
    int writing;
    size_t n_written;
    size_t write_upto;
};
struct fd_state *
alloc_fd_state(void)
{
    struct fd_state *state = malloc(sizeof(struct fd_state));
    if (!state)
        return NULL;
    state->buffer_used = state->n_written = state->writing =
        state->write_upto = 0;
    return state;
}
void
free_fd_state(struct fd_state *state)
{
    free(state);
}
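void
make_nonblocking(int fd)
{
    /* Preserve the existing file status flags; calling F_SETFL with
     * O_NONBLOCK alone would clear any other flags already set. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags >= 0)
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}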
int
do_read(int fd, struct fd_state *state)
{
    char buf[1024];
    int i;
    ssize_t result;
    while (1) {
        result = recv(fd, buf, sizeof(buf), 0);
        if (result <= 0)
            break;
        for (i=0; i < result; ++i) {
            if (state->buffer_used < sizeof(state->buffer))
                state->buffer[state->buffer_used++] = rot13_char(buf[i]);
            if (buf[i] == '\n') {
                state->writing = 1;
                state->write_upto = state->buffer_used;
            }
        }
    }
    if (result == 0) {
        return 1;
    } else if (result < 0) {
        /* POSIX allows either errno value for "no data right now" */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;
        return -1;
    }
    return 0;
}
int
do_write(int fd, struct fd_state *state)
{
    while (state->n_written < state->write_upto) {
        ssize_t result = send(fd, state->buffer + state->n_written,
                              state->write_upto - state->n_written, 0);
        if (result < 0) {
            /* send buffer full: wait for the next writability event */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return 0;
            return -1;
        }
        assert(result != 0);
        state->n_written += result;
    }
    if (state->n_written == state->buffer_used)
        state->n_written = state->write_upto = state->buffer_used = 0;
    state->writing = 0;
    return 0;
}
void
run(void)
{
    int listener;
    struct fd_state *state[FD_SETSIZE];
    struct sockaddr_in sin;
    int i, maxfd;
    fd_set readset, writeset, exset;
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = 0;
    sin.sin_port = htons(40713);
    for (i = 0; i < FD_SETSIZE; ++i)
        state[i] = NULL;
    listener = socket(AF_INET, SOCK_STREAM, 0);
    make_nonblocking(listener);
#ifndef WIN32
    {
        int one = 1;
        setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
    }
#endif
    if (bind(listener, (struct sockaddr*)&sin, sizeof(sin)) < 0) {
        perror("bind");
        return;
    }
    if (listen(listener, 16)<0) {
        perror("listen");
        return;
    }
    FD_ZERO(&readset);
    FD_ZERO(&writeset);
    FD_ZERO(&exset);
    while (1) {
        maxfd = listener;
        FD_ZERO(&readset);
        FD_ZERO(&writeset);
        FD_ZERO(&exset);
        FD_SET(listener, &readset);
        for (i=0; i < FD_SETSIZE; ++i) {
            if (state[i]) {
                if (i > maxfd)
                    maxfd = i;
                FD_SET(i, &readset);
                if (state[i]->writing) {
                    FD_SET(i, &writeset);
                }
            }
        }
        if (select(maxfd+1, &readset, &writeset, &exset, NULL) < 0) {
            perror("select");
            return;
        }
        if (FD_ISSET(listener, &readset)) {
            struct sockaddr_storage ss;
            socklen_t slen = sizeof(ss);
            int fd = accept(listener, (struct sockaddr*)&ss, &slen);
            if (fd < 0) {
                perror("accept");
            } else if (fd >= FD_SETSIZE) {
                /* state[] and fd_set only hold fds below FD_SETSIZE */
                close(fd);
            } else {
                make_nonblocking(fd);
                state[fd] = alloc_fd_state();
                assert(state[fd]);/*XXX*/
            }
        }
        for (i=0; i < maxfd+1; ++i) {
            int r = 0;
            if (i == listener)
                continue;
            if (FD_ISSET(i, &readset)) {
                r = do_read(i, state[i]);
            }
            if (r == 0 && FD_ISSET(i, &writeset)) {
                r = do_write(i, state[i]);
            }
            if (r) {
                free_fd_state(state[i]);
                state[i] = NULL;
                close(i);
            }
        }
    }
}
int
main(int c, char **v)
{
    setvbuf(stdout, NULL, _IONBF, 0);
    run();
    return 0;
}
This is not an example of multiple processes (or threads) performing I/O on the same open file descriptions, but it demonstrates the same idea.
The do_read function calls recv inside a while (1) loop to read as many bytes as are available, at most 1024 bytes per recv call. I guess this is a typical pattern.
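So the socket needs to be nonblocking here. Otherwise, once the available data has been consumed, the next recv would block waiting for more input, and the whole select loop (and every other client) would stall.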
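The same drain-until-`EAGAIN` pattern applies to a listening socket registered with edge-triggered epoll, since one edge can stand for several queued connections. This sketch assumes Linux; `accept_all` and `set_nonblocking` are hypothetical helpers. With a nonblocking listener, the final `accept()` reports `EAGAIN`. With a blocking listener, it would hang until the next client connected.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set O_NONBLOCK while keeping the fd's other status flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Accept every queued connection on a NONBLOCKING listener, storing the
 * client fds (also made nonblocking) in out[]. Returns how many were
 * accepted, or -1 on an unexpected error. If max is reached, the rest stay
 * queued and, in edge-triggered mode, no new edge will announce them, so
 * the caller must call again before going back to epoll_wait(). */
int accept_all(int listener, int *out, int max)
{
    int n = 0;
    while (n < max) {
        int fd = accept(listener, NULL, NULL);
        if (fd < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;                  /* queue drained: wait for next edge */
            if (errno == EINTR || errno == ECONNABORTED)
                continue;
            return -1;
        }
        set_nonblocking(fd);            /* best effort in this sketch */
        out[n++] = fd;
    }
    return n;
}
```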
For #3: suppose you write more data to a blocking socket than there is free space in its send buffer. send will block until all of the data has been copied into that buffer, which can take an arbitrarily long time if the peer reads slowly or not at all. For more details, see https://stackoverflow.com/a/74172742/5983841
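Point #3 can be seen with a peer that never reads. In this sketch, a `socketpair()` stands in for a TCP connection, and `fill_send_buffer` and `set_nonblocking` are hypothetical helpers. A nonblocking `send()` keeps accepting data until the kernel send buffer is full and then fails with `EAGAIN`. The same `send()` on a blocking socket would sleep at that point until the peer drained some data, stalling every other client served by the loop.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Set O_NONBLOCK while keeping the fd's other status flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Send 64 KiB chunks on a NONBLOCKING fd until the kernel refuses more.
 * Returns the number of bytes accepted before send() failed with EAGAIN
 * (partial sends are counted), or -1 on any other error. */
long fill_send_buffer(int fd)
{
    static char chunk[64 * 1024];
    long total = 0;
    for (;;) {
        ssize_t n = send(fd, chunk, sizeof(chunk), 0);
        if (n > 0)
            total += n;                 /* may be a partial write */
        else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;               /* buffer full: wait for EPOLLOUT */
        else if (n < 0 && errno == EINTR)
            continue;
        else
            return -1;
    }
}
```

In a real server, the unsent remainder would be kept in per-client state (like `n_written`/`write_upto` in the code above) and retried when the socket becomes writable again.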
