
I wrote a simple TCP server application whose read fd_set includes the connection socket descriptor. The server simply sends an ACK whenever it receives a message, and the client only sends the next message after it receives that ACK from the server.

// timeout == NULL
select(maxfd, &read_set, NULL, NULL, NULL);

When I do this, the performance is about 3K messages/sec. The latency between sending an ACK and receiving a response from the client is 0.3 ms.

// tm.tv_sec = 0 and tm.tv_usec = 0
struct timeval tm = {0, 0};
select(maxfd, &read_set, NULL, NULL, &tm);

But if I do this, the performance goes to 8K messages/sec and latency drops to 0.18ms.

In the latter case, select becomes a poll. Can someone please explain why the latter case performs so much better than the first case?
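For context, the loop I'm describing looks roughly like this (a simplified sketch, not the actual code; `connfd`, the buffer size, and the `serve` name are placeholders, and error handling is omitted):

```c
#include <sys/select.h>
#include <sys/socket.h>

/* Simplified sketch of the ACK-per-message loop described above. */
void serve(int connfd, struct timeval *timeout)    /* NULL, or a zeroed timeval */
{
    char buf[4096];

    for (;;) {
        fd_set read_set;
        FD_ZERO(&read_set);
        FD_SET(connfd, &read_set);

        /* The only difference between the two measurements is `timeout`. */
        int ready = select(connfd + 1, &read_set, NULL, NULL, timeout);
        if (ready > 0 && recv(connfd, buf, sizeof buf, 0) > 0)
            send(connfd, "ACK", 3, 0);              /* client waits for this */
    }
}
```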

John Kugelman
Jimm
  • Check out [epoll](http://linux.die.net/man/7/epoll) if you are worried about latencies. As for the differences, are they consistent? Maybe you can read the libc/kernel source to see what it does differently. – Some programmer dude Dec 18 '12 at 16:20
  • Like just about everything in engineering, it's a tradeoff. From a *system* standpoint, "select (..., NULL)" is the clear win in almost every case: it responds promptly; it doesn't hog system resources. In your case, however, "select (..., tm=0)" performs better. Why? 1) because you have a *continuous* stream of network traffic, and 2) responding to the network is all you care about. – paulsm4 Dec 18 '12 at 16:43

3 Answers


Possible answer

When the timeout is zero, the select() call returns immediately if there's no data available. This lets you busy-wait poll the socket, actively burning CPU cycles until data arrives.

When the timeout is NULL and there's no data, your process is put to sleep in the TASK_INTERRUPTIBLE state waiting for data to become available. This incurs a penalty of at least two context switches: one away from your process and one back to it when data becomes available. The advantage is that your process gives up the CPU and allows other processes to run.

It's like comparing spinlocks and semaphores. A spinlock "spins" the CPU waiting for a condition whereas a semaphore yields the CPU. Spinlocks are more performant but they hog the CPU so they should only be used for very, very short waits. Semaphores are more cooperative with other processes but they incur noticeable overhead due to the extra context switches.
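In other words, with a zero timeout the caller typically ends up in a loop like the following rough sketch (sockfd is a placeholder and error handling is omitted), repeatedly asking the kernel "anything yet?" instead of sleeping:

```c
#include <sys/select.h>

/* Busy-wait polling: select() with a zero timeout never sleeps, so this
   loop burns CPU until the socket becomes readable. */
void spin_until_readable(int sockfd)
{
    for (;;) {
        fd_set read_set;
        struct timeval tm = {0, 0};          /* tv_sec = 0, tv_usec = 0 */

        FD_ZERO(&read_set);
        FD_SET(sockfd, &read_set);

        if (select(sockfd + 1, &read_set, NULL, NULL, &tm) > 0)
            return;                          /* data is ready to read */
        /* 0 ready descriptors: try again immediately, no sleep */
    }
}
```

With a NULL timeout the very same call simply sleeps inside the kernel until data arrives, which is where the two context switches come from.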

John Kugelman
  • The cost of a context switch is a few nanoseconds (66 ns on mine). Given 8K messages per second, this does not add up. – Jimm Dec 18 '12 at 16:47
  • @Jimm How do you measure context switch time? Googling shows other folks measuring context switch times on the order of 5-30us. – John Kugelman Dec 18 '12 at 16:52
  • I used https://github.com/tsuna/contextswitch/blob/master/timectxsw.c. Can you share other links? I can try those to see the difference. – Jimm Dec 18 '12 at 17:00
  • Even at 30 microseconds, with 2 context switches, i.e. 60 us, every wake-up should incur an additional 60 us. In my case, each wake-up is about 0.1-0.18 ms later. – Jimm Dec 18 '12 at 17:02

This won't directly answer your question, but if you really want good performance and the rate of received messages is quite high, you could try the following (a rough sketch follows the list):

  • Open the socket with O_NONBLOCK
  • First, try to read
  • If the read fails with EAGAIN or EWOULDBLOCK, do the select with timeout == NULL
  • Process the data
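A minimal sketch of that pattern, assuming the connected socket `sockfd` has already been put into non-blocking mode via fcntl with O_NONBLOCK; the `read_message` name and the error handling are illustrative only:

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: try the read first; only fall back to a blocking
   select() when the non-blocking read reports it would have blocked. */
ssize_t read_message(int sockfd, char *buf, size_t len)
{
    for (;;) {
        ssize_t n = read(sockfd, buf, len);
        if (n >= 0)
            return n;                        /* got data (or EOF when n == 0) */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                       /* real error */

        /* Nothing available yet: sleep in select() until the socket is
           readable, then loop and retry the read. */
        fd_set read_set;
        FD_ZERO(&read_set);
        FD_SET(sockfd, &read_set);
        if (select(sockfd + 1, &read_set, NULL, NULL, NULL) < 0)
            return -1;
    }
}
```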
benjarobin

The man page for select(2) says:

timeout is an upper bound on the amount of time elapsed before select() returns. If both fields of the timeval structure are zero, then select() returns immediately. (This is useful for polling.) If timeout is NULL (no timeout), select() can block indefinitely.

Emphasis added is mine. If you're concerned about latency, you should look at epoll(7).
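For illustration, a minimal epoll sketch of a blocking wait on one socket (`sockfd` and the function name are placeholders, and error checking is omitted):

```c
#include <sys/epoll.h>

/* Rough sketch: block until sockfd becomes readable, using epoll. */
void wait_with_epoll(int sockfd)
{
    int epfd = epoll_create1(0);

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = sockfd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev);

    struct epoll_event ready;
    epoll_wait(epfd, &ready, 1, -1);     /* -1: wait indefinitely */
}
```

In a long-running server the epoll descriptor would be created and populated once and only epoll_wait() would sit in the loop, instead of rebuilding an fd_set on every iteration.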

Sam Miller