How to use SO_KEEPALIVE option properly to detect that the client at the other end is down?

Question

I am trying to learn how to use the SO_KEEPALIVE option in socket programming in C on Linux.

I created a server socket and used my browser to connect to it. It was successful and I was able to read the GET request, but I got stuck on the usage of SO_KEEPALIVE.

I checked this link keepalive_description@tldg.org but I could not find any example which shows how to use it.

As soon as accept() returns the client's connection, I set SO_KEEPALIVE to 1 on the client socket. Now I don't know how to check whether the client is down, how to change the interval between probes, and so on.

I mean, how will I get the signal that the client is down? I would like this without reading from or writing to the client; I thought I would get some signal when the probes go unanswered. How should I program this after turning SO_KEEPALIVE on?

Also, suppose the probes are sent every 3 seconds and the client goes down between two probes: I will not know that the client is down, and I may get SIGPIPE.

Most importantly, I want to know how to use SO_KEEPALIVE in code.

score 22 · Accepted Answer · answered Mar 25 '11 at 16:42

To modify the number of probes or the probe intervals, you write values to the /proc filesystem like

 echo 600 > /proc/sys/net/ipv4/tcp_keepalive_time
 echo 60 > /proc/sys/net/ipv4/tcp_keepalive_intvl
 echo 20 > /proc/sys/net/ipv4/tcp_keepalive_probes

Note that these values are global for all keepalive-enabled sockets on the system. You can also override these settings on a per-socket basis with setsockopt; see section 4.2 of the document you linked.
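A minimal sketch of the per-socket override, assuming Linux (TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific option names). The helper name enable_keepalive and its parameters are made up for illustration and are not part of any library:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn keepalive on for one TCP socket and override
 * the system-wide timers for that socket only.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_keepalive(int fd, int idle_secs, int interval_secs, int probe_count)
{
    int on = 1;

    /* Enable keepalive probes on this socket. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    /* Idle time before the first probe (per-socket tcp_keepalive_time). */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_secs, sizeof(idle_secs)) < 0)
        return -1;
    /* Delay between unanswered probes (per-socket tcp_keepalive_intvl). */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_secs, sizeof(interval_secs)) < 0)
        return -1;
    /* Unanswered probes before giving up (per-socket tcp_keepalive_probes). */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probe_count, sizeof(probe_count)) < 0)
        return -1;
    return 0;
}
```

Calling enable_keepalive(clientfd, 600, 60, 20) right after accept() would mirror the three /proc values above for that one connection and leave every other socket on the system untouched.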

You can't "check" the status of the socket from userspace with keepalive. Instead, the kernel is simply more aggressive about forcing the remote end to acknowledge packets and about determining whether the socket has gone bad. When you attempt to write to the socket, you will get a SIGPIPE if keepalive has determined that the remote end is down.
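To illustrate the write side, here is a small sketch that ignores SIGPIPE so that the failed write reports an error instead of killing the process. It uses a local socketpair() as a stand-in for the client connection, which is an assumption made only so the example runs on its own; the function name is hypothetical. On a real TCP socket that keepalive has timed out, the first error reported may be a different errno (for example ETIMEDOUT) rather than EPIPE:

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: write to a socket whose peer is already closed and
 * return the errno that write() reports (0 if the write succeeded,
 * -1 if the socket pair could not be created). */
int write_after_peer_close(void)
{
    int sv[2];
    ssize_t n;
    int err;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Without this, the write below would raise SIGPIPE, whose default
     * action terminates the process. */
    signal(SIGPIPE, SIG_IGN);

    close(sv[1]);                 /* the "client" goes away */
    n = write(sv[0], "x", 1);     /* the "server" keeps sending */
    err = (n < 0) ? errno : 0;

    close(sv[0]);
    return err;
}
```

With SIGPIPE ignored, the caller sees a plain error return from write() and can clean up the connection in ordinary code.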

bdk


You'll get notified when the status changes when you read from the socket. If the peer is determined to be dead due to the keepalives, select/poll will report the socket as readable, and a read()/recv() will return an error. You should always read or monitor a socket for reading anyway. – nos Mar 25 '11 at 16:45
@bdk: thanks for the reply. I get a SIGPIPE even without setting SO_KEEPALIVE. So what is its purpose in the end? I mean it should somehow notify me earlier; how can it expect the client to take care of closing the socket properly? I mean it's all about "BEING STRICT IN WHAT WE SEND and BEING TOLERANT IN WHAT WE RECEIVE" (I forgot on which link I read this line :) ) – Durin Mar 25 '11 at 16:47
Yes, I think as far as writing goes, it just means you can get a SIGPIPE earlier, since the kernel detects the failure sooner. If you are reading from the socket, it also allows read() to return an error instead of just 'no data available', as per @nos's comment above. Getting the read event with an error on read from the select loop would probably be the best "early notice" from SO_KEEPALIVE – bdk Mar 25 '11 at 16:51
@nos: okay, so in the end everything boils down to reading from the socket to determine if it's alive. Ah! Actually the problem is: suppose I am sending a huge file to the client and it gets disconnected partway through; then I will get SIGPIPE when I write to the socket. The job of the code is to continuously write to the socket. If I check whether the client is alive before each write() by reading from the socket, that would hamper performance. Please correct me if I am wrong somewhere, thanks :) – Durin Mar 25 '11 at 16:54
Take a look at this question: http://stackoverflow.com/questions/108183/how-to-prevent-sigpipes-or-handle-them-properly. From my understanding, you should be able to disable or ignore the SIGPIPE signal and then get an error back from write if SO_KEEPALIVE times out – bdk Mar 25 '11 at 17:02
@Anirudh Tomer The common way is to ignore SIGPIPE. (signal(SIGPIPE,SIG_IGN); ), that way a write() call will return an error instead of delivering a SIGPIPE. If all you're doing is waiting for data, you'd certainly want to turn on keepalives though, otherwise you might never detect a dead peer. – nos Mar 25 '11 at 17:05
@nos: now I just wrote a code...a part of it is `printf("%d\n",read(clientsockfd,buff,BUFSIZ)); sleep(5); x = read(clientsockfd,buff,BUFSIZ); printf("%d--%s\n",x,strerror(errno)); sleep(3); x = read(clientsockfd,buff,BUFSIZ); printf("%d--%s\n",x,strerror(errno)); ` In the first case I set keep alive on and the second case not setting keep alive on. as soon as I send the request from browser I close the tab. Output is same for both cases `408 0--Success 0--Success ` i.e no error on reading from a dead peer, even after setting keep alive on. It rather gives 0 bytes read. – Durin Mar 25 '11 at 17:15
Note that keepalive won't detect a failure until at least the configured keepalive_time + (keepalive_intvl*keepalive_probes). I think that by default, if you don't change the settings, this can be over an hour! – bdk Mar 25 '11 at 17:50
@Anirudh Tomer The default keepalive probes are sent every 2 hours, and the peer is determined dead if 9 probes with 75 seconds in between them all fail. How have you adjusted the defaults? Note that TCP keepalives are not particularly designed to detect fast deaths of peers. If the peer is still reachable, send will fail, and that's ok, you detected that it died! If something's wrong with the network, fast detection of a dead peer is not easy, nor usually desirable. – nos Mar 25 '11 at 18:06
... If you really need a reliable way to detect this fast, you need to send heartbeats at the application level and use some sensible timeouts on your reads/writes (note that there's no free lunch there; this always ends up with a lot of code to cover a lot of corner cases) – nos Mar 25 '11 at 18:07
@Everyone: thanks all for helping me out, I now understand the purpose of keepalive. Also, to detect a dead peer while writing to the socket fd, I will handle SIGPIPE gracefully or otherwise override it with SIG_IGN. I had not adjusted the defaults actually; now it works. I also checked the book by Richard Stevens on networking to understand it more. – Durin Mar 26 '11 at 17:29
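The timing discussed in the comments above can be worked out with a one-line formula. This is only a sketch of the arithmetic, using the default figures quoted by nos (2 hours idle, 9 probes, 75 seconds apart); the function name is made up for illustration:

```c
/* Hypothetical helper: earliest time, in seconds, at which keepalive can
 * declare an idle, unreachable peer dead:
 *   keepalive_time + keepalive_intvl * keepalive_probes */
long keepalive_detection_secs(long time_secs, long intvl_secs, long probes)
{
    return time_secs + intvl_secs * probes;
}
```

With the quoted defaults, keepalive_detection_secs(7200, 75, 9) gives 7875 seconds, a little over 2 hours and 11 minutes. With the values from the accepted answer (600, 60, 20) it gives 1800 seconds, or 30 minutes.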

score 11 · Answer 2 · answered Mar 26 '11 at 14:28

You'll get the same result if you enable SO_KEEPALIVE, as if you don't enable SO_KEEPALIVE - typically you'll find the socket ready and get an error when you read from it.

You can set the keepalive timeout on a per-socket basis under Linux (this may be a Linux-specific feature). I'd recommend this rather than changing the system-wide setting. See the man page for tcp for more info.

Finally, if your client is a web browser, it's quite likely that it will close the socket fairly quickly anyway - most of them will only hold keepalive (HTTP 1.1) connections open for a relatively short time (30s, 1 min etc). Of course if the client machine has disappeared or network down (which is what SO_KEEPALIVE is really useful for detecting), then it won't be able to actively close the socket.

In Windows you can use [WSAIoctl](http://msdn.microsoft.com/en-us/library/windows/desktop/dd877220%28v=vs.85%29.aspx) to configure per-socket TCP keepalive settings. — Jarek Przygódzki, Apr 29 '14 at 19:14

score 4 · Answer 3 · answered Sep 04 '12 at 01:29

As already discussed, SO_KEEPALIVE makes the kernel more aggressive about continually verifying the connection even when you're not doing anything, but does not change or enhance the way the information is delivered to you. You'll find out when you try to actually do something (for example "write"), and you'll find out right away since the kernel is now just reporting the status of a previously set flag, rather than having to wait a few seconds (or much longer in some cases) for network activity to fail. The exact same code logic you had for handling the "other side went away unexpectedly" condition will still be used; what changes is the timing (not the method).

Virtually every "practical" sockets program in some way provides non-blocking access to the sockets during the data phase (maybe with select()/poll(), or maybe with fcntl()/O_NONBLOCK/EINPROGRESS&EWOULDBLOCK, or if your kernel supports it maybe with MSG_DONTWAIT). Assuming this is already done for other reasons, it's trivial (sometimes requiring no code at all) to in addition find out right away about a connection dropping. But if the data phase does not already somehow provide non-blocking access to the sockets, you won't find out about the connection dropping until the next time you try to do something.

(A TCP socket connection without some sort of non-blocking behaviour during the data phase is notoriously fragile, as if the wrong packet encounters a network problem it's very easy for the program to then "hang" indefinitely, and there's not a whole lot you can do about it.)
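As a rough sketch of the select()/poll() approach, the helper below waits for the socket to become readable and then asks for any pending socket error with getsockopt(SO_ERROR), without writing anything. The helper name and return convention are assumptions for illustration. Note that reading SO_ERROR also clears the pending error, and that an orderly close by the peer shows up as "readable with no error", after which recv() returns 0:

```c
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical helper. Waits up to timeout_ms for activity on fd.
 * Returns  0 if nothing happened within the timeout,
 *          1 if the socket is readable or hung up; *pending_error then
 *            holds the socket's pending error (0 if none),
 *         -1 if poll() or getsockopt() itself failed. */
int check_socket(int fd, int timeout_ms, int *pending_error)
{
    struct pollfd pfd;
    socklen_t len = sizeof(*pending_error);
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;   /* POLLHUP and POLLERR are always reported */
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;

    /* Fetch (and clear) the pending error, if any. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, pending_error, &len) < 0)
        return -1;
    return 1;
}
```

In a program that already runs a poll loop, this adds almost nothing: the same wakeup that delivers data also delivers the news that the connection is gone.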

Is there a way to access this "previously set flag" without trying to do an actual write? — Chance, Mar 01 '13 at 00:10

score 4 · Answer 4 · answered Jun 22 '18 at 13:14

Short answer, add

int flags = 1;
if (setsockopt(sfd, SOL_SOCKET, SO_KEEPALIVE, &flags, sizeof(flags)) != 0) {
    perror("ERROR: setsockopt(), SO_KEEPALIVE");
    exit(EXIT_FAILURE);
}

on the server side, and read() will be unblocked when the client is down.
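Once read() is unblocked, the return value still has to be interpreted. A minimal sketch, with a hypothetical enum and helper name: a return of 0 means the peer closed the connection in an orderly way (which is what happens when a browser tab is closed), while -1 with errno set to ETIMEDOUT is, as far as I know, what a keepalive timeout looks like on Linux. The helper assumes len is greater than 0:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical classification of what one recv() call told us. */
enum peer_state {
    PEER_SENT_DATA,    /* *nread bytes are in buf */
    PEER_CLOSED,       /* recv() returned 0: orderly shutdown by the peer */
    PEER_TIMED_OUT,    /* ETIMEDOUT: e.g. keepalive probes went unanswered */
    PEER_OTHER_ERROR   /* anything else, e.g. ECONNRESET; check errno */
};

/* Hypothetical helper: one blocking receive, retried if interrupted by a
 * signal. Assumes len > 0, since a zero-length recv() also returns 0. */
enum peer_state read_peer(int fd, char *buf, size_t len, ssize_t *nread)
{
    ssize_t n;

    do {
        n = recv(fd, buf, len, 0);
    } while (n < 0 && errno == EINTR);

    *nread = n;
    if (n > 0)
        return PEER_SENT_DATA;
    if (n == 0)
        return PEER_CLOSED;
    if (errno == ETIMEDOUT)
        return PEER_TIMED_OUT;
    return PEER_OTHER_ERROR;
}
```

Keeping the "closed" and "timed out" cases separate makes it easier to log why a connection ended, even though both usually lead to the same cleanup.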

A full explanation can be found here.
