UPDATE: After investigating a little more, I found the real cause of this behavior. I was creating a thread for each connection and passing the socket fd to the thread, but I was not calling pthread_join on the threads promptly. Finished but unjoined threads still hold their resources, so after a while the main thread could no longer create new threads once it had accepted a connection. My logic for closing the socket lives in the child thread, so those sockets were never closed and ended up in the CLOSE_WAIT state. I now detach the threads right after creating them, and everything works well so far.
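A minimal sketch of the fix described in the update, assuming POSIX threads. The names `handle_client` and `spawn_detached_worker` are hypothetical and not taken from the actual server; the point is only that a detached thread releases its resources when it exits, so nothing has to join it:

```c
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical per-connection worker: it owns the descriptor and closes it. */
static void *handle_client(void *arg)
{
    int fd = (int)(intptr_t)arg;

    /* ... a real server would read the client's line here ... */
    close(fd);
    return NULL;
}

/* Start a detached worker for fd.
 * Returns 0 on success or an errno-style error code on failure. */
int spawn_detached_worker(int fd)
{
    pthread_t tid;
    pthread_attr_t attr;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

    rc = pthread_create(&tid, &attr, handle_client, (void *)(intptr_t)fd);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        close(fd); /* no worker took ownership, so close here to avoid a leak */
    return rc;
}
```

Calling `pthread_detach(tid)` right after `pthread_create` should have the same effect; either way, the failure path still has to close the descriptor, otherwise it stays in CLOSE_WAIT.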

I have a client/server program. I am using a script to run the client, which makes as many connections as possible, sends a line of data on each, closes the connection, and exits. Everything works fine until the 32739th connection, i.e. each connection is closed on both sides. After that number the connections no longer get closed and the server stops accepting new connections. If I run

netstat -tonpa 2>&1 | grep CLOSE

I see around 1020 sockets in CLOSE_WAIT. Sample output of the command:

tcp 25 0 192.168.0.175:16099 192.168.0.175:41704 CLOSE_WAIT 5250/./bl_manager off (0.00/0/0)
tcp 24 0 192.168.0.175:16099 192.168.0.175:41585 CLOSE_WAIT 5250/./bl_manager off (0.00/0/0)
tcp 30 0 192.168.0.175:16099 192.168.0.175:41679 CLOSE_WAIT 5250/./bl_manager off (0.00/0/0)
tcp 31 0 192.168.0.175:16099 192.168.0.175:41339 CLOSE_WAIT 5250/./bl_manager off (0.00/0/0)
tcp 25 0 192.168.0.175:16099 192.168.0.175:41760 CLOSE_WAIT 5250/./bl_manager off (0.00/0/0)

I am using the following code to detect the client disconnection.

for(fd = 0; fd <= fd_max; fd++) {
    if(FD_ISSET(fd, &testfds)) {
       if (fd == client_fd) {
           ioctl(fd, FIONREAD, &nread);
           if(nread == 0) {
               FD_CLR(fd, &readfds);
               close(fd);
               return 0;
           }
       }
    }
} /* for()*/

Please let me know if I am doing anything wrong. It's a Python client and C++ server setup.

thank you

bana
  • What platform are you on? Also, why is this tagged Python? Do you have any reason to believe the client is doing something devious with sockets that prevents your server from working? – abarnert Jul 12 '13 at 23:53
  • Yes I was doubting about that ! – bana Jul 13 '13 at 20:29

2 Answers

CLOSE-WAIT means the port is waiting for the local application to close the socket, having already received a close from the peer. Clearly you are leaking sockets somehow, possibly in an error path.

Your code to 'detect client disconnection' is completely incorrect. All you are testing is the amount of data that can be read without blocking, i.e. that has already arrived. The correct test is a return value of zero from recv() or an error other than EAGAIN/EWOULDBLOCK when reading or writing.
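A minimal sketch of that test, assuming a platform where `recv()` accepts `MSG_DONTWAIT` (Linux and the BSDs do). The helper name `client_still_connected` is hypothetical:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper.
 * Returns 1 if the peer may still be connected (data was consumed, or
 * nothing is available yet), 0 if the peer closed the connection or a
 * hard error occurred. The caller should close(fd) when this returns 0. */
int client_still_connected(int fd)
{
    char buf[4096];
    ssize_t n = recv(fd, buf, sizeof buf, MSG_DONTWAIT);

    if (n > 0)
        return 1; /* data arrived; a real server would process buf[0..n) */
    if (n == 0)
        return 0; /* orderly shutdown by the peer */
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return 1; /* nothing to read right now, which is not a disconnect */
    return 0;     /* any other error: treat the connection as dead */
}
```

Note the difference from FIONREAD: an idle but connected client reports zero readable bytes, yet `recv()` fails with EAGAIN rather than returning 0, so the two cases stay distinguishable.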

user207421
  • Isn't that ioctl used to check whether there is any data to read on the socket, without actually reading it? As far as I know it returns 0 if the socket has been closed on the other side. Please let me know if my understanding is wrong. – bana Jul 13 '13 at 20:33
  • Your 'understanding' is wrong, indeed baseless. Read what I wrote again, or see the documentation of FIONREAD: 'returns the number of bytes that are immediately available for reading', 'returns the number of bytes in the input buffer', etc. etc. Not a word about peer disconnects anywhere. – user207421 Jul 14 '13 at 02:12
  • http://stackoverflow.com/questions/283375/detecting-tcp-client-disconnect If this is wrong, we should definitely tell the people on the above thread. But for me it's working as expected! It detects the connection closure by the client. – bana Jul 15 '13 at 17:11
  • No it doesn't. It detects when there is no data to be read without blocking. If the client doesn't send you anything for five minutes, that isn't a disconnect, is it? And primary sources please: nothing at StackOverflow can possibly override the actual documentation, which I have quoted here. You can't cite SO answers in refutation of official documents. The accepted answer in the thread you cite is just as wildly incorrect as you are, as I have now said in a comment. – user207421 Jul 16 '13 at 01:08

Without knowing your platform I can't be sure, but given that you're clearly using select, and that the problem appears only a few dozen connections short of 32768, an fd_set size limit seems very likely to be your problem.

An fd_set is a collection of bits, indexed by file descriptor number. Every platform has a different maximum. OpenBSD and recent versions of FreeBSD and OS X usually limit fd_set to an FD_SETSIZE that defaults to 1024. Different Linux boxes seem to have 1024, 4096, 32768, and 65536.

So, what happens if you FD_ISSET(32800, &testfds) and FD_SETSIZE is 32768? You're asking it to read a bit from arbitrary memory.

A select or other call before this should give you an EINVAL error when you pass in 32800 for the nfds parameter… but historically, many platforms have not done so. Or they have returned an error, but only after filling in the first FD_SETSIZE bits properly and leaving the rest set to uninitialized memory, which means if you forget to check the error, your code seems to work until you stress it.
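One way to guard against this is to check the descriptor against FD_SETSIZE before touching the set. A minimal sketch; `safe_fd_set` is a hypothetical wrapper, not a standard function:

```c
#include <sys/select.h>

/* Hypothetical wrapper: add fd to set only if it fits in an fd_set.
 * Returns 0 on success, -1 if fd is out of range. */
int safe_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1; /* FD_SET here would write outside the bit array */
    FD_SET(fd, set);
    return 0;
}
```

A server that gets -1 back can refuse the connection and close the descriptor instead of silently corrupting memory.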

This is one of the reasons using select for more than a few hundred sockets is a bad idea. The other reason is that select is linear, and, worse, linear not in the number of current sockets but in the highest fd, so it stays slow even after most clients go away.

Most modern platforms that have select also have poll, which avoids that problem.
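poll takes an array of descriptors rather than a fixed-size bitmask, so the descriptor value itself is not capped by FD_SETSIZE. A minimal sketch of one polling pass, assuming `MSG_DONTWAIT` is available; `reap_closed_clients` is a hypothetical helper name:

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: poll n descriptors once and close those whose peer
 * has gone away. Closed slots get fd = -1, which poll() ignores.
 * Returns the number of connections closed, or -1 if poll() itself fails. */
int reap_closed_clients(struct pollfd *fds, nfds_t n, int timeout_ms)
{
    int closed = 0;
    nfds_t i;

    if (poll(fds, n, timeout_ms) < 0)
        return -1;

    for (i = 0; i < n; i++) {
        char buf[4096];
        ssize_t got;

        if (fds[i].fd < 0 || fds[i].revents == 0)
            continue; /* unused slot, or nothing happened on this fd */

        got = recv(fds[i].fd, buf, sizeof buf, MSG_DONTWAIT);
        if (got > 0)
            continue; /* data; a real server would process it here */
        if (got < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
            continue; /* nothing readable after all; retry on the next pass */

        close(fds[i].fd); /* EOF or hard error: release the descriptor */
        fds[i].fd = -1;
        closed++;
    }
    return closed;
}
```

The caller fills each slot with the client fd and `events = POLLIN`, then calls this in its main loop.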

Unless you're on Windows… in which case there are completely different reasons not to use select, and different answers.

abarnert
  • Thanks a lot for the reply! After investigating a little more, I found a different cause for my problem. I totally get what you said, and I was thinking the same. But those threads are so short-lived that they close the socket and die within milliseconds, so the socket fd gets reused. Since I think I have found the solution to my problem, I will try to update the question. – bana Jul 13 '13 at 20:40