11

Quoting from this socket tutorial:

Sockets come in two primary flavors. An active socket is connected to a remote active socket via an open data connection... A passive socket is not connected, but rather awaits an incoming connection, which will spawn a new active socket once a connection is established ...

Each port can have a single passive socket binded to it, awaiting incoming connections, and multiple active sockets, each corresponding to an open connection on the port. It's as if the factory worker is waiting for new messages to arrive (he represents the passive socket), and when one message arrives from a new sender, he initiates a correspondence (a connection) with them by delegating someone else (an active socket) to actually read the packet and respond back to the sender if necessary. This permits the factory worker to be free to receive new packets. ...

Then the tutorial explains that, after a connection is established, the active socket continues receiving data until there are no remaining bytes, and then closes the connection.

What I didn't understand is this: Suppose there's an incoming connection to the port, and the sender wants to send some little data every 20 minutes. If the active socket closes the connection when there are no remaining bytes, does the sender have to reconnect to the port every time it wants to send data? How do we persist a once established connection for a longer time? Can you tell me what I'm missing here?

My second question is, who determines the limit of the concurrently working active sockets?

onmyway133
aslı
  • 1
    You are paraphrasing that article and taking bits and pieces from different sections of the article. The contexts are different. In the last section the author is explaining his program. Sockets do not act like that by default, in fact if you forget to close your socket bad things can and will happen. The socket doesn't automagically close when it's received the last byte. – SRM Jan 14 '11 at 22:58
  • OK, I thought that's the convention and just asked what I'm missing here. I'm new to the concepts and that's why I want to question everything I find hard to understand. – aslı Jan 14 '11 at 23:06
  • No problem, I just wanted to make sure you understand that you must explicitly close the socket. It might save you some headaches down the line when you are scratching your head trying to figure out why the socket didn't close :). – SRM Jan 14 '11 at 23:10

3 Answers

9

The sender should send a KEEPALIVE packet at regular intervals to keep the connection alive. The format of the KEEPALIVE depends on the protocol. It could be as small as a single NULL in the TCP data segment.

As to the second question... it depends on the I/O. If it is blocking I/O then you only want a certain number of threads running on your computer, so you won't be able to have many clients. If it's non-blocking, you can have a lot more clients. Programming languages should have support for both blocking and non-blocking I/O. (I know for a fact that Java does.)

It also depends on things like bandwidth, the data transfer for each client, memory, clock speed, etc. But non-blocking vs. blocking can make a huge difference in the number of clients you can accept. You probably can't have more than 5-10 clients blocking without your server crashing... but you can have thousands if you're not blocking.
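To make the blocking vs. non-blocking difference concrete, here is a minimal sketch in Python (picked only for illustration; it is not from the original answer) of a single thread serving many clients through the standard `selectors` module. The helper names `make_listener` and `serve` are made up for this example, and the echo behaviour is an assumption standing in for real request handling.

```python
import selectors
import socket
import threading


def make_listener(host="127.0.0.1", port=0):
    """Create a non-blocking passive (listening) socket; port 0 lets the OS pick one."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen()
    listener.setblocking(False)
    return listener


def serve(listener, stop: threading.Event):
    """Echo data back to every client from one thread until `stop` is set."""
    sel = selectors.DefaultSelector()
    sel.register(listener, selectors.EVENT_READ)
    try:
        while not stop.is_set():
            # Wait briefly for any registered socket to become readable.
            for key, _mask in sel.select(timeout=0.1):
                if key.fileobj is listener:
                    # The passive socket is readable: a new connection is waiting.
                    conn, _addr = listener.accept()
                    conn.setblocking(False)
                    sel.register(conn, selectors.EVENT_READ)
                else:
                    conn = key.fileobj
                    data = conn.recv(4096)
                    if data:
                        # Fine for small payloads; a real server would buffer
                        # writes instead of calling sendall on a non-blocking socket.
                        conn.sendall(data)
                    else:
                        # Empty read means the peer closed; close our side explicitly.
                        sel.unregister(conn)
                        conn.close()
    finally:
        sel.close()
```

No thread is parked per client here, so the number of idle connections is bounded by file descriptors and memory rather than by the thread count.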

ktm5124
  • So is continuously sending these keepalive packets for 20 minutes a cheaper operation than re-establishing the connection every time? What would be the advantages of keeping the connection alive, besides escaping the overhead of re-connecting? – aslı Jan 14 '11 at 22:58
  • 1
    Actually if you will have a lot of clients connecting and each request will be quick (20 seconds like you said) then it's best to use a request/response type pattern. You only have 64k ports available which means only 64k sockets can be accepted on an IP before you get port exhaustion unless those sockets are closed. It really depends on your application though. If you are writing a MMO for example, you need persistent connections. If you are writing a web server though you won't need (and will try to avoid) persistent connections (unless you are utilizing HTML5 but that's a dif. story). – SRM Jan 14 '11 at 23:04
  • 2
    From the client's perspective, a simple keepalive to a single server isn't a lot of work. You're probably already sending half a dozen keepalives regularly when you're connected to the internet. From the server's perspective, it might be advantageous to keep the number of sockets in its socket list at a minimum, to improve I/O performance. If it's as long as a 20 minute wait between data transfers, I would create a new connection every time. It's negligible for the client, but not for the server. – ktm5124 Jan 14 '11 at 23:07
  • @ktm5124 +1 for pointing out that it's much more expensive for the server to create a connection than the client. Multiplied by hundreds of clients and it becomes a bottleneck. – SRM Jan 14 '11 at 23:12
  • Got it. One more thing I want to make clear: if I have 64k ports available and if I can spawn more than one active sockets for each port (if I have different incoming connections from the same port), theoretically I can accept more than 64k sockets, right? – aslı Jan 14 '11 at 23:16
  • 1
    You can only accept up to 64k connections *concurrently* so as long as some of the clients drop off you will not reach port exhaustion. Also, only passive sockets can accept and you can only have one passive socket associated with an ip/port pair so you can only listen with one socket but many sockets can connect. – SRM Jan 14 '11 at 23:21
  • Oh then I explained myself wrong, What I meant to say was, suppose I'm starting a thread to perform foo() whenever a passive socket accepts a connection. Since there's one passive socket for each port, I can have max. 64k sockets simultaneously, but maybe more than 64k threads performing foo(), depending on when the foo()s end. Now it's more clear, thx guys :] – aslı Jan 14 '11 at 23:28
  • 1
    Ah, yes, what you said is absolutely correct. That's where a queuing mechanism and a threadpool come in handy. You can also use a variant of the command pattern in this way where your command handlers are the response handlers and your command pattern is implemented using a threadpool. – SRM Jan 15 '11 at 05:43
  • @SRM "You only have 64k ports available which means only 64k sockets can be accepted on an IP". As the OP quoted, 1 port has 1 passive socket and many active sockets, with each active socket for each connection. Why do you mention 64k port here ? – onmyway133 Apr 26 '13 at 08:51
  • @entropy I was talking about TCP/IP port exhaustion and the fact that when you accept a socket it gets assigned a unique random port number. That's part of the TCP/IP protocol - the address and port uniquely identify that connection. Since the port is a 16-bit unsigned number, the largest number of open ports (hence open connections on a single listening port) is 65535. High performance servers do tricks like zero copy of sockets and such to get around it, but it's a hard limit in the protocol. – SRM Apr 26 '13 at 17:10
7

Please do not confuse the actual packets that the TCP/IP implementation sends over the network with the interaction between your program and the library that implements TCP/IP.

A socket is just an abstraction that the TCP/IP implementation (a library or the OS kernel) presents to your program. You can picture a socket as a connection to a pipe (localIP:port-remoteIP:port). Your program opens a socket, exchanges data over it, and closes it when it is no longer needed, which frees resources. That is the normal flow. However, the TCP/IP implementation may close the socket for valid reasons of its own, for example a disconnected network cable, network routing errors, or a server going down. Your program may therefore find a TCP/IP socket closed even though it never closed it.
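As an illustration of the (localIP:port-remoteIP:port) picture, the following Python sketch connects two clients to one listening socket on the loopback interface and reports the ports on both ends of each accepted connection. The function name `accept_two_clients` is hypothetical, and running everything in one process is only a convenience for the demonstration.

```python
import socket


def accept_two_clients():
    """Return (listening port, local ports of accepted sockets, remote ports)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    listener.listen()
    listen_port = listener.getsockname()[1]

    clients, accepted = [], []
    try:
        for _ in range(2):
            # connect() completes once the kernel has queued the connection,
            # so the accept() below can run in the same thread.
            clients.append(socket.create_connection(("127.0.0.1", listen_port)))
            conn, _peer = listener.accept()
            accepted.append(conn)
        local_ports = [conn.getsockname()[1] for conn in accepted]
        remote_ports = [conn.getpeername()[1] for conn in accepted]
        return listen_port, local_ports, remote_ports
    finally:
        # Sockets are not closed automatically; release them explicitly.
        for s in clients + accepted + [listener]:
            s.close()
```

Both accepted sockets report the listening port as their local port, and the two connections are told apart by the remote end of the pair.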

Now to your first question: what do I do if my program sends small data segments with long pauses between them? The answer depends on how long the pause is and on what program is listening on the other side. Most TCP/IP implementations have a notion of a connection timeout, which lets them offer the abstraction of a reliable connection over real, unreliable networks. So if your program pauses for longer than the TCP/IP timeout, you will find that your socket has been closed by the library, and you will need to re-open it. That may also force you to restart the communication from the beginning, depending on the program that listens on the other side of the TCP/IP connection.

There are ways of increasing the TCP/IP timeout and keeping the connection alive. This can be done in the network configuration, in the configuration of the server software on the other end, or by explicitly asking to keep the socket open by setting KEEPALIVE parameters in your TCP/IP library call. Whether the socket then actually stays open still depends on the circumstances. The full details of how TCP/IP keeps a socket open should not confuse you, as they have nothing to do with your code. TCP/IP has many settings and different timeouts to give your program the illusion of a stable, reliable connection. The good part is that it is all hidden from your program code as long as you do not abuse it. Keep your pauses under a few seconds :) One set of timeout settings may work well for small applications on a reliable local network and fail for high-load applications or cross-continent connections. Each specific situation has its own solution, often more than one.
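As one possible reading of "setting KEEPALIVE parameters in your TCP/IP library call", here is a hedged Python sketch. `SO_KEEPALIVE` is the standard socket option; the finer-grained `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` options are platform-specific, so the sketch only sets the ones the local `socket` module exposes. The function name `enable_keepalive` and the default values are assumptions for the example, not recommendations.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Ask the OS to probe an idle TCP connection instead of leaving it silent."""
    # Turn on keepalive probing for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Optional tuning knobs; each one exists only on some platforms.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between consecutive probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes tolerated before the connection is dropped.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

Calling `enable_keepalive(sock)` right after connecting is enough for the OS to start probing. Routers and firewalls along the path may still drop idle connections on their own schedule.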

For this specific case, "to send some little data every 20 minutes", I would advise you to open and close a socket connection for each communication. Opening one takes less than a second and should not affect your communication. In return you get a simpler communication protocol: the receiver always starts fresh on a new socket connection, and both systems free their TCP/IP resources for the 20 minutes during which you don't need them.
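A minimal sketch of that connect-per-message approach in Python might look like the following. The function name `send_once`, the timeout, and the convention that the client signals "end of message" by shutting down its write side are all assumptions for the example; a real protocol might frame messages differently.

```python
import socket


def send_once(host, port, payload, timeout=5.0):
    """Open a connection, send one message, read the reply, and close again."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        # Half-close: tell the receiver no more data is coming from us.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the receiver closed its side, so the reply is complete
                break
            chunks.append(chunk)
    # Leaving the `with` block closes the socket explicitly.
    return b"".join(chunks)
```

The sender would call `send_once` every 20 minutes and hold no socket in between.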

smile-on
0

First question: Yes, once a socket is closed you must do an Open to re-initiate communication.

Second Question: You do. If you want you can create 64k connections to your server and suffer port exhaustion (I don't recommend that). As ktm5124 stated, it all depends on your application. There are several different ways to make your server scalable, including using async I/O and/or a thread pool to handle client requests.
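To show what "you do" can mean in practice, here is a rough Python sketch of a thread pool that caps how many connections are handled at the same time. The names `handle` and `serve_with_pool`, the echo behaviour, and the fixed number of connections to accept are assumptions made to keep the example short and self-terminating.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def handle(conn):
    """Serve one client: echo a single message, then close the socket."""
    with conn:
        data = conn.recv(4096)
        if data:
            conn.sendall(data)


def serve_with_pool(listener, max_workers, connections_to_accept):
    """Accept a fixed number of connections, handling at most max_workers at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _ in range(connections_to_accept):
            conn, _addr = listener.accept()
            # Extra connections wait in the pool's queue until a worker is free.
            pool.submit(handle, conn)
    # Leaving the `with` block waits for all submitted handlers to finish.
```

Here `max_workers` is the limit on concurrently serviced active sockets, and the application chooses it.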

SRM
  • Please see Will 's answer http://stackoverflow.com/a/2332756/1418457 maybe you misunderstand something about TCP – onmyway133 May 02 '13 at 11:19
  • Okay, if tcp/ip port exhaustion is a client side thing only (as intimated by the 64k limit per client per server port), then why is there a real world problem of port exhaustion on servers? You can still use up all values of the tuple - maybe my 64k number is wrong, but there is a hard limit to the number of combinations that tuple can hold. When you run out of combinations, unless you are reusing address, you will run out of addresses (unique tuple really) to assign the incoming connection and the connection is refused. – SRM May 02 '13 at 18:59