I've read several posts about java.net vs java.nio here on StackOverflow and on some blogs, but I still can't grasp when one should prefer NIO over threaded sockets. Could you examine my conclusions below and tell me which are incorrect and which are missing?

  • Since in the threaded model you need to dedicate a thread to each active connection, and each thread takes roughly 256 KB of memory for its stack, the thread-per-socket model will quickly run out of memory at large numbers of concurrent connections, unlike NIO (a sketch of both models follows this list).

  • On modern operating systems and processors, the cost of keeping a large number of active threads and of context switching between them can be considered almost insignificant for performance.

  • NIO throughput can be lower, because the select() and poll() calls used by asynchronous NIO libraries in high-load environments can be more expensive than waking threads up and putting them to sleep.

  • NIO has always been slower, but it lets you handle more concurrent connections. It's essentially a time/space trade-off: traditional IO is faster but has a heavier memory footprint; NIO is slower but uses fewer resources.

  • Java has a hard limit of roughly 15,000/30,000 concurrent threads, depending on the JVM, which caps the thread-per-connection model at that many concurrent connections; JVM 7 will reportedly have no such limit (I cannot confirm this).
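
To make this concrete, here is roughly how I picture the two models (just a sketch of a trivial echo server in each style, not production code; port 9000 is arbitrary):

    // Thread-per-connection: a dedicated blocking thread (and stack) per socket.
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class ThreadedEchoServer {
        public static void main(String[] args) throws IOException {
            ServerSocket server = new ServerSocket(9000);
            while (true) {
                final Socket socket = server.accept(); // blocks until a client connects
                new Thread(new Runnable() {            // one stack per connection
                    public void run() {
                        try {
                            InputStream in = socket.getInputStream();
                            OutputStream out = socket.getOutputStream();
                            byte[] buf = new byte[4096];
                            int n;
                            while ((n = in.read(buf)) != -1) out.write(buf, 0, n); // echo
                        } catch (IOException ignored) {
                        } finally {
                            try { socket.close(); } catch (IOException ignored) {}
                        }
                    }
                }).start();
            }
        }
    }

    // NIO: a single thread multiplexes all connections through a Selector.
    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;

    public class NioEchoServer {
        public static void main(String[] args) throws IOException {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.socket().bind(new InetSocketAddress(9000));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            ByteBuffer buf = ByteBuffer.allocate(4096);
            while (true) {
                selector.select(); // one blocking call waits on ALL connections
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buf.clear();
                        if (client.read(buf) == -1) {
                            client.close(); // peer closed the connection
                        } else {
                            buf.flip();
                            client.write(buf); // naive echo: ignores partial writes
                        }
                    }
                }
            }
        }
    }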

So, in conclusion:

  • If you have tens of thousands of concurrent connections, NIO is a better choice, unless request processing speed is a key factor for you.
  • If you have fewer than that, thread-per-connection is a better choice (provided you can afford the RAM to hold the stacks of all concurrent threads up to the maximum).
  • With Java 7 you may want to go with NIO 2.0 in either case.

Am I correct?

Vladislav Rastrusny

5 Answers

That seems right to me, except for the part about Java limiting the number of threads – that is typically limited by the OS it's running on (see How many threads can a Java VM support? and Can't get past 2542 Threads in Java on 4GB iMac OSX 10.6.3 Snow Leopard (32bit)).

To reach that many threads you'll probably need to adjust the stack size of the JVM.
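
For instance, this quick-and-dirty probe (my own throwaway code, not from the linked questions) shows where a given JVM/OS combination gives up; run it with a reduced stack size, e.g. java -Xss128k ThreadLimit:

    public class ThreadLimit {
        public static void main(String[] args) {
            int count = 0;
            try {
                while (true) {
                    Thread t = new Thread(new Runnable() {
                        public void run() {
                            try { Thread.sleep(Long.MAX_VALUE); }
                            catch (InterruptedException ignored) {}
                        }
                    });
                    t.setDaemon(true); // don't keep the JVM alive for these
                    t.start();
                    count++;
                }
            } catch (OutOfMemoryError e) { // usually "unable to create new native thread"
                System.out.println("Created " + count + " threads before failing");
            }
        }
    }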

Adam Bryzak
  • Another thing to note is that if you're using all those connections to talk to a server (i.e. you're the client side of the connection), you're limited to about 65,000 connections due to the number of local ports available. – Adam Bryzak Mar 25 '11 at 20:31
  • No, you are not: each client connects to the same server port. BTW, do I take it that there is no hard thread limit? What does Paul Tyma say here then: http://paultyma.blogspot.com/2008/03/writing-java-multithreaded-servers.html – Vladislav Rastrusny Mar 25 '11 at 20:38
  • They connect to the same port on the server side, but client connections still need to allocate a port locally to receive a response from the server. – Adam Bryzak Mar 25 '11 at 20:42
  • This cannot be true, because then both of these would be lying: http://groovy.dzone.com/articles/512000-concurrent-websockets | http://www.coversant.net/product/soapboxserver.aspx ; they both talk about that many concurrent connections on a single machine. – Vladislav Rastrusny Mar 25 '11 at 20:46
  • If you set up a machine with multiple IPs (either through virtual interfaces or physical NICs), each of those will be able to have ~65,000 client connections, sorry for not making this clearer in my first comment. That first link you posted mentions their client program opens 64,000 connections as well. – Adam Bryzak Mar 25 '11 at 20:53
  • I think this is wrong. As far as I know, a TCP connection is defined by the quadruple {host1, port1; host2, port2}. If you open a listening socket, no local port is assigned to handle a connecting client. But if you open an outgoing connection to a remote server, a local port is indeed assigned to handle the newly opened connection. So they opened 64,000 connections from each machine to avoid running out of local ports on the client side. – Vladislav Rastrusny Mar 25 '11 at 21:08
  • @FractalizeR: It is correct. Each outgoing connection needs a unique port number in current TCP implementations. In theory it is possible for a TCP/IP stack to take notice of the whole tuple when allocating the outgoing port; in practice it is impossible due to the API: specifically the fact that bind() happens before connect(), whether explicitly or implicitly. – user207421 Mar 28 '11 at 10:02
  • Theoretically you could use the same local port for many outbound connections, but I'm not sure the API is there for it. TCP does use the local AND remote port to decide which socket is being used. I believe what people do instead is create multiple local IP addresses on the same computer and distribute the connections over all of them (it may also require some virtual local network interfaces). – Dobes Vandermeer Mar 24 '12 at 12:43
  • @DobesVandermeer The API *isn't* 'there for it', for the reason I gave. – user207421 Apr 10 '12 at 03:47
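
To make the multiple-local-IP workaround discussed above concrete, here is a sketch that binds each outgoing socket to one of several local addresses before connecting (the IPs, host, and port are made up; the addresses must actually be configured on the machine):

    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class MultiHomedClient {
        // Hypothetical local IPs; each has its own ~64k ephemeral port range.
        private static final String[] LOCAL_IPS = { "10.0.0.2", "10.0.0.3" };

        static Socket connect(int i, String host, int port) throws Exception {
            Socket s = new Socket();
            // Port 0 lets the OS pick an ephemeral port on the chosen local IP.
            s.bind(new InetSocketAddress(LOCAL_IPS[i % LOCAL_IPS.length], 0));
            s.connect(new InetSocketAddress(host, port));
            return s;
        }
    }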

I still think the context-switch overhead for threads in traditional IO is significant. At a high level, you only gain performance from multiple threads if they don't contend much for the same resources, or if the time they spend on those resources is much greater than the context-switch overhead. The reason for bringing this up is that with new storage technologies like SSDs, your threads come back to contend on the CPU much more quickly.

byte_array
  • If your app is network I/O bound, though, as with an HTTP client or server, then all the "blocked" threads will not run until the kernel wakes them up, so I don't think they'll cause any context switching overhead. That switching overhead only applies to apps where all the threads are trying to run at the same time to process some data. – Dobes Vandermeer Mar 24 '12 at 12:45
There is not a single "best" way to build NIO servers, but the preponderance of this particular question on SO suggests that people think there is! Your question summarizes the use cases suited to each option well enough to help you make the decision that is right for you.

Also, hybrid solutions are possible: you could hand a channel off to a thread when it is about to do something worthy of a thread's expense, and stick with NIO the rest of the time (see the sketch below).
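
A minimal illustration of that hand-off, assuming a selector loop elsewhere calls onReadable(); handle() and the pool size of 16 are made-up placeholders:

    import java.nio.channels.SelectionKey;
    import java.nio.channels.SocketChannel;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class HybridHandler {
        private final ExecutorService workers = Executors.newFixedThreadPool(16);

        // Called from the selector thread when a channel becomes readable.
        void onReadable(SelectionKey key) {
            final SocketChannel channel = (SocketChannel) key.channel();
            key.interestOps(0); // stop selecting this channel while a worker owns it
            workers.execute(new Runnable() {
                public void run() {
                    handle(channel); // blocking/expensive work is fine on a worker
                    // when finished, re-register interest and selector.wakeup()
                }
            });
        }

        void handle(SocketChannel channel) { /* protocol logic goes here */ }
    }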

pawstrong
I would say start with thread-per-connection and adapt from there if you run into problems.

If you really need to handle a million connections you should consider writing (or finding) a simple request broker in C (or whatever) that will use far less memory per connection than any Java implementation can. The broker can receive requests asynchronously and queue them up for backend workers written in your language of choice.

The backends thus only need a thread per active request, and you can have a fixed number of them, so memory and database use are predetermined to some degree; when large numbers of requests run in parallel, they simply wait a bit longer (see the sketch below).
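
In Java, the backend side of that design might look like this (a sketch; Request and process() are placeholders, and the pool/queue sizes are arbitrary):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class Backend {
        // A fixed pool of 32 workers and at most 1000 queued requests, so memory
        // and database load are capped; overflow throttles the submitting thread.
        private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
                32, 32, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(1000),
                new ThreadPoolExecutor.CallerRunsPolicy());

        public void submit(final Request request) {
            pool.execute(new Runnable() {
                public void run() { process(request); }
            });
        }

        void process(Request request) { /* the real work goes here */ }
        static class Request { }
    }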

Thus I think you should never have to resort to NIO select channels or asynchronous I/O (NIO 2) on 64-bit systems. The thread-per-connection model works well enough, and you can do your scaling to "tens or hundreds of thousands" of connections using a more appropriate low-level technology.

It is always helpful to avoid premature optimization (i.e. writing NIO code before you really have massive numbers of connections coming in) and to avoid reinventing the wheel (Jetty, nginx, etc.) where possible.

Dobes Vandermeer
What is most often overlooked is that NIO allows zero-copy handling. E.g. if you listen to the same multicast traffic from within multiple processes using old-school sockets on a single server, every multicast packet is copied from the network/kernel buffer into each listening application. So if you build a grid of, say, 20 processes, you get memory-bandwidth issues. With NIO you can examine the incoming buffer without copying it into application space; each process then copies only the parts of the incoming traffic it is interested in.

For another application example, see http://www.ibm.com/developerworks/java/library/j-zerocopy/.
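
The technique from that article in miniature (a sketch; the host and file name are placeholders): FileChannel.transferTo() lets the kernel move bytes from a file to a socket without copying them through user space, on operating systems that support it.

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;

    public class ZeroCopySend {
        public static void main(String[] args) throws IOException {
            SocketChannel socket = SocketChannel.open(new InetSocketAddress("example.com", 9000));
            FileChannel file = new FileInputStream("data.bin").getChannel();
            long pos = 0, size = file.size();
            while (pos < size) {
                // transferTo may move fewer bytes than requested; loop until done
                pos += file.transferTo(pos, size - pos, socket);
            }
            file.close();
            socket.close();
        }
    }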

R.Moeller