Is the non-blocking Java NIO still slower than your standard thread per connection asynchronous socket?

In addition, if you were to use threads per connection, would you just create new threads or would you use a very large thread pool?

I'm writing an MMORPG server in Java that should scale to 10000 clients easily given powerful enough hardware, although the maximum number of clients is 24000 (which I believe is impossible to reach with the thread-per-connection model because of a 15000-thread limit in Java). A three-year-old article (http://www.mailinator.com/tymaPaulMultithreaded.pdf) claimed that blocking IO with a thread-per-connection model was still 25% faster than NIO, but does that still hold today? Java has changed a lot since then, and I've heard that the results were questionable for real-life scenarios because the VM used was not Sun Java. Also, because this is an MMORPG server with many concurrent users interacting with each other, will synchronization and thread-safety practices decrease performance to the point where a single-threaded NIO selector serving 10000 clients is faster? (Not all the work necessarily has to be processed on the selector thread; it can be handed to worker threads, the way MINA/Netty do it.)

Thanks!

Jonas
Kevin Jin
  • 10k threads is not a win for any (*commodity*) server :-) Also, 10k active clients on a single box is very ... unlikely. –  Jan 20 '11 at 20:51
  • @pst: if by commodity you mean; non quantum, yet to be discovered kind of technology, I totally agree. I think the least of Kevin's problems is the thread count. I do apologize for not having any useful input on the matter. Also remember the QOTD: Test. – Captain Giraffe Jan 20 '11 at 21:04
  • @pst Oh sweet JRE its elastic! You just made my day worthwhile. – Captain Giraffe Jan 20 '11 at 21:19
  • @Captain Giraffe I'm lost :-/ –  Jan 21 '11 at 03:16

7 Answers

NIO benefits should be taken with a grain of salt.

In an HTTP server, most connections are keep-alive connections that sit idle most of the time. Pre-allocating a thread for each would be a waste of resources.

For an MMORPG, things are very different. I'd guess connections are constantly busy receiving instructions from users and sending the latest system state back to them, so a connection needs a thread most of the time.

If you use NIO, you'll have to constantly re-allocate a thread to each connection. That may be an inferior solution to the simple fixed thread-per-connection approach.

The default thread stack size is pretty large (1/4 MB?), and that is the major reason why the number of threads is limited. Try reducing it and see if your system can support more.
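As a rough illustration of reducing the stack size, the sketch below requests a smaller stack per thread through the four-argument `Thread` constructor. The class name `SmallStackThreads` and the 64 KB figure are assumptions made for this example, not values from the answer; the JVM treats the requested size only as a hint and may round it up or ignore it. The `-Xss` JVM option changes the default for all threads instead.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: starts handler threads with a reduced requested stack size.
final class SmallStackThreads {
    // Assumed value for illustration only; the JVM may round it up or ignore it.
    static final long STACK_BYTES = 64 * 1024;

    static Thread newHandler(Runnable task, String name) {
        // Thread(ThreadGroup, Runnable, String, long stackSize): null group = current group.
        return new Thread(null, task, name, STACK_BYTES);
    }

    // Starts `count` small-stack threads and returns how many ran to completion.
    static int runAll(int count) {
        AtomicInteger done = new AtomicInteger();
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            threads[i] = newHandler(done::incrementAndGet, "conn-" + i);
            threads[i].start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(runAll(200) + " threads completed");
    }
}
```

How many threads a machine actually sustains with a smaller stack depends on the OS and JVM, so this is something to measure rather than assume.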

However, if your game is indeed very "busy", it's your CPU that you need to worry about the most. NIO or not, it's really hard to handle thousands of hyperactive gamers on one machine.

irreputable
  • `For MMORPG things are very different. I guess connections are constantly busy receiving instructions from users and sending latest system state to users.` I'd suppose the users go to bed, sometimes, at least most of them. So I'd expect thousands of busy connections and even more idle ones. I wonder how to separate them, maybe using threads and let them die on Socket timeout? – maaartinus Jan 20 '11 at 22:18
  • I did a single thread client, single thread server in old io and one in nio here (the nio beat the old io or tied it...hard to say...neck and neck.) https://github.com/deanhiller/webpieces/tree/master/core/core-asyncserver/src/test/java/org/webpieces/nio/api/throughput Nio seems like a better option to then have an N sized threadpool to 10k socket ratio. – Dean Hiller Jun 14 '17 at 03:20

There are actually 3 solutions:

  1. Multiple threads
  2. One thread and NIO
  3. Both solutions 1 and 2 at the same time

The best thing to do for performance is to have a small, limited number of threads and multiplex network events onto these threads with NIO as new messages come in over the network.


Using NIO with one thread is a bad idea for a few reasons:

  • If you have multiple CPUs or cores, you will be idling resources since you can only use one core at a time if you only have one thread.
  • If you have to block for some reason (maybe for a disk access), your CPU sits idle when it could be handling another connection while you wait for the disk.

One thread per connection is a bad idea because it doesn't scale. Let's say you have:

  • 10 000 connections
  • 2 CPUs with 2 cores each
  • only 100 threads will be blocked at any given time

Then you can work out that you only need 104 threads. Any more and you're wasting resources managing extra threads that you don't need. There is a lot of bookkeeping under the hood needed to manage 10 000 threads. This will slow you down.
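The arithmetic behind the 104 figure can be written out as a small sketch: one thread for every connection expected to be blocked at once, plus one runnable thread per core. The class and method names are made up for this illustration.

```java
// Hypothetical sizing helper for the worked example above.
final class PoolSizing {
    // Threads expected to be blocked at any moment, plus one runnable thread per core.
    static int threadsNeeded(int expectedBlocked, int cores) {
        return expectedBlocked + cores;
    }

    public static void main(String[] args) {
        int cores = 2 * 2; // 2 CPUs with 2 cores each
        System.out.println(threadsNeeded(100, cores)); // prints 104
    }
}
```

On a real machine, `Runtime.getRuntime().availableProcessors()` could stand in for the hard-coded core count; the number of blocked threads is workload-specific and has to be estimated or measured.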


This is why you combine the two solutions. Also, make sure your VM is using the fastest system calls. Every OS has its own unique system calls for high performance network IO. Make sure your VM is using the latest and greatest. I believe this is epoll() in Linux.
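A minimal sketch of what combining the two approaches could look like, assuming a toy echo protocol: one selector thread multiplexes all sockets and hands each received chunk to a small fixed worker pool. The class name, the pool size of 4, and the loopback demo are assumptions for illustration, not code from the answer. With more than one worker, replies for the same connection may be written out of order, which a real server would have to handle.

```java
import java.io.*;
import java.net.*;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.concurrent.*;

// Hypothetical echo server: one selector thread multiplexes every socket,
// and a small fixed pool does the per-message work.
final class SelectorEchoServer implements Runnable, AutoCloseable {
    private final Selector selector = Selector.open();
    private final ServerSocketChannel server = ServerSocketChannel.open();
    private final ExecutorService workers;

    SelectorEchoServer(int workerThreads) throws IOException {
        workers = Executors.newFixedThreadPool(workerThreads);
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0: the OS picks a free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    @Override
    public void run() {
        try {
            while (selector.isOpen()) {
                selector.select(); // blocks until at least one channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        read((SocketChannel) key.channel());
                    }
                }
            }
        } catch (IOException | IllegalStateException e) {
            // Selector or server channel was closed: treat as shutdown.
        }
    }

    // Runs on the selector thread: read what is available, hand the work to the pool.
    private void read(SocketChannel client) {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        try {
            int n = client.read(buf);
            if (n < 0) client.close(); // peer closed; close() also cancels the key
            if (n <= 0) return;
        } catch (IOException e) {
            try { client.close(); } catch (IOException ignored) { }
            return;
        }
        buf.flip();
        workers.execute(() -> {
            try {
                // A real server would register OP_WRITE instead of spinning on a full send buffer.
                while (buf.hasRemaining()) client.write(buf);
            } catch (IOException ignored) { }
        });
    }

    @Override
    public void close() throws IOException {
        workers.shutdown();
        selector.close();
        server.close();
    }

    // Demo: start the server, send one message from a plain blocking client, return the echo.
    static String roundTrip(String message) {
        try (SelectorEchoServer srv = new SelectorEchoServer(4)) {
            Thread loop = new Thread(srv, "selector");
            loop.setDaemon(true);
            loop.start();
            int port = ((InetSocketAddress) srv.server.getLocalAddress()).getPort();
            try (Socket s = new Socket("127.0.0.1", port)) {
                byte[] out = message.getBytes(StandardCharsets.UTF_8);
                s.getOutputStream().write(out);
                byte[] in = new byte[out.length];
                new DataInputStream(s.getInputStream()).readFully(in);
                return new String(in, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `SelectorEchoServer.roundTrip("ping")` exercises the whole path once. Frameworks such as MINA and Netty, mentioned in the question, implement this pattern far more completely.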

In addition, if you were to use threads per connection, would you just create new threads or would you use a very large thread pool?

It depends on how much time you want to spend optimizing. The quickest solution is to create resources like threads and strings when needed, then let the garbage collector reclaim them when you're done with them. You can get a performance boost by keeping a pool of resources: instead of creating a new object, you ask the pool for one and return it to the pool when you're done. This adds the complexity of concurrency control, which can be further optimized with advanced concurrency techniques such as non-blocking algorithms. Newer versions of the Java API provide a few of these for you. You could spend the rest of your life doing these optimizations on just one program. Which solution is best for your specific application is probably a question that deserves its own post.
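For threads specifically, the pooling idea is already packaged in `java.util.concurrent`. The sketch below (class and method names are hypothetical) runs many short tasks on a fixed pool and reports how many distinct threads did the work, showing that the threads are reused rather than created per task.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo of thread reuse through a fixed-size pool.
final class PooledHandlers {
    // Runs `tasks` short jobs on `poolSize` reusable threads and returns
    // how many distinct threads actually executed them.
    static int distinctThreadsUsed(int poolSize, int tasks) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> names.add(Thread.currentThread().getName()));
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return names.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctThreadsUsed(8, 1000) + " threads handled 1000 tasks");
    }
}
```

The result never exceeds the pool size, however many tasks are submitted.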

Jay
  • How did you decide on 104 threads? I see 10000/100 + 4. But why not just have 100 threads, what are the other 4 for? 4 reading threads and 100 workers? Would there be multiple threads that call selector.select()? I'm currently exploring the idea of having multiple threads with individuals selectors. I've done it in C, but the NIO syntax is a little different to me. register keys to selectors in a round robin fashion is what I'm thinking. Each thread calls select and then spawns a thread to do work, or using some thread pooling black box magic. – JustinDanielson Feb 05 '13 at 21:31
  • @JustinDanielson, If there are 100 threads blocked, you can have an additional 4 threads on the CPU since there are 4 cores. Any less and there will be idle cores when there is work to do. – Jay Feb 06 '13 at 17:33

If you are willing to spend any amount of money on powerful enough hardware, why limit yourself to one server? Google doesn't use one server; they don't even use one datacenter of servers.

A common misconception is that because NIO allows non-blocking IO, that is the only model worth benchmarking. If you benchmark blocking NIO, you can get it 30% faster than old IO, i.e. if you use the same threading model and compare just the IO models.
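To make "blocking NIO" concrete, here is a minimal sketch, assuming a toy echo exchange over loopback: the same thread-per-connection model as old IO, but using `SocketChannel` and `ByteBuffer` with the channels left in their default blocking mode. This is not the benchmark code behind the 30% figure; the class name and buffer size are invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: NIO channels used in blocking mode, one thread per connection.
final class BlockingNioEcho {
    static String roundTrip(String message) {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0: the OS picks a free port
            server.configureBlocking(true); // already the default; stated for clarity

            Thread handler = new Thread(() -> {
                try (SocketChannel client = server.accept()) { // blocks until a client connects
                    ByteBuffer buf = ByteBuffer.allocateDirect(1024);
                    while (client.read(buf) != -1) { // blocks until data arrives or EOF
                        buf.flip();
                        while (buf.hasRemaining()) client.write(buf);
                        buf.clear();
                    }
                } catch (IOException ignored) { }
            }, "conn-handler");
            handler.setDaemon(true);
            handler.start();

            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();
            try (SocketChannel ch = SocketChannel.open(new InetSocketAddress("127.0.0.1", port))) {
                byte[] out = message.getBytes(StandardCharsets.UTF_8);
                ByteBuffer send = ByteBuffer.wrap(out);
                while (send.hasRemaining()) ch.write(send);
                ByteBuffer recv = ByteBuffer.allocate(out.length);
                while (recv.hasRemaining() && ch.read(recv) != -1) { /* keep reading */ }
                return new String(recv.array(), 0, recv.position(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("ping"));
    }
}
```

Because the threading model is identical to a `java.net.Socket` version, swapping one for the other is a way to compare only the IO layer, which is the comparison the answer describes.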

For a sophisticated game, you are far more likely to run out of CPU before you hit 10K connections. Again it is simpler to have a solution which scales horizontally. Then you don't need to worry about how many connections you can get.

How many users can reasonably interact? 24? In that case you have 1000 independent groups interacting. You won't have that many cores in one server.

How much money per user are you intending to spend on server(s)? You can buy a 12-core server with 64 GB of memory for less than £5000. If you place 2500 users on this server, you have spent £2 per user.

EDIT: I have a reference http://vanillajava.blogspot.com/2010/07/java-nio-is-faster-than-java-io-for.html which is mine. ;) I had this reviewed by someone who is a GURU of Java Networking and it broadly agreed with what he had found.

Peter Lawrey
  • Wow, I didn't know that NIO supports blocking IO as well! It seems as though it doesn't draw much attention to itself and as a result, it is hard to find benchmarks comparing blocking old IO and blocking NIO, which is a shame. I'll take this into consideration though. – Kevin Jin Jan 21 '11 at 00:13
  • I doubt your performance number, do you have a reference? There were reports that non-blocking NIO is 30% slower than traditional IO. However, the test was not realistic because it doesn't do anything with the data. As soon as each byte in the stream is read at least once, the overhead of NIO/IO becomes insignificant. – irreputable Jan 21 '11 at 00:36
  • When you do something realistic with the data, the network and the processing become more important and the advantage of NIO or IO is largely lost. I use NIO as it appears to save me about 6 us latency for reads. That's not particularly important to most developers. – Peter Lawrey Jan 21 '11 at 07:06
  • here is a benchmark test showing nio beating old io on some machines or tying it. https://github.com/deanhiller/webpieces/tree/master/core/core-asyncserver/src/test/java/org/webpieces/nio/api/throughput – Dean Hiller Jun 14 '17 at 03:21
  • @DeanHiller what were the results, and did you look at blocking NIO as mentioned in the answer? – Peter Lawrey Jun 18 '17 at 08:19
  • @PeterLawrey ah, no, I have not tried blocking NIO as of yet. I should try that as well! good call. On my personal laptop single threaded svr(ie. selector thread) and single threaded client(selector thread reading) and one client thread to always write, it was at 100,000 rps on my laptop(http1.1). It was at 24Gigabytes per second on raw nio(I think it could go higher with changes). localhost I think maxes out at 40GB. I can't test on network at that throughput level as my network is only 1GB. This project has backpressure as well preventing client writes under load. – Dean Hiller Jun 27 '17 at 13:39
  • @PeterLawrey The interesting thing is the automatic backpressure increased performance (which I had not expected). There is an http1.1 and http2 rps test here now as well (single socket only though :( )... https://github.com/deanhiller/webpieces/tree/master/http/http-backpressure-tests/src/test/java/org/webpieces/throughput). I still need to try a huge amount of sockets and throughput test that though TBH. I also need to wire backpressure through the SSL layer as well(it is the only layer missing passing backpressure through to the nic to prevent clients from writing). – Dean Hiller Jun 27 '17 at 13:42
  • @DeanHiller when a system is overloaded its throughput can drop, so avoiding overloaded states can improve throughput. – Peter Lawrey Jun 28 '17 at 14:49

If you have busy connections, which means they constantly send you data and you send them back, you may use non-Blocking IO in conjunction with Akka.

Akka is an open-source toolkit and runtime simplifying the construction of concurrent and distributed applications on the JVM. Akka supports multiple programming models for concurrency, but it emphasizes actor-based concurrency, with inspiration drawn from Erlang. Language bindings exist for both Java and Scala.

Akka's logic is non-blocking, so it's well suited to asynchronous programming. Using Akka actors, you can remove thread overhead.

But if your socket streams block more often, I suggest using blocking IO in conjunction with Quasar.

Quasar is an open-source library for simple, lightweight JVM concurrency, which implements true lightweight threads (AKA fibers) on the JVM. Quasar fibers behave just like plain Java threads, except they have virtually no memory and task-switching overhead, so that you can easily spawn hundreds of thousands of fibers – or even millions – in a single JVM. Quasar also provides channels for inter-fiber communications modeled after those offered by the Go language, complete with channel selectors. It also contains a full implementation of the actor model, closely modeled after Erlang.

Quasar's logic is blocking, so you may spawn, say, 24000 fibers waiting on different connections. One of the positive points about Quasar is that fibers can interact with plain threads very easily. Quasar also has integrations with popular libraries, such as Apache HTTP client, JDBC, Jersey and so on, so you can get the benefits of fibers in many parts of your project.
You may see a good comparison between these two frameworks here.

Alireza Mohamadi

As most of you guys are saying that the server is bound to be locked up in CPU usage before 10k concurrent users are reached, I suppose it is better for me to use a threaded blocking (N)IO approach considering the fact that for this particular MMORPG, getting several packets per second for each player is not uncommon and might bog down a selector if one were to be used.

Peter raised an interesting point that blocking NIO is faster than the old libraries while irreputable mentioned that for a busy MMORPG server, it would be better to use threads because of how many instructions are received per player. I wouldn't count on too many players going idle on this game, so it shouldn't be a problem for me to have a bunch of non-running threads. I've come to realize that synchronization is still required even when using a framework based on NIO because they use several worker threads running at the same time to process packets received from clients. Context switching may prove to be expensive, but I'll give this solution a try. It's relatively easy to refactor my code so that I could use a NIO framework if I find there is a bottleneck.

I believe my question has been answered. I'll just wait a little bit more in order to receive even more insight from more people. Thank you for all your answers!

EDIT: I've finally chosen my course of action. I actually was indecisive and decided to use JBoss Netty and allow the user to switch between either oio or nio using the classes

org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;
org.jboss.netty.channel.socket.oio.OioServerSocketChannelFactory;

Quite nice that Netty supports both!

Kevin Jin

You might get some inspiration from the former Sun sponsored project, now named Red Dwarf. The old website at http://www.reddwarfserver.org/ is down.
Github to the rescue: https://github.com/reddwarf-nextgen/reddwarf

Jochen Bedersdorfer

If you are making client-side network calls, most likely you just need plain socket IO.

If you are creating server-side technologies, NIO helps you separate the network IO from the fulfillment/processing work. Configure 1 or 2 IO threads for network IO, and use worker threads for the actual processing (anywhere from 1 to N, depending on the machine's capabilities).

Venkateswara Rao