1

Good evening,

I'm developing a Java TCP server for communication between clients.

At this point I'm load testing the server.

This morning I got my hands on a profiler (YourKit) and started looking for hot spots in my server.

I now have 480 clients sending messages to the server every 500 ms. The server forwards every received message to 6 clients.

Under constant load, the server now uses about 8% of my CPU.

My question is about the Java methods that use the most CPU cycles.

The method that uses the most CPU cycles is, strangely, "Thread.sleep", followed by "BufferedReader.readLine".

Both of these methods seem to block the current thread while waiting for something (sleep waits for a few ms, readLine waits for data).

Can somebody explain why these two methods take up so many CPU cycles? I was also wondering whether there are alternative approaches that use fewer CPU cycles.

Kind regards, T. Akhayo

T. Akhayo

3 Answers

2

sleep() and readLine() can use a lot of CPU because both result in system calls, which can trigger context switches. For the same reason, the profiler's timing for these methods may not be accurate (it may be an overestimate).

One way to reduce the overhead of context switches and sleep() is to use fewer threads and avoid sleep altogether (e.g. use a ScheduledExecutorService). The overhead of readLine() can be reduced by using NIO, but that is likely to add some complexity.
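As a rough sketch of the scheduler idea, assuming each client has an outgoing queue that should be drained at a fixed pace: a small shared pool runs one periodic task per client, so no thread sits in a sleep loop. The names `PacedSender`, `register`, and `demo` are made up for illustration, and the `sink` callback stands in for the real socket write.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper: paces outgoing messages for many clients on a few threads. */
final class PacedSender {
    // A small fixed pool replaces one sleeping thread per client.
    private final ScheduledExecutorService scheduler;

    PacedSender(int threads) {
        this.scheduler = Executors.newScheduledThreadPool(threads);
    }

    /**
     * Every periodMillis, hands at most one queued message to the sink.
     * The sink is a placeholder for the real write to the client's socket.
     */
    ScheduledFuture<?> register(ConcurrentLinkedQueue<String> outbox,
                                Consumer<String> sink, long periodMillis) {
        return scheduler.scheduleAtFixedRate(() -> {
            String next = outbox.poll(); // non-blocking; null when nothing is queued
            if (next != null) {
                sink.accept(next);
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        scheduler.shutdownNow();
    }

    /** Demo: queue three messages and let the scheduler drain one per 20 ms tick. */
    static List<String> demo() {
        PacedSender sender = new PacedSender(2);
        ConcurrentLinkedQueue<String> outbox = new ConcurrentLinkedQueue<>();
        List<String> sent = Collections.synchronizedList(new ArrayList<>());
        outbox.add("a");
        outbox.add("b");
        outbox.add("c");
        sender.register(outbox, sent::add, 20);
        try {
            Thread.sleep(300); // ample time for three ticks
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        sender.shutdown();
        return new ArrayList<>(sent);
    }
}
```

One thing to watch with `scheduleAtFixedRate`: if the task throws, later runs of that task are suppressed, so a real sink should catch its own I/O errors.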

BenMorel
Peter Lawrey
  • Thank you for the ScheduledExecutorService tip. I haven't had the chance to try it yet, but it looks good. I'm a little worried about the overhead when I schedule many tasks per second, but testing is knowing :) – T. Akhayo Jun 30 '11 at 08:43
  • The overhead of using too many threads can cost more. The overhead is 1 to 10 µs. If you are calling sleep more often than every 1 ms, there is a problem. – Peter Lawrey Jun 30 '11 at 08:54
1

Sleeping shouldn't be an issue unless you have a bunch of threads sleeping for short periods of time (100-150 ms is 'short' when you have 480 threads running a loop that just sleeps and does something trivial).

The readLine call should use next to no CPU while it is blocked waiting for data. As you said, it blocks, so it shouldn't use a noticeable amount of CPU unless it only blocks for short windows, you're reading tons of data, or it is the initial call.

So, your loops are too tight and you're receiving too many messages too quickly, which ultimately causes 'tons' of context switching and processing. I'd suggest using an NIO framework (like Netty) if you're not comfortable enough with NIO to use it on your own.
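For a feel of what plain NIO looks like without a framework, here is a minimal sketch using only `java.nio`: one thread waits on a `Selector` for all connections instead of parking one thread per client in `readLine()`. It is not Netty and not the asker's server. The class name `SelectorSketch`, the loopback demo client, and the newline-terminated message are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class SelectorSketch {
    /** Starts a loopback server, sends it one line, and returns what the server read. */
    static String demo() {
        List<SocketChannel> accepted = new ArrayList<>();
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0: any free port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                client.write(ByteBuffer.wrap("hello\n".getBytes(StandardCharsets.UTF_8)));

                StringBuilder received = new StringBuilder();
                ByteBuffer buf = ByteBuffer.allocate(256);
                long deadline = System.nanoTime() + 5_000_000_000L; // give up after 5 s
                // A single thread services every registered channel.
                while (received.indexOf("\n") < 0 && System.nanoTime() < deadline) {
                    selector.select(200);
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        if (key.isAcceptable()) {
                            SocketChannel ch = server.accept();
                            if (ch != null) {
                                ch.configureBlocking(false);
                                ch.register(selector, SelectionKey.OP_READ);
                                accepted.add(ch);
                            }
                        } else if (key.isReadable()) {
                            SocketChannel ch = (SocketChannel) key.channel();
                            buf.clear();
                            if (ch.read(buf) < 0) { // peer closed the connection
                                ch.close();
                                continue;
                            }
                            buf.flip();
                            received.append(StandardCharsets.UTF_8.decode(buf));
                        }
                    }
                }
                return received.toString().trim();
            } finally {
                for (SocketChannel ch : accepted) {
                    ch.close();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real server would also need per-connection buffers for partial lines and write handling, which is the added complexity a framework takes care of.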

Also, 8% CPU isn't that much for 480 clients that send 2 messages per second.

Joe0
  • I do have about 480 threads that sleep for 100 ms; they make sure that clients only receive 1 data packet every 100 ms. I will see if I can combine the work of these threads into about 80 threads, which will hopefully solve the sleep problem. The readLine is indeed reading 2 messages a second, which I thought wasn't a problem. I have used Netty before, but it seems a bit overkill at this stage, so let's wait until it is needed. As you said, 8% isn't that much, but I suspect that it will increase very fast when more clients are introduced. – T. Akhayo Jun 30 '11 at 07:54
0

Here is a program in which sleep uses almost 100% of the cpu cycles given to the application:

for (int i = 0; i < bigNumber; i++) {
  Thread.sleep(someTime); // the little CPU time used goes to entering and leaving sleep
}

Why? Because it doesn't use very many actual cpu cycles at all, and of the ones it does use, nearly all of them are spent entering and leaving sleep.

Does that mean it's a real problem? Of course not.

That's the problem with profilers that only look at CPU time.
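The gap between the two measures can be checked directly. The sketch below times a sleep loop on both clocks, using `ThreadMXBean.getCurrentThreadCpuTime()` for CPU time and `System.nanoTime()` for wall-clock time. The class name `SleepCost` and the loop sizes are arbitrary choices for illustration, and it assumes the JVM supports per-thread CPU time measurement.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class SleepCost {
    /** Returns {wallNanos, cpuNanos} for a loop of `iterations` sleeps of `millis` each. */
    static long[] measure(int iterations, long millis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long wallStart = System.nanoTime();
        long cpuStart = threads.getCurrentThreadCpuTime(); // CPU time of this thread only
        try {
            for (int i = 0; i < iterations; i++) {
                Thread.sleep(millis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long cpu = threads.getCurrentThreadCpuTime() - cpuStart;
        long wall = System.nanoTime() - wallStart;
        return new long[] { wall, cpu };
    }

    public static void main(String[] args) {
        long[] r = measure(20, 5);
        System.out.printf("wall: %d us, cpu: %d us%n", r[0] / 1000, r[1] / 1000);
    }
}
```

On a typical machine the CPU figure should be a small fraction of the wall-clock figure, even though a CPU-only profile of this loop would attribute nearly all of it to sleep.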

You need a sampler that samples on wall-clock time, not CPU time. It should sample the stack, not just the program counter. It should show you by line of code (not by function) the fraction of stack samples containing that line.

The usual objection to sampling on wall-clock time is that the measurements will be inaccurate due to sharing the machine with other processes. But that doesn't matter, because to find time drains does not require precision of measurement. It requires precision of location. What you are looking for is precise code locations, and call sites, that are on the stack a healthy fraction of actual time, as determined by stack sampling that's uncorrelated with the state of the program. Competition with other processes does not change the fraction of time that call sites are on the stack by a large enough amount to result in missing the problems.
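A crude version of such a sampler can be put together with `Thread.getStackTrace()`. This is only a sketch of the idea, not a replacement for a real profiler: it samples at fixed wall-clock intervals, the JVM takes the snapshots at safepoints, and the names `WallClockSampler`, `sample`, and `demo` are made up for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class WallClockSampler {
    /**
     * Samples the target thread's stack up to `samples` times, `intervalMillis` apart.
     * Returns, per "Class.method:line" frame, the fraction of samples containing it.
     */
    static Map<String, Double> sample(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> hits = new HashMap<>();
        int taken = 0;
        try {
            for (int i = 0; i < samples && target.isAlive(); i++) {
                Set<String> seen = new HashSet<>(); // count each frame once per sample
                for (StackTraceElement f : target.getStackTrace()) {
                    seen.add(f.getClassName() + "." + f.getMethodName() + ":" + f.getLineNumber());
                }
                for (String frame : seen) {
                    hits.merge(frame, 1, Integer::sum);
                }
                taken++;
                Thread.sleep(intervalMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Map<String, Double> fractions = new HashMap<>();
        for (Map.Entry<String, Integer> e : hits.entrySet()) {
            fractions.put(e.getKey(), e.getValue() / (double) Math.max(taken, 1));
        }
        return fractions;
    }

    /** Demo: sample a thread that only sleeps; returns the largest fraction for a Thread.sleep frame. */
    static double demo() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(5);
                }
            } catch (InterruptedException e) {
                // interrupted by the demo: exit
            }
        });
        worker.setDaemon(true);
        worker.start();
        Map<String, Double> fractions = sample(worker, 50, 2);
        worker.interrupt();
        double max = 0;
        for (Map.Entry<String, Double> e : fractions.entrySet()) {
            if (e.getKey().startsWith("java.lang.Thread.sleep")) {
                max = Math.max(max, e.getValue());
            }
        }
        return max;
    }
}
```

For the sleeping worker, the `Thread.sleep` frame shows up in most samples even though it costs almost no CPU, which is exactly the distinction between wall-clock and CPU-time views.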

Community
Mike Dunlavey