17

We have a client-server app: 1 server, about 10 clients. They communicate via TCP sockets using custom queries.

The system had been running smoothly for many months, but at some point, after the daily scheduled FULL GC on the server, which took about 50 s, we noticed that the time between a query sent by a client and the response received from the server was large, over 10-20 s. After some 3 hours the system recovered and everything was running fine again.

While investigating the issue, we found:

  1. No garbage collection problems on either the clients or the server.
  2. Query processing time on the server was small.
  3. Load on the server was high.
  4. The network bandwidth was not saturated.
  5. Connections were not reset during the FULL GC (the daily FULL GC had been a normal event until then).
  6. The machine and OS had recently changed from CentOS 6 (kernel 2.6.32) to CentOS 7 (kernel 3.10.0), but the new configuration was extensively tested. The Oracle JDK version also changed from 1.7.65 to 1.7.75.

We took a thread dump on the server:

java.lang.Thread.State: RUNNABLE
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at util.network.BytesBasedSocketConnection$ReadConnectionRunnable.run(BytesBasedSocketConnection.java:293)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

FilterInputStream.read() is implemented as follows:

    public int read() throws IOException {
        return in.read();
    }

The `in` in our code is a BufferedInputStream.

The questions are: Why did most connections slow down after the Full GC pause? Why does the stack trace end in FilterInputStream.read()? Shouldn't it end somewhere in BufferedInputStream or in the socket input stream? Can this read lead to high load on the server?

The code we use for reading:

int constructLength = _socketDIS.readInt();
ByteArrayOutputStream constructBOAS = new ByteArrayOutputStream(constructLength);
for (int i = 0; i != constructLength; i++)
    constructBOAS.write(_socketDIS.read());
constructBOAS.close();
byte[] bytes = constructBOAS.toByteArray();

where:

_socketDIS = new DataInputStream(new BufferedInputStream(_socket.getInputStream()));
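
For reference, with this wrapping a call to _socketDIS.read() lands in FilterInputStream.read(), because DataInputStream does not override the no-argument read(); that method then delegates to in.read() on the BufferedInputStream. A minimal sketch of the same chain, with a ByteArrayInputStream standing in for the socket stream:

import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class DelegationSketch {
    public static void main(String[] args) throws IOException {
        // Stand-in for _socket.getInputStream(): a 4-byte length prefix (2)
        // followed by the two payload bytes 42 and 43.
        InputStream source = new ByteArrayInputStream(new byte[] {0, 0, 0, 2, 42, 43});

        DataInputStream dis = new DataInputStream(new BufferedInputStream(source));

        int len = dis.readInt(); // readInt() is defined in DataInputStream itself
        for (int i = 0; i != len; i++) {
            // DataInputStream does not override read(), so this call runs
            // FilterInputStream.read(), whose body is "return in.read();",
            // i.e. it delegates to BufferedInputStream.read().
            System.out.println(dis.read()); // prints 42, then 43
        }
    }
}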

Here is the stack trace from the client connections that were working well:

java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    - locked <0x00007f522cbebca8> (a java.io.BufferedInputStream)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at util.network.BytesBasedSocketConnection$ReadConnectionRunnable.run(BytesBasedSocketConnection.java:287)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

UPDATE:

Regarding the EJP answer:

  1. There was no EOS involved; the connections were up, but they were very slow.

  2. Even if there were an EOS, I can't see how the code could spin at the EOS: the for loop is bounded by the constructLength value. Still, the suggested improvement is valid.

  3. The problematic stack trace ends in a read() done on a DataInputStream (_socketDIS.read()), which is inherited from FilterInputStream.read(); see the code above. It is DataInputStream, not BufferedInputStream, that is missing its own read(). Inside FilterInputStream.read() there is an in.read() call on a BufferedInputStream, and that class does define its own read() method. But the stack trace stops in the middle and never reaches BufferedInputStream.read(). Why?

dcernahoschi

4 Answers

7

Reading one byte at a time is wasteful of CPU. Throw this away:

int constructLength = _socketDIS.readInt();
ByteArrayOutputStream constructBOAS = new ByteArrayOutputStream(constructLength);
for (int i = 0; i != constructLength; i++)
    constructBOAS.write(_socketDIS.read());
constructBOAS.close();
byte[] bytes = constructBOAS.toByteArray();

and use this:

int constructLength = _socketDIS.readInt();
byte[] bytes = new byte[constructLength];
_socketDIS.readFully(bytes);

NB _socketDIS is clearly not a BufferedInputStream but a DataInputStream, which is unbuffered.
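
To make the corner-case handling concrete, here is the same fix as a complete sketch; the surrounding class and method names are illustrative, not from the question:

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

final class MessageReader {
    // Reads one length-prefixed message, failing loudly on a short read.
    static byte[] readMessage(DataInputStream in) throws IOException {
        int constructLength = in.readInt();
        byte[] bytes = new byte[constructLength];
        try {
            // Blocks until all constructLength bytes have arrived, pulling
            // them out of the buffer in bulk rather than one byte per call.
            in.readFully(bytes);
        } catch (EOFException e) {
            // Premature end of stream: the peer closed before sending the
            // whole payload. The byte-at-a-time loop would instead have
            // appended 0xFF (the low byte of -1) for each missing byte.
            throw new IOException("connection closed mid-message", e);
        }
        return bytes;
    }
}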

EDIT

Why does the stack trace end in FilterInputStream.read()?

Look carefully. BufferedInputStream doesn't implement all three read() overloads. One of them, I forget which, is implemented down in FilterInputStream, the base class, and the other two overloads call that.

Shouldn't it end somewhere in BufferedInputStream

No, see above.

or in the socket input stream?

Yes, if it were blocking, but it isn't, probably because you're spinning at end of stream, because of your poor code.

Can this read lead to high load on the server?

Yes.
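
To see why: at end of stream, read() returns -1 immediately instead of blocking, so the byte-copy loop keeps issuing calls at full speed. A small demonstration, with an in-memory stream standing in for a socket whose peer closed mid-message:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Arrays;

public class EosSpinSketch {
    public static void main(String[] args) throws IOException {
        // The length prefix promises 8 payload bytes, but only 2 follow:
        // a truncated message, as after a premature close.
        DataInputStream dis = new DataInputStream(
                new ByteArrayInputStream(new byte[] {0, 0, 0, 8, 1, 2}));

        int constructLength = dis.readInt();
        ByteArrayOutputStream bos = new ByteArrayOutputStream(constructLength);
        for (int i = 0; i != constructLength; i++) {
            // After the 2 real bytes, read() returns -1 immediately on every
            // call, and write(-1) silently appends the byte 0xFF each time.
            bos.write(dis.read());
        }
        System.out.println(Arrays.toString(bos.toByteArray()));
        // -> [1, 2, -1, -1, -1, -1, -1, -1]
    }
}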

Am_I_Helpful
user207421
  • Ok, but most of the time this code works fine. And for sure `_socketDIS` is buffered. I'll shortly upload another stack trace from connections that work fine and that use the same code. – dcernahoschi May 20 '15 at 12:32
  • Whatever, but this is better code anyway. It takes care of the corner cases properly, for example, premature EOS, where yours just spin-loops. – user207421 May 20 '15 at 12:33
  • Thanks, I agree it's better, but it is not the issue here: the connection stays alive the whole time, and at some point it recovers and the speed returns to normal. – dcernahoschi May 20 '15 at 12:49
  • Of course it's the issue. Your own stack trace proves it. You're busy in `FilterInputStream.read()`. You are copying a byte at a time when you could be doing it all in a single read. – user207421 May 20 '15 at 23:49
  • It's not. I added more explanation to the question and described better under what conditions the problem appeared. – dcernahoschi May 21 '15 at 07:53
  • It's not if and only if the issue keeps happening with this code. Does it? – user207421 May 22 '15 at 17:41
  • It happened only once in a year, for 3 hours; the rest of the time the transfer takes a few millis with little server load, even at peaks. And this answer does not hold up against the other reasons stated in the question. Although the symptoms looked as if the buffering was not there. – dcernahoschi May 22 '15 at 18:38
  • End of stream can occur any time. It isn't 'bounded by `constructLength`', except in a perfect world. This one isn't perfect. – user207421 Jan 10 '18 at 21:05
3

The stack trace shows that you are using a ScheduledThreadPoolExecutor. I would suggest you study the schedule. It is likely that the delays are merely because the reading is on some kind of schedule - which seems daft to me.
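
For illustration, a sketch of the two arrangements; the names are hypothetical, since I don't know the actual schedule in use:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SchedulingSketch {
    // On a repeating schedule: if readTask returns between messages, the
    // next read cannot start until the delay elapses, adding latency even
    // when bytes are already waiting in the socket buffer.
    void scheduled(ScheduledExecutorService ses, Runnable readTask) {
        ses.scheduleWithFixedDelay(readTask, 0, 1, TimeUnit.SECONDS);
    }

    // Submitted once: the task owns its thread and loops over the socket
    // itself, so each message is read as soon as it arrives.
    void continuous(Runnable readTask) {
        ExecutorService es = Executors.newSingleThreadExecutor();
        es.submit(readTask);
    }
}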

OldCurmudgeon
  • Ah, indeed there is a `ScheduledThreadPoolExecutor` wrongly used, but the task is a continuously running one; it just reads queries from the socket and puts them on a queue. – dcernahoschi May 20 '15 at 11:15
1

This is more of an extended comment, but it is too long for a comment, so I'll provide it as an answer.

As you note, a thread dump showing a thread in the middle of FilterInputStream.read() is unusual. The dump could have caught the thread by chance, just as the overridden FilterInputStream.read() was being resolved to BufferedInputStream.read(), but that seems pretty unlikely.

Given that this happened after a full garbage collection, it seems more likely to me that resolving the FilterInputStream call was taking longer because the BufferedInputStream class got moved or unloaded during the full garbage collection. For example, if by chance there were no BufferedInputStream objects in use when the full garbage collection occurred, the BufferedInputStream class could have been unloaded, requiring a class load the next time its read() method was needed. This could account for the delay you saw, at least once.

It is possible in some situations for classes to get unloaded when the last of their instances is garbage collected, even in the absence of a full garbage collection. If the BufferedInputStream class was somehow getting unloaded after each use and reloaded the next time it was used, that could explain the symptoms you describe, but I wouldn't normally expect that to happen.

It's also possible that there was thrashing of the memory page that held the BufferedInputStream class, perhaps because the class was relocated during the full garbage collection, if you were using virtual memory. If you have any records regarding memory usage during this time, they might be worth a look.
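
If the situation ever recurs, running the server with -verbose:class would log class load and unload events (on HotSpot, -XX:+TraceClassUnloading reports the unload side specifically); correlating those timestamps with the full garbage collection would confirm or rule out class unloading.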

Warren Dew
  • Thanks. `BufferedInputStream` instances should not have been garbage collected, as the socket connections stayed alive. No, the memory usage remained low; no thrashing either. – dcernahoschi May 23 '15 at 11:27
-1

I guess you should try flushing the stream in order to read and write without errors or slowed-down connections.

Mrfrog