
I'm currently debugging a program that starts two threads per external process, and those two threads keep reading `Process.getErrorStream()` and `Process.getInputStream()` using a `while ((i = in.read(buf, 0, buf.length)) >= 0)` loop.

Sometimes, when the external process dies due to a JVM crash (see these hs_err_pid.log files), the threads reading that process's stdout/stderr begin consuming 100% CPU and never exit. The loop body is not being executed (I've added a logging statement there), so the busy loop appears to be inside the native method `java.io.FileInputStream.readBytes`.

I've reproduced this on both Windows 7 64-bit (jdk1.6.0_30 64-bit, jdk1.7.0_03 64-bit) and Linux 2.6.18 (jdk1.6.0_21 32-bit). The code in question is here and it is used like this. See those links for the full code; here are the interesting bits:

private final byte[]              buf = new byte[256];
private final InputStream         in;
...    

int i;
while ((i = this.in.read(this.buf, 0, this.buf.length)) >= 0) {
    ...
}

The Java stack traces look like this:

"PIT Stream Monitor" daemon prio=6 tid=0x0000000008869800 nid=0x1f70 runnable [0x000000000d7ff000]
   java.lang.Thread.State: RUNNABLE
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:220)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    - locked <0x00000007c89d6d90> (a java.io.BufferedInputStream)
    at org.pitest.util.StreamMonitor.readFromStream(StreamMonitor.java:38)
    at org.pitest.util.StreamMonitor.process(StreamMonitor.java:32)
    at org.pitest.util.AbstractMonitor.run(AbstractMonitor.java:19)
   Locked ownable synchronizers:
    - None

or

"PIT Stream Monitor" daemon prio=6 tid=0x0000000008873000 nid=0x1cb8 runnable [0x000000000e3ff000]
   java.lang.Thread.State: RUNNABLE
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:220)
    at org.pitest.util.StreamMonitor.readFromStream(StreamMonitor.java:38)
    at org.pitest.util.StreamMonitor.process(StreamMonitor.java:32)
    at org.pitest.util.AbstractMonitor.run(AbstractMonitor.java:19)
   Locked ownable synchronizers:
    - None

Using Sysinternals Process Explorer I was able to get native stack traces of those threads. Most often (over 80% of the time) the stack trace looks like this:

ntdll.dll!NtReadFile+0xa
KERNELBASE.dll!ReadFile+0x7a
kernel32.dll!ReadFile+0x59
java.dll!handleRead+0x2c
java.dll!VerifyClassCodesForMajorVersion+0x1d1
java.dll!Java_java_io_FileInputStream_readBytes+0x1d

This also happens quite often:

ntdll.dll!RtlNtStatusToDosErrorNoTeb+0x52
ntdll.dll!RtlNtStatusToDosError+0x23
KERNELBASE.dll!GetCurrentThreadId+0x2c
KERNELBASE.dll!CreatePipe+0x21a
kernel32.dll!ReadFile+0x59
java.dll!handleRead+0x2c
java.dll!VerifyClassCodesForMajorVersion+0x1d1
java.dll!Java_java_io_FileInputStream_readBytes+0x1d

ntdll.dll!RtlNtStatusToDosErrorNoTeb+0x42
ntdll.dll!RtlNtStatusToDosError+0x23
KERNELBASE.dll!GetCurrentThreadId+0x2c
KERNELBASE.dll!CreatePipe+0x21a
kernel32.dll!ReadFile+0x59
java.dll!handleRead+0x2c
java.dll!VerifyClassCodesForMajorVersion+0x1d1
java.dll!Java_java_io_FileInputStream_readBytes+0x1d

And sometimes it's executing this part of the code:

java.dll!VerifyClassCodesForMajorVersion+0xc3
java.dll!Java_java_io_FileInputStream_readBytes+0x1d

java.dll!Java_sun_io_Win32ErrorMode_setErrorMode+0x847c
java.dll!VerifyClassCodesForMajorVersion+0xd7
java.dll!Java_java_io_FileInputStream_readBytes+0x1d

jvm.dll!JNI_GetCreatedJavaVMs+0x1829f
java.dll!VerifyClassCodesForMajorVersion+0x128
java.dll!Java_java_io_FileInputStream_readBytes+0x1d

jvm.dll+0x88c1
jvm.dll!JNI_GetCreatedJavaVMs+0x182a7
java.dll!VerifyClassCodesForMajorVersion+0x128
java.dll!Java_java_io_FileInputStream_readBytes+0x1d

java.dll!VerifyClassCodesForMajorVersion+0x10b
java.dll!Java_java_io_FileInputStream_readBytes+0x1d

jvm.dll!JNI_CreateJavaVM+0x1423
java.dll!VerifyClassCodesForMajorVersion+0x190
java.dll!Java_java_io_FileInputStream_readBytes+0x1d

jvm.dll+0x88bf
jvm.dll!JNI_CreateJavaVM+0x147d
java.dll!VerifyClassCodesForMajorVersion+0x190
java.dll!Java_java_io_FileInputStream_readBytes+0x1d

java.dll!VerifyClassCodesForMajorVersion+0x1aa
java.dll!Java_java_io_FileInputStream_readBytes+0x1d

java.dll!VerifyClassCodesForMajorVersion+0x1c3
java.dll!Java_java_io_FileInputStream_readBytes+0x1d

java.dll!VerifyClassCodesForMajorVersion+0x224
java.dll!Java_java_io_FileInputStream_readBytes+0x1d

Any ideas on how to solve this problem? Is this a known JVM issue? Is there a workaround?

Esko Luontola
  • Could you include your loop code? BTW the condition `>= 0` is overly broad, if `buf.length` is non-zero, then read() is guaranteed to read at least 1 byte or return -1 (or throw an Exception). – Mark Rotteveel Apr 16 '12 at 16:48
  • What? Is `Process.getInputStream()` returning a FileInputStream? – Martijn Courteaux Apr 16 '12 at 17:28
  • Yes. In java.lang.ProcessImpl#ProcessImpl you can see stdout_stream and stderr_stream being initialized with a FileInputStream. It kinda makes sense from the Unix point of view, where everything is a file. – Esko Luontola Apr 16 '12 at 17:57
  • Looking at OpenJDK's source code, the implementation of readBytes has no loops and is a thin wrapper over Windows' ReadFile function. Let's continue looking into OS issues... maybe antivirus or something... – Esko Luontola Apr 16 '12 at 19:33
  • Is it reproducible on another machine (maybe even on another operating system)? – MRalwasser Apr 17 '12 at 07:44
  • I'm getting similar symptoms - the system slowing down very much - on our CI server which runs on Linux. I haven't yet gotten a stack trace from there, because I don't have shell access to those machines, but I'll try to reproduce it on some other machine. – Esko Luontola Apr 17 '12 at 08:03
  • This issue might be related: http://weblogs.java.net/blog/kohsuke/archive/2009/09/28/reading-stdin-may-cause-your-jvm-hang – Esko Luontola Apr 17 '12 at 08:03
  • Slowing down due to blocking (which might be related to non-draining of the output stream of the child process) or also 100% cpu usage on one thread? – MRalwasser Apr 17 '12 at 08:19
  • This problem happens also on Linux 2.6.18 and Java 1.6.0_21 32-bit. – Esko Luontola Apr 17 '12 at 09:52
  • I added sample hs_err_pid.log files which show what kind of a JVM crash happens in the child process whose stdout/stderr the problematic threads were reading. They are here, both for Windows and Linux: https://gist.github.com/2405004 – Esko Luontola Apr 17 '12 at 10:11
  • @EskoLuontola: on Linux, do you also get 100% cpu usage in this case or only blocked threads? – MRalwasser Apr 17 '12 at 10:27
  • Yes. Also on Linux those threads hog the CPU. – Esko Luontola Apr 17 '12 at 10:38
  • Can you attach gdb to get (native) stack traces here as well? – MRalwasser Apr 17 '12 at 10:46
  • Can anyone familiar with the eclipse codebase comment on if/how eclipse works around this issue? – henry Apr 17 '12 at 19:22

1 Answer


I've not yet been able to reproduce this locally, but two possible workarounds might be:

  • Play around with `in.available()`, e.g. only issue the blocking read() when data is reported available (see the sketch after this list).

  • Redirect stdout and stderr in the external process to a socket and read those from the controlling process instead.
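
For reference, here is a minimal sketch of the first workaround, in the spirit of the in.available() quick fix linked in the comments below. The class and method names (PollingStreamReader, requestExit, handle) are made up for illustration and are not the actual pitest code; the idea is simply that the thread only issues a blocking read() when available() reports buffered data, so it never enters the native readBytes call on a pipe whose other end has crashed:

// Illustrative sketch only: PollingStreamReader, requestExit and handle are
// hypothetical names, not the actual pitest code.
import java.io.IOException;
import java.io.InputStream;

public class PollingStreamReader implements Runnable {

    private final InputStream in;
    private final byte[] buf = new byte[256];
    private volatile boolean exitRequested = false;

    public PollingStreamReader(InputStream in) {
        this.in = in;
    }

    public void requestExit() {
        exitRequested = true;
    }

    public void run() {
        try {
            while (!exitRequested) {
                if (in.available() > 0) {
                    // Data is buffered, so this read() should return promptly.
                    int n = in.read(buf, 0, buf.length);
                    if (n < 0) {
                        return; // end of stream
                    }
                    handle(buf, n);
                } else {
                    // Back off instead of blocking inside the native read.
                    Thread.sleep(100);
                }
            }
        } catch (IOException e) {
            // stream closed or broken; stop reading
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status and stop
        }
    }

    private void handle(byte[] data, int length) {
        // forward the output somewhere, e.g. log it or parse messages
    }
}

The obvious trade-offs are some added latency and the fact that available() is only an estimate, so this is a mitigation rather than a fix for the underlying JVM/OS behaviour.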

Chandra Sekhar
henry
  • [A quick fix using in.available()](http://code.google.com/p/pitestrunner/source/browse/pitest/src/main/java/org/pitest/util/StreamMonitor.java?spec=svn0607ac947dd76768f5e852350386bc9c324a6005&r=0607ac947dd76768f5e852350386bc9c324a6005#59) has helped to avoid the problem for now. We are still looking for a better solution and the reason why this is happening in the first place. I'll try to produce a [SSCCE](http://sscce.org/) by removing superfluous code from the closed source project where I can reproduce this. – Esko Luontola Apr 21 '12 at 14:31
  • The problem can be reproduced by crashing the child process with the infinite array allocating loop from http://stackoverflow.com/questions/65200/how-do-you-crash-a-jvm and probably the other techniques listed there. – henry Apr 22 '12 at 20:03
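
For anyone trying to build a reproduction, a child program along the lines of the nested-array allocating loop from that Stack Overflow question might look like the sketch below. Whether it actually produces a native JVM crash (an hs_err_pid log) rather than just an OutOfMemoryError depends on the JVM version and garbage collector, so treat it only as a starting point:

// Sketch of a child process that tries to crash its own JVM via the
// "infinite array allocating loop" referenced in the comment above.
// On some HotSpot builds the ever-deepening chain of nested arrays makes
// the GC fail natively; on others it may only throw OutOfMemoryError.
public class CrashingChild {
    public static void main(String[] args) {
        Object[] chain = null;
        while (true) {
            // each iteration wraps the previous array one level deeper
            chain = new Object[] { chain };
        }
    }
}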