
My application is crashing in production. The crash dump indicates that a SIGSEGV occurred in GCTaskThread.

It uses JNI, so native code could be a source of memory corruption, although I can't be sure.

How can I debug this problem? I thought of using -XX:OnError..., but I am not sure what would help me debug it.

Also, can someone give a concrete example of how JNI code can make the GC crash with a SIGSEGV?

EDIT:

OS: SUSE Linux Enterprise Server 10 (x86_64)

vm_info: Java HotSpot(TM) 64-Bit Server VM (11.0-b15) for linux-amd64 JRE (1.6.0_10-b33), built on Sep 26 2008 01:10:29 by "java_re" with gcc 3.2.2 (SuSE Linux)

EDIT: The issue stopped occurring after we disabled hyper-threading. Any thoughts?

ekeren

3 Answers

2

Errors in JNI code can occur in several ways:

The program crashes during execution of a native method (most common).
The program crashes some time after returning from the native method, often during GC (not so common).
Bad JNI code causes deadlocks shortly after returning from a native method (occasional).

If you think that you have a problem with the interaction between user-written native code and the JVM (that is, a JNI problem), you can run diagnostics that check the JNI transitions. To invoke these diagnostics, specify the -Xcheck:jni option when you start the JVM.

The -Xcheck:jni option activates a set of wrapper functions around the JNI functions. The wrapper functions perform checks on the incoming parameters. These checks include:

Whether the call and the call that initialized JNI are on the same thread.
Whether the object parameters are valid objects.
Whether local or global references refer to valid objects.
Whether the type of a field matches the Get<Type>Field or Set<Type>Field call.
Whether static and nonstatic field IDs are valid.
Whether strings are valid and non-null.
Whether array elements are non-null.
The types of array elements.

Please read the following links:
http://publib.boulder.ibm.com/infocenter/javasdk/v5r0/index.jsp?topic=/com.ibm.java.doc.diagnostics.50/html/jni_debug.html
http://www.oracle.com/technetwork/java/javase/clopts-139448.html#gbmtq

Nike
  • Thanks. The problem is that this happens rarely, in the production environment, and this flag is not suitable there. When I use it in my test environment I don't get any errors. Still, thanks and +1 – ekeren Jan 11 '11 at 22:16
  • Could you please tell me the OS, Java version and vendor? – Nike Jan 12 '11 at 01:15
  • Added to the original post, thanks. – ekeren Jan 12 '11 at 07:50
  • The JNI check did not help; I'll start a bounty. – ekeren Feb 16 '11 at 09:36
  • If you get a stack dump, you can usually find the JNI code that may be at fault. Here is a post on how to do that: http://wig-wag.com/devblog/?p=51 (but if it's happening inside the JVM itself, this won't work...) – EdH Oct 15 '11 at 21:12
1

Use valgrind. This sounds like memory corruption. The output will be verbose, but try to narrow the report down to your JNI library if possible.
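A minimal, JNI-free sketch of the kind of silent overwrite valgrind is good at finding: the buggy copy below writes one byte past its heap block. malloc often rounds allocations up, so the program can appear to work in testing, yet memcheck reports the write as invalid whenever the code runs. In a JNI library the same mistake might show up when copying the result of GetStringUTFChars. dup_name and dup_name_buggy are hypothetical names used only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* BUGGY: allocates strlen(s) bytes, but strcpy also writes the
   terminating '\0', one byte past the end of the block. The overwrite
   is often harmless in testing; memcheck flags it as an invalid write. */
char *dup_name_buggy(const char *s)
{
    char *copy = malloc(strlen(s));
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* FIXED: reserve room for the terminator and copy it explicitly. */
char *dup_name(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```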

abdollar
  • Thanks for your answer. I can't reproduce it in my development environment, and I can't afford to run valgrind on the production server. – ekeren Feb 23 '11 at 08:46
  • valgrind will catch memory overwrites that may be benign in a development environment (as in, you don't see any issues from them) but cause problems in production. Run it in development; that's what it's for. – abdollar Feb 23 '11 at 18:58
0

Since the faulting thread is GCTaskThread, have you tried enabling -verbose:gc and analyzing the output (preferably with a graphical tool like samurai)? Are you able to isolate a specific library after examining the hs_err file?
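To check the hs_err file for native libraries, a small scanner can pull the library names out of the frame lines. This sketch assumes HotSpot's usual frame format, "<type>  [<library>+0x<offset>]  <symbol>", where C marks native code, V/v VM code and j/J Java frames; parse_frame and report_native_frames are hypothetical helpers. For a crash in GCTaskThread, expect the problematic frame to be in libjvm.so and possibly no C frame from your own library at all: heap corruption is usually done earlier by some other thread, so an empty report is itself a hint that the stack will not point to the culprit.

```c
#include <stdio.h>
#include <string.h>

/* Parse one frame line from an hs_err_pid<pid>.log file. HotSpot prints
 * native-style frames as "<type>  [<library>+0x<offset>]  <symbol>", and
 * the "Problematic frame" entry in the header has the same shape behind
 * a leading "# ". Copies <library> into lib and returns <type> ('C', 'V',
 * 'v', 'J' or 'j'); returns 0 for any other line. */
int parse_frame(const char *line, char *lib, size_t libsize)
{
    if (strncmp(line, "# ", 2) == 0)
        line += 2;                          /* header form of the frame */
    char type = line[0];
    if (type == '\0' || strchr("CVvJj", type) == NULL)
        return 0;
    const char *p = line + 1;
    while (*p == ' ')
        p++;
    if (p == line + 1 || *p != '[')         /* need spaces, then '[' */
        return 0;
    p++;
    size_t n = strcspn(p, "+]");            /* name ends at '+' or ']' */
    if (n == 0 || p[n] == '\0' || n >= libsize)
        return 0;
    memcpy(lib, p, n);
    lib[n] = '\0';
    return type;
}

/* Print the library of every native-code ('C') frame and return how many
 * were found. The problematic frame usually appears both in the header
 * and in the stack section, so a C frame at the crash site counts twice. */
int report_native_frames(FILE *in, FILE *out)
{
    char line[1024];
    char lib[256];
    int count = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        if (parse_frame(line, lib, sizeof lib) == 'C') {
            fprintf(out, "native frame in %s\n", lib);
            count++;
        }
    }
    return count;
}
```

To use it, open the log with fopen(), check the result, and pass the stream to report_native_frames together with stdout.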

Also, can you please provide more information on what causes the issue and whether it is easily reproducible?

Musannif Zahir
  • Not reproducible; it stopped happening when I disabled hyper-threading on the machine. What do you think I'll see in the verbose:gc output? – ekeren Feb 23 '11 at 08:44
  • You might be able to see what phase or scenario triggers the specific issue and then narrow down the native libraries that are specific to those tasks. – Musannif Zahir Feb 23 '11 at 16:04