I'm moving a project to the new Android Native Development Kit (i.e. JNI) and I'd like to catch SIGSEGV, should it occur (possibly also SIGILL, SIGABRT, SIGFPE) in order to present a nice crash reporting dialog, instead of (or before) what currently happens: the immediate unceremonious death of the process and possibly some attempt by the OS to restart it. (Edit: The JVM/Dalvik VM catches the signal and logs a stack trace and other useful information; I just want to offer the user the option to email that info to me really.)

The situation is: a large body of C code which I didn't write does most of the work in this application (all the game logic) and although it's well-tested on numerous other platforms, it's entirely possible that I, in my Android port, will feed it garbage and cause a crash in native code, so I want the crash dumps (both native and Java) that currently show up in the Android log (I guess it would be stderr in a non-Android situation). I'm free to modify both C and Java code arbitrarily, although the callbacks (both going in and coming out of JNI) number about 40 and obviously, bonus points for small diffs.

I've heard of the signal chaining library in J2SE, libjsig.so, and if I could safely install a signal handler like that on Android, that would solve the catching part of my question, but I see no such library for Android/Dalvik.

Chris Boyle
  • If you can start the Java VM through a wrapper script, you can check if the app exited abnormally, and do the error reporting. That would allow you to cleanly catch all kinds of abnormal exits, be they SIGSEGV, SIGKILL or whatever. However, I don't think this is possible with stock Android apps, so posting this as a comment (converted from answer). – sleske Jul 07 '15 at 06:59
  • Also see: [Can't run a Java Android program with Valgrind](http://stackoverflow.com/questions/13531496/cant-run-a-java-android-program-with-valgrind/19235439#19235439) for how to start an Android app with a wrapper script (in adb shell). – sleske Jul 07 '15 at 07:00
  • The answer needs to be updated. The source code in the accepted answer results in undefined behavior because it calls non-async-signal-safe functions. Please see here: https://stackoverflow.com/questions/34547199/art-prevents-any-java-calls-from-jni-during-native-signal-handling/34553070#34553070 – user1506104 Aug 15 '18 at 04:20

4 Answers

Edit: From Jelly Bean onwards you can't get the stack trace, because READ_LOGS went away. :-(

I actually got a signal handler working without doing anything too exotic, and have released code using it, which you can see on github (edit: linking to historical release; I removed the crash handler since then). Here's how:

  1. Use sigaction() to catch the signals and store the old handlers. (android.c:570)
  2. Time passes, a segfault happens.
  3. In the signal handler, call up to JNI one last time and then call the old handler. (android.c:528)
  4. In that JNI call, log any useful debugging info, and call startActivity() on an activity that is flagged as needing to be in its own process. (SGTPuzzles.java:962, AndroidManifest.xml:28)
  5. When you come back from Java and call that old handler, the Android framework will connect to debuggerd to log a nice native trace for you, and then the process will die. (debugger.c, debuggerd.c)
  6. Meanwhile, your crash-handling activity is starting up. Really you should pass it the PID so it can wait for step 5 to complete; I don't do this. Here you apologise to the user and ask if you can send a log. If so, gather the output of logcat -d -v threadtime and launch an ACTION_SEND with recipient, subject and body filled in. The user will have to press Send. (CrashHandler.java, SGTPuzzles.java:462, strings.xml:41)
  7. Watch out for logcat failing or taking more than a few seconds. I have encountered one device, the T-Mobile Pulse / Huawei U8220, where logcat immediately goes into the T (traced) state and hangs. (CrashHandler.java:70, strings.xml:51)

In a non-Android situation, some of this would be different. You'd need to gather your own native trace, see this other question, depending on what sort of libc you have. You'd need to handle dumping that trace, launching your separate crash-handler process, and sending the email in some appropriate ways for your platform, but I imagine the general approach should still work.

Chris Boyle
  • Ideally you'd check to see if the crash occurred in your library. If it occurred somewhere else (say, inside the VM), your JNI calls from the signal handler could confuse things rather badly. It's not the end of the world, since you're mid-crash anyway, but it might make diagnosis of a VM crash more difficult (or cause a bizarre VM crash that ends up in an Android bug report and baffles everyone). – fadden Aug 05 '10 at 19:38
  • You are wonderful @Chris for sharing your research project on this! – olafure Jun 14 '11 at 16:37
  • Thanks, this was useful in finding where my JNI was going nuts. Also, hello from a DCS alumnus! – Nick Jul 18 '11 at 18:11
  • @fadden True, but I'm not sure how to find that out. Since it's before debuggerd has logged anything, I'd need to either do some stack-unwinding myself (looks really non-trivial on Android), or frequently set an in/out flag or detach/attach the handler, in which case I'm sure I'd miss a spot somewhere. – Chris Boyle Oct 01 '11 at 08:01
  • Starting an Activity in a new process from a Service also requires the following code: `newIntent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK);` – Graeme Sep 27 '12 at 10:32
  • Yes, the new-task flag is important: if you don't use it, the new activity doesn't start. – Gelldur Feb 17 '13 at 16:49
  • Is this solution still valid under Jelly Bean? Won't step 6 fail to log anything `debuggerd` outputs? – Josh Apr 19 '13 at 15:45
  • I have created [a small library](https://github.com/SalomonBrys/Native-Crash-ACRA-Handler) that uses exactly these steps to enable ACRA crash reporting for JNI code. Your post was very helpful, thanks a lot! – Salomon BRYS Jul 23 '13 at 11:11
  • I tried this on android-10: using a jobject inside the sig_handler method reports the error 'JNI DETECTED ERROR IN APPLICATION: use of invalid jobject 0x7fec96d444' – fujian26 Sep 14 '20 at 09:24

I'm a little late, but I had exactly the same need, and I've developed a small library to address it by catching common crashes (SIGSEGV, SIGBUS, etc.) inside JNI code and replacing them with regular java.lang.Error exceptions. As a bonus, if the client is running on Android >= 4.1.1, the stack trace embeds the resolved backtrace of the crash (a pseudo-trace containing the full native stack trace). You will not recover from vicious crashes (e.g. if you corrupt the allocator), but it should at least allow you to recover from most of them. (Please report successes and failures; the code is brand new.)

More info at https://github.com/xroche/coffeecatch (the code is under the BSD 2-Clause license)

xroche

FWIW, Google Breakpad works fine on Android. I did the porting work, and we're shipping it as part of Firefox Mobile. It requires a little setup, since it doesn't give you stack traces on the client-side, but sends you the raw stack memory and does the stack walking server-side (so you don't have to ship debug symbols with your app).

Ted Mielczarek
  • It's almost impossible to configure Breakpad, given the complete lack of documentation – shader Sep 17 '12 at 08:36
  • It's really not that hard, and there's plenty of documentation on the project wiki. In fact, for Android there's now a NDK build Makefile and it should be super easy to use: http://code.google.com/p/google-breakpad/source/browse/trunk/README.ANDROID – Ted Mielczarek Sep 17 '12 at 13:27
  • You also need to compile the module that preprocesses debug symbol files for Android, and you can only compile that on Linux. When you compile on a Mac, it only builds the Mac/iOS dSYM preprocessor. – shader Sep 20 '12 at 08:35

In my limited experience (non-Android), SIGSEGV in JNI code will generally crash the JVM before control is returned to your Java code. I vaguely recall hearing about some non-Sun JVM which lets you catch SIGSEGV, but AFAICR you can't expect to be able to do so.

You can try to catch them in C (see sigaction(2)), although you can do very little after a SIGSEGV (or SIGFPE or SIGILL) handler as the ongoing behaviour of a process is officially undefined.

mas90
  • Well, behaviour is undefined after "ignor[ing] a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by kill(2) or raise(3)", but not necessarily during catching such a signal. Current plan is to try a C signal handler that calls back to Java and, somehow, terminates the thread without terminating the process. This may or may not be possible. :-) – Chris Boyle Jul 06 '09 at 13:03
  • C backtrace instructions: http://stackoverflow.com/questions/76822/how-to-generate-a-stacktrace-when-my-c-app-crashes-using-gcc-compiler/77281#77281 – Chris Boyle Jul 06 '09 at 13:21
  • ...except I can't use backtrace(), because Android doesn't use glibc, it uses Bionic. :-( Something involving `_Unwind_Backtrace` from `unwind.h` will be needed instead. – Chris Boyle Jul 06 '09 at 13:25
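
For reference, the `_Unwind_Backtrace` approach mentioned in the last comment might look something like the sketch below. It only collects raw program-counter values into a caller-supplied buffer; `backtrace_state`, `unwind_callback` and `capture_backtrace` are names invented for this example, and turning the addresses into symbol names (for instance with `dladdr`) is left out. Whether the unwinder can walk a given stack depends on the toolchain and on the unwind tables present in the binary, so treat this as a starting point rather than a tested Android recipe.

```c
#include <stddef.h>
#include <stdint.h>
#include <unwind.h>

/* Cursor into the caller's buffer (hypothetical helper type). */
struct backtrace_state {
    uintptr_t *current;
    uintptr_t *end;
};

/* Called by the unwinder once per stack frame. */
static _Unwind_Reason_Code unwind_callback(struct _Unwind_Context *context,
                                           void *arg)
{
    struct backtrace_state *state = arg;
    uintptr_t pc = (uintptr_t)_Unwind_GetIP(context);

    if (pc != 0) {
        if (state->current == state->end)
            return _URC_END_OF_STACK; /* buffer full: stop unwinding */
        *state->current++ = pc;
    }
    return _URC_NO_REASON; /* keep going */
}

/* Stores up to max return addresses in buffer; returns how many were stored. */
static size_t capture_backtrace(uintptr_t *buffer, size_t max)
{
    struct backtrace_state state;

    state.current = buffer;
    state.end = buffer + max;
    _Unwind_Backtrace(unwind_callback, &state);
    return (size_t)(state.current - buffer);
}
```

A caller would declare something like `uintptr_t frames[32];` and log the first `capture_backtrace(frames, 32)` entries.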