
There are plenty of questions with an ANR traces file included, and the answer is always "oh, the problem is in your thread 76, fix your HTTP call" or something :) But I couldn't find any general guide or tutorial on how to read these traces, step by step, for any ANR. Is there one? I have a few questions in particular:

  1. Is it always possible to see the problem from the thread traces I see for real-world ANRs in the Google Play Console? Or is it possible that there is just no relevant info, and I am out of luck if I can't reproduce the ANR locally?

  2. Which threads are included in this information? I suppose all threads from my app process are there, but what about the rest? Are they all relevant to me in some way (for example, threads that some of my threads are waiting for)? Or are there also completely unrelated processes?

  3. How does the Google Play Console determine the "place" where the ANR happened, which is then displayed in the list of ANRs? For example:

ANR keyDispatchingTimedOut

place: com.sample.myapp/myapp.activities.SplashActivity

I ask because SplashActivity is nowhere to be seen in the supplied text of the thread traces.

  4. I know that I should look for threads in the WAIT state for potential deadlocks etc. What about the situation where a thread is "waiting on itself"?

    "AsyncTask #1" prio=5 tid=15 WAIT
      | group="main" sCount=1 dsCount=0 obj=0x41bb50c0 self=0x5529a868
      | sysTid=2448 nice=0 sched=0/0 cgrp=apps handle=1429609576
      | state=S schedstat=( 18097077 39273309 41 ) utm=1 stm=0 core=1
      at java.lang.Object.wait(Native Method)
      - waiting on <0x41bb5258> (a java.lang.VMThread) held by tid=15 (AsyncTask #1)

Is this always OK, and can I assume it is not the cause? What about the situation where I have only a bunch of threads in NATIVE (including the main thread) and a bunch of threads in WAIT, waiting on themselves like this? How can that be an ANR?

rouen
  • ANR doesn't usually mean it isn't responding. It usually means there was a crash in native code. As for debugging it, you need to find the stack trace or core dump in the log file and work back from there. The dump of every thread in the activity is almost always a red herring and not worth looking at; I don't think I've ever seen one that helped. – Gabe Sechan Mar 09 '15 at 09:14
  • Gabe, are you sure about this? As far as I know, native crashes are displayed in the other tab in the Google Play Console and have a "stack trace" (usually not a very helpful one, but still). Can you point me to any source for your claim that ANRs can be "hidden" native crashes? – rouen Mar 09 '15 at 09:42
  • Just experience. I've solved hundreds of ANRs. I don't think more than 1 or 2 of them were deadlocks. – Gabe Sechan Mar 09 '15 at 09:43
  • Use [Crashlytics](https://try.crashlytics.com/) ([Fabric](https://get.fabric.io/)) – Ivan Aksamentov - Drop Mar 11 '15 at 11:25
  • From my experience, ANRs are from putting heavy load on the main UI thread, not from native code crashes. Even Google says so [here](http://developer.android.com/training/articles/perf-anr.html). Considering your trouble class is called "SplashActivity" and "Splash" screens are usually used to load resources, I would suggest looking into moving some of the tasks in that Activity to another background thread. And if it continues to persist, try to time how long it takes to complete the activity, especially if you have any loops that might take some time to traverse. – VERT9x Mar 12 '15 at 21:11
  • Always check the main thread stack trace, any other thread waiting will not cause ANR. You need other thread info to know which thread has the locks for which main thread is waiting. – nandeesh Mar 13 '15 at 04:02
  • I've looked at a large number of ANRs, usually when the cause was non-obvious, and few of them were caused by native crashes. This is partly because native crashes are obvious and don't result in people asking for help, but also because native crashes usually kill the entire app, not just one thread. (I never did properly sort out the "usually" aspect.) ANR literally means the application is not responding to IPC requests sent from the system to the app's UI thread. To solve the ANR, you need to figure out why the app's UI thread was temporarily or permanently unresponsive. – fadden Mar 17 '15 at 00:09

2 Answers


The system sends various events to your application, which are received on the UI thread. If that thread doesn't respond to the events within a certain period of time, the system concludes that the app is unresponsive, and initiates the ANR handling.
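The mechanism can be modeled outside Android with a rough plain-Java sketch. This is not how the system server actually implements ANR detection; the single-threaded executor standing in for the UI thread, the hypothetical `UiWatchdog` class with its `isResponsive` method, and the timeout values are all assumptions for illustration. The watchdog posts a no-op "ping" to the UI thread and treats the app as unresponsive if the ping is not handled in time:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical watchdog: a single-threaded executor stands in for the UI thread. */
final class UiWatchdog {
    private UiWatchdog() {}

    /**
     * Posts a no-op "ping" to the UI executor and waits up to timeoutMs for it to run.
     * Returns false if the ping was not handled in time (the UI thread is busy or blocked).
     */
    static boolean isResponsive(ExecutorService ui, long timeoutMs) {
        CountDownLatch handled = new CountDownLatch(1);
        ui.execute(handled::countDown);
        try {
            return handled.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService ui = Executors.newSingleThreadExecutor();
        System.out.println("idle UI thread responsive: " + isResponsive(ui, 1000));

        // Simulate a slow call (e.g. network I/O) made directly on the UI thread.
        ui.execute(() -> {
            try {
                Thread.sleep(1500);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // The ping is queued behind the slow task, so it cannot run within 200 ms.
        System.out.println("blocked UI thread responsive: " + isResponsive(ui, 200));
        ui.shutdownNow();
    }
}
```

The second check reports the "UI thread" as unresponsive only because the ping sits in the queue behind the slow task; once that task finishes, the same thread responds normally again, which is exactly the transient case discussed further down.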

Addressing your question point by point:

  1. It's not always possible to see the problem in the stack trace. The system server process detects that a problem exists, then signals the problematic process to dump its stack traces. If the app recovered between the problem discovery and stack dump signal, then the traces won't tell you much.

  2. You should see all threads from your app, and your app only. The ANR mechanism does not attempt to determine a set of "relevant" threads. The place to start is the UI thread, usually the app's "main" thread, to see if you have caught it in the act of being stuck. Sometimes the app is slow, not stuck, and the cause of the slowness is actually a different process that is soaking the CPU or disk bandwidth, but you can't see that in a stack trace... and you will likely get a stack trace that reflects execution past the point where it was "stuck".

  3. The "place" is the event that was not responded to (in this case, a key event), and the Activity that the system was attempting to interact with.

  4. That's normal; you'll see that when a thread is "parked" via java.util.concurrent.locks.LockSupport.park() in Dalvik. Remember that the lock is released while the thread is waiting, so in this case it's just waiting for another thread to come along and notify it.
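To see why a thread sitting in `Object.wait()` is not holding anyone else up, here is a minimal plain-Java sketch. It runs on a desktop JVM rather than Dalvik, and the `WaitReleasesLock` class is hypothetical. The waiter enters the monitor and calls `wait()`; the main thread can then enter the same monitor to call `notifyAll()`, which would be impossible if `wait()` kept holding it:

```java
/** Hypothetical demo: Object.wait() releases the monitor while the thread waits. */
final class WaitReleasesLock {
    private final Object lock = new Object();
    private boolean ready = false; // guarded by 'lock'

    /** Blocks until signal() is called; the monitor is released inside wait(). */
    void awaitSignal() throws InterruptedException {
        synchronized (lock) {
            while (!ready) {      // loop guards against spurious wakeups
                lock.wait();      // releases 'lock' until notified, then reacquires it
            }
        }
    }

    /** Enters the same monitor; this only succeeds because the waiter released it. */
    void signal() {
        synchronized (lock) {
            ready = true;
            lock.notifyAll();
        }
    }

    /** Returns true if a waiting thread was woken and finished within timeoutMs. */
    static boolean demo(long timeoutMs) {
        WaitReleasesLock w = new WaitReleasesLock();
        Thread waiter = new Thread(() -> {
            try {
                w.awaitSignal();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "AsyncTask #1");
        waiter.start();

        // Spin until the waiter is inside wait(); it had to enter the monitor to get there.
        while (waiter.getState() != Thread.State.WAITING) {
            Thread.yield();
        }
        w.signal(); // would block forever if wait() had kept the monitor
        try {
            waiter.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !waiter.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("waiter woken: " + demo(2000));
    }
}
```

If `demo` returns true, the signaling thread managed to acquire a monitor the waiter had already entered, so a thread waiting like this is idle rather than blocking anyone.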

Addressing a point raised in the comments: it's possible for a native crash to cause an ANR if (1) the native crash doesn't kill the app entirely, which is what it's supposed to do; and (2) the thread that died was the UI thread, or held a resource that the UI thread was waiting for. If you don't have access to the full logcat, you can check the thread list to confirm that all of your threads are alive.
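When the UI thread is blocked on a monitor, Dalvik-style traces name the owner with a `held by tid=N` suffix, so you can follow that chain from the main thread (tid=1) to the thread that is actually stuck, and spot a cycle if there is one. The following is a rough sketch, assuming thread blocks are separated by blank lines and use the header format quoted in the question; the `LockChain` class and the sample trace are made up for illustration. Self-edges (a thread "held by" its own tid, as in question 4) are skipped, since the monitor is released while the thread waits:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: follows "held by tid=N" links in a Dalvik-style ANR trace. */
final class LockChain {
    // Header such as: "main" prio=5 tid=1 MONITOR  (group 1 = name, 2 = tid, 3 = state)
    private static final Pattern HEADER =
            Pattern.compile("^\"([^\"]+)\".*?\\btid=(\\d+)\\s+(\\w+)");
    // Lock line such as: - waiting to lock <0x41a7ed40> (a java.lang.Object) held by tid=12 (Worker)
    private static final Pattern HELD_BY = Pattern.compile("held by tid=(\\d+)");

    private LockChain() {}

    /** Maps each blocked thread's tid to the tid holding the monitor it waits for. */
    static Map<Integer, Integer> waitsFor(String traces) {
        Map<Integer, Integer> edges = new HashMap<>();
        for (String block : traces.split("\\n\\s*\\n")) { // thread blocks are separated by blank lines
            Matcher header = HEADER.matcher(block.trim());
            Matcher held = HELD_BY.matcher(block);
            if (header.find() && held.find()) {
                int tid = Integer.parseInt(header.group(2));
                int holder = Integer.parseInt(held.group(1));
                if (tid != holder) { // skip "waiting on itself": the monitor is released in wait()
                    edges.put(tid, holder);
                }
            }
        }
        return edges;
    }

    /** Follows the chain from startTid; if the last tid repeats an earlier one, it is a deadlock cycle. */
    static List<Integer> chainFrom(int startTid, Map<Integer, Integer> edges) {
        Set<Integer> seen = new LinkedHashSet<>();
        Integer current = startTid;
        while (current != null && seen.add(current)) {
            current = edges.get(current);
        }
        List<Integer> chain = new ArrayList<>(seen);
        if (current != null) {
            chain.add(current); // revisited tid closes the cycle
        }
        return chain;
    }

    public static void main(String[] args) {
        // Made-up sample: main waits for Worker, and Worker waits for main.
        String traces = String.join("\n",
                "\"main\" prio=5 tid=1 MONITOR",
                "  at com.example.Cache.get(Cache.java:42)",
                "  - waiting to lock <0x41a7ed40> (a java.lang.Object) held by tid=12 (Worker)",
                "",
                "\"Worker\" prio=5 tid=12 MONITOR",
                "  at com.example.Db.query(Db.java:10)",
                "  - waiting to lock <0x41a7f000> (a java.lang.Object) held by tid=1 (main)");
        System.out.println(chainFrom(1, waitsFor(traces))); // [1, 12, 1]: a deadlock
    }
}
```

A chain that ends on a thread with no `held by` line (for example one in NATIVE or RUNNABLE) points at the code worth reading; a chain that returns to a tid already visited is a deadlock.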

When looking at an ANR, the first thing you need to figure out is whether the app is permanently stuck or just temporarily slowed. This should be obvious to the person using the app. Permanent freezes are usually the easiest to solve, as the stack trace will generally lead you to what went wrong. Start with the UI thread and walk through the trace until you find some bit of code that is spinning or stuck in a native call. (There's a trick with native calls, though: if it says NATIVE, then it's still in native code, but if it says SUSPENDED on a thread with a native method at the top of the stack, then it's not stuck, but rather in the act of returning from native to managed code.)
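The first step, finding the UI thread and applying the NATIVE vs. SUSPENDED rule of thumb, can be roughly automated. The sketch below is a hypothetical helper (`MainThreadTriage`), assuming Dalvik-style headers such as `"main" prio=5 tid=1 NATIVE` and frames prefixed with `at`. It also special-cases `android.os.MessageQueue.nativePollOnce`, the native call in which the main Looper waits for new messages: a main thread sitting there was idle when the dump was taken, which suggests the trace was captured after the app recovered (point 1 above):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: first-pass triage of the "main" thread in a Dalvik-style ANR trace. */
final class MainThreadTriage {
    enum Verdict { IDLE_IN_LOOPER, IN_NATIVE, RETURNING_FROM_NATIVE, OTHER, NO_MAIN_THREAD }

    // Header such as: "main" prio=5 tid=1 NATIVE  (group 1 = name, 2 = state)
    private static final Pattern HEADER = Pattern.compile("^\"([^\"]+)\".*?\\btid=\\d+\\s+(\\w+)");
    // Stack frame such as: "  at android.os.MessageQueue.nativePollOnce(Native Method)"
    private static final Pattern FRAME = Pattern.compile("^\\s*at (.+)$", Pattern.MULTILINE);

    private MainThreadTriage() {}

    static Verdict triage(String traces) {
        for (String block : traces.split("\\n\\s*\\n")) { // one block per thread
            String trimmed = block.trim();
            Matcher header = HEADER.matcher(trimmed);
            if (!header.find() || !header.group(1).equals("main")) {
                continue;
            }
            String state = header.group(2);
            Matcher frame = FRAME.matcher(trimmed);
            String top = frame.find() ? frame.group(1).trim() : ""; // first (top) frame only
            if (top.startsWith("android.os.MessageQueue.nativePollOnce")) {
                return Verdict.IDLE_IN_LOOPER; // waiting for the next message: idle when dumped
            }
            if (state.equals("NATIVE")) {
                return Verdict.IN_NATIVE; // still executing native code
            }
            if (state.equals("SUSPENDED") && top.endsWith("(Native Method)")) {
                return Verdict.RETURNING_FROM_NATIVE; // on its way back to managed code
            }
            return Verdict.OTHER; // read the rest of the stack by hand
        }
        return Verdict.NO_MAIN_THREAD;
    }

    public static void main(String[] args) {
        String traces = String.join("\n",
                "\"main\" prio=5 tid=1 SUSPENDED",
                "  | group=\"main\" sCount=1 dsCount=0",
                "  at com.example.NativeLib.decode(Native Method)",
                "  at com.example.SplashActivity.onCreate(SplashActivity.java:30)");
        System.out.println(triage(traces));
    }
}
```

Anything classified as OTHER still has to be walked by hand, frame by frame, as described above; the helper only saves the first glance.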

Transient ANRs can be harder, especially if they're happening on customer devices whose configuration is unknown. If they're running CPU benchmarks in the background on a device that's stalling because of a failing flash part, your app is going to have a bad time. Sometimes the stack trace points you in the general direction of the problem (e.g. this one, where it appears slow rendering and coarse locking were stalling the UI thread), other times the trace is captured after the app is back to running normally.

fadden
  • "Start with the UI thread and walk through the trace until you find some bit of code that is spinning or stuck in a native call." Can you elaborate on this? What can I do with a main thread stuck in a native call? – Adeel Ahmad Apr 09 '18 at 13:56

It's probably not the general recipe for detecting ANRs that you are looking for, but a good start is enabling StrictMode for your application.

You will be able to check logcat and the system will let you know when you're doing something wrong.

Just add these lines to your Application's or Activity's onCreate() method:

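    // Requires: import android.os.StrictMode;
    if (BuildConfig.DEBUG) {
        // Log (rather than crash on) violations on the calling thread, e.g. disk I/O on the UI thread.
        // detectAll() already covers disk reads and writes; the explicit calls just make them visible.
        StrictMode.setThreadPolicy(new StrictMode.ThreadPolicy.Builder()
                .detectDiskReads()
                .detectDiskWrites()
                .detectAll()
                .penaltyLog()
                .build());

        // Log process-wide leaks: SQLite objects and Closeables finalized without being closed.
        StrictMode.setVmPolicy(new StrictMode.VmPolicy.Builder()
                .detectLeakedSqlLiteObjects()
                .detectLeakedClosableObjects()
                .penaltyLog()
                .build());
    }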

More details here: http://developer.android.com/reference/android/os/StrictMode.html

kleinsenberg
  • StrictMode is fine and useful, but my question is about ANRs from production, which you can't reproduce locally. – rouen Mar 19 '15 at 08:05
  • Even if you can't reproduce them as such (i.e. actually see them happen), StrictMode will log the violations you commit on the UI thread, so you should be able to see them in the logs. – kleinsenberg Mar 19 '15 at 12:13