1

Occasionally Android system calls, such as ContentResolver.query(...), can take 1000x times longer than normal, for example ~10 seconds for a single call that might normally complete in ~10 milliseconds. My question is how to identify cases like this, understand why it's happening and resolve the problem.

The example below is retrieving the "user profile" on an Android system. On my Nexus 6 it normally takes ~10 milliseconds to do this. Older devices were similarly quick. Occasionally though it can 20+ seconds. Using traceview, it shows that virtually all the time is spent in "context switch":

enter image description here

My device unusually has ~7,000 contacts. For the "user profile" that I'm requesting, the result is only a single row. It's requested using Uri.withAppendedPath(Profile.CONTENT_URI, Data.CONTENT_DIRECTORY) - which I presume would be a common, optimized case. At the moment, the queries are done on the main thread, which may be complicating things... the Android documentation mentions that "you should do queries asynchronously on a separate thread". I will migrate to that in the future but I'm skeptical of that being the cause as it was working fine previously.

Another surprising factor is the strange consistency. I have multiple queries in the app. For a sequence of runs, the "user profile" will be consistently slow then start being fast again and I can't reproduce the issue. Likewise other contact queries that aren't the user profile, will be fast and then become slow. For the past six months, all the queries have been fast through the Nexus 5 and Nexus 6 devices - this only appeared in the last week.

The query:

Uri uri = Uri.withAppendedPath(Profile.CONTENT_URI, Data.CONTENT_DIRECTORY),
Log.d(TAG, "getOwner: " + uri.toString());
Cursor cur = mResolver.query(
    uri,
    PROJECTION,  // 6 columns
    null,
    null,
    Data.IS_PRIMARY + " DESC");
Log.d(TAG, "getOwner: query count -  " + cur.getCount());

Log Output:

// NOTE 18 second gap between log entries
19:20:33.134 D/ConnectActivity﹕ getOwner: URI: content://com.android.contacts/profile/data
19:20:51.779 D/ConnectActivity﹕ getOwner: query count -  1

I still find this behaviour surprising and don't understand why it occurs.

brunobowden
  • 1,492
  • 19
  • 37
  • 3
    If the content provider is in a different process it probably sometimes has to be started first. – Henry Jan 06 '15 at 08:51
  • I've seen the slow down repeated during the same app startup, alternating between fast and slow. Also, 10 seconds to start up anything is incredibly long. At the moment, the calls are happening in the onCreate method, so there's only a single thread. I will move them at a later point but right now this has been working fine for experimentation. – brunobowden Jan 06 '15 at 18:30
  • Next time I see this, I'm trying Traceview: http://developer.android.com/tools/debugging/debugging-tracing.html – brunobowden Jan 06 '15 at 18:32
  • Android isn't a real time system, and the Binder, i.e. the IPC system in use to connect client and provider process can be very slow. This is why the bottleneck you're seeing in traceview is while context switching. – HappyCactus Jan 08 '15 at 15:35
  • HappyCactus - context switching is slow but 18 seconds on a new Nexus 6 device just floors me as hideously bad. Is there any way I can trace what it's doing in the context switch? To find out if other threads are blocking access to the contacts resource, or if there's a blocking network fetch? – brunobowden Jan 08 '15 at 19:07
  • The app startup is based on knowing who the user is, so providing speedy information on who they are to tailor the interface is very important. Waiting 18 seconds to find out the user's name is crazy. – brunobowden Jan 08 '15 at 19:08
  • 1
    ContactsContract.Data suggests that another process could be the cause: "The flip side of using batched operations is that a large batch may lock up the database for a long time preventing other applications from accessing data and potentially causing ANRs ("Application Not Responding" dialogs.)" - http://developer.android.com/reference/android/provider/ContactsContract.Data.html – brunobowden Jan 08 '15 at 19:59
  • You are right - long context switch in traceview means that your query is waiting for something that happens outside the scope of your process. The easy way to fix it is to use `AsyncTask`. I will not be surprised if this change alone will be enough to consistently produce the answer quickly. But if it isn't, you can cache the Owner data and not depend on update during app start. _You asked how you can debug such lock_. Not really, unless there are clear symptoms in the system Logcat. – Alex Cohn Jan 09 '15 at 10:37

1 Answers1

4

Levels of debugging and improvements (the last two are specifically for slow "context switch"):

  1. Query Improvements: query Uri, Projection size and column indices caching more details
  2. Query Filtering: reduce size of the query result through better filtering
  3. Asynchronous: slow queries shouldn't block the UI thread, use
  4. Observe Logs: around the slowdown to see if it has any clues more details on logs
  5. Traceview: to identify where the call is spending time
  6. Context Switch: explanation

Query Improvements & Filtering

In mosts cases, querying the right Uri, reducing the number of columns in the projection and caching column indices are the primary improvements: Getting name and email from contact list is very slow

This example is for obtaining the name from the user profile:

Cursor cur = mResolver.query(
    // Uri will only return user profile rather than everything in contacts
    Uri.withAppendedPath(Profile.CONTENT_URI, Data.CONTENT_DIRECTORY),
    // Projection only has 2 columns to reduce the data returned
    new String[] {
            Data.MIMETYPE,
            ContactsContract.Data.DISPLAY_NAME};
    // Filter matches on specific mimetype to reduce result size
    Data.MIMETYPE + "=?",
    // Filter matches on specific mimetype to reduce result size
    new String[]{StructuredName.CONTENT_ITEM_TYPE},
    // No sorting may improve performance
    null);

Asynchronous

The Android Documentation suggests using a CursorLoader:

For the sake of clarity, the code snippets in this section call ContentResolver.query() on the "UI thread"". In actual code, however, you should do queries asynchronously on a separate thread. One way to do this is to use the CursorLoader class, which is described in more detail in the Loaders guide. Also, the lines of code are snippets only; they don't show a complete application.

Observe Logs

You should look at lot around the slowdown to both quantify it and see if it has only clues to the cause. ContentResolver.query(...) calls often take 10-100 milliseconds and anything within that range may be improved as above but is fairly expected. In the range 1-10 seconds, this may be caused by a context switch to other processes. When the context switch occurs, ART (default Runtime in Android 5.0), will output a log line indicating the context switch and the time taken for it:

Traceview

Android's traceview can be used to identify where time is being spent. The basic usage is to wrap the method you wish to profile as follows:

Debug.startMethodTracing("trace-name");
// Code to profile
Debug.stopMethodTracing();

Next pull the logs from the device (I believe this is the default path):

adb pull /storage/emulated/legacy/dmtrace.trace

Android Device Monitor can be used to open the file and view. From Android Studio, select Tools -> Android -> Android Device Monitor, then open the file. Often it is helpful to rearrange the Incl Real Time and Incl Cpu Time columns to the far left. These show the time taken by a method (and it's children methods) both as cpu time and real time (which also including blocking IO calls that aren't measured with cpu time in the app process - more details). The most suspicious entries are where Incl Real Time vastly exceeds the Incl Cpu Time. That means that means the method took a long time but didn't do much in your code, so it must have been blocked by something outside of the app process.

You can see that a list methods and the time for each. Starting at "0 (top level)" click the triangle to see the "children" and open that. Each child will list a method number that you can use to find the method elsewhere in traceview and looks at their children, descending through the hierarchy. Looking for cases where Incl Real Time >> Incl Cpu Time, you will often find it ends with "(context switch)". This is the long blocking event that's slowing down your process.

traceview of ContentResolver.query with slows context switch

Context Switch

Wikipedia description:

In computing, a context switch is the process of storing and restoring the state (context) of a process or thread so that execution can be resumed from the same point at a later time. This enables multiple processes to share a single CPU and is an essential feature of a multitasking operating system. What constitutes the context is determined by the processor and the operating system.

In the case of Android, this indicates the OS has chosen to preserve the state of your app, remove it from the CPU so that it can execute another process. When the context switch is over, the process may be reinstated on the CPU and resume execution.

Community
  • 1
  • 1
brunobowden
  • 1,492
  • 19
  • 37