Android GC and AudioTrack, GC thread stuck in dlmalloc_inspect_all and AudioTrack stuck .. tryLock

Question

A quick background: The app is an audio player. ffmpeg is compiled as a native shared object and used for decoding; a separate native library, also compiled as a shared object, handles audio processing; and AudioTrack outputs the processed audio. All of the audio functionality is wrapped in a class that uses a static class variable to ensure there is only one instance. Within this class, a Java thread gets data from ffmpeg and completes the audio processing via the native processing library. The native processing call takes between 2 and 3.5 msec depending on the configuration. The processed audio is held in an array of ByteBuffers managed by read and write counting semaphores. A separate Java thread calls ByteBuffer.get to fetch a block of audio, decrements the semaphore count, and sends the byte[] data to AudioTrack. AudioTrack is configured in streaming mode with a buffer size twice the size returned by getMinBufferSize. The audio data goes through the system in blocks of 512 samples per channel, which for 16-bit stereo is 2048 bytes.
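To make the buffering scheme concrete, here is a minimal sketch of a ring of ByteBuffers guarded by two counting semaphores. It is not the original code: the class name BlockRing, its methods, and the slot count are hypothetical, and 16-bit samples are assumed from the 2048-byte block size. It uses plain Java only, with no AudioTrack or JNI calls.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.Semaphore;

/** Hypothetical ring of fixed-size audio blocks guarded by two counting semaphores. */
final class BlockRing {
    // 512 samples per channel * 2 channels * 2 bytes per 16-bit sample = 2048 bytes
    static final int BLOCK_BYTES = 512 * 2 * 2;

    private final ByteBuffer[] slots;
    private final Semaphore free;    // counts slots the producer may fill
    private final Semaphore filled;  // counts slots the consumer may drain
    private int writeIndex;          // touched only by the producer thread
    private int readIndex;           // touched only by the consumer thread

    BlockRing(int slotCount) {
        slots = new ByteBuffer[slotCount];
        for (int i = 0; i < slotCount; i++) {
            slots[i] = ByteBuffer.allocate(BLOCK_BYTES);
        }
        free = new Semaphore(slotCount);
        filled = new Semaphore(0);
    }

    /** Producer side (decode and process thread): blocks while the ring is full. */
    void put(byte[] block) {
        if (block.length != BLOCK_BYTES) {
            throw new IllegalArgumentException("expected " + BLOCK_BYTES + " bytes");
        }
        // Uninterruptible acquire keeps the sketch short; real code would likely
        // use acquire() and handle InterruptedException.
        free.acquireUninterruptibly();
        ByteBuffer slot = slots[writeIndex];
        slot.clear();
        slot.put(block);
        writeIndex = (writeIndex + 1) % slots.length;
        filled.release();
    }

    /** Consumer side (output thread): blocks while the ring is empty, then copies into dst. */
    void take(byte[] dst) {
        filled.acquireUninterruptibly();
        ByteBuffer slot = slots[readIndex];
        slot.flip();
        slot.get(dst, 0, BLOCK_BYTES);
        readIndex = (readIndex + 1) % slots.length;
        free.release();
    }

    /** Number of blocks currently waiting for the consumer. */
    int filledCount() {
        return filled.availablePermits();
    }
}
```

In this arrangement the output thread would call take() and pass the resulting byte[] to AudioTrack.write, while the decode thread calls put(). If the consumer stalls, put() eventually blocks once every slot is filled, which matches the symptom described below where the producer runs until the ByteBuffer array fills.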

The issue: Everything works great for a random amount of time, and then it all stops: no crash, no SIGSEGV; the app simply consumes most of the CPU and does nothing. The GUI is unresponsive, and the debugger in ADT loses its connection to the app. Using the shell and top -m 10 -t, I can see the app's GC thread consuming 49%, which I assume is a single core. Occasionally, AudioTrack pops up with low utilization. The other threads associated with the app never appear in the top 10. This happens very quickly on 4.2.2 and is less deterministic on 4.0.1 and 4.1.1.

In an effort to resolve: I used libc.debug.malloc 10 to validate the native memory usage. I found and fixed a couple of leaks, but nothing, per the debug libc, is walking off the end of a buffer. The ByteBuffers were introduced as an attempted workaround; byte[] arrays were used before and exhibited the same issue. I removed AudioTrack and replaced it with native code based on OpenSL ES, with the same results. No surprise, as OpenSL ES uses AudioTrack. Log.i-style debug messages show that the thread writing to AudioTrack stops consuming data, while the ffmpeg read-and-process thread continues until the ByteBuffer array fills. I've put System.gc() calls in strategic locations; the hang sometimes takes longer to appear, but it still occurs. I set the priority of the 'processed audio data to AudioTrack' thread to THREAD_PRIORITY_URGENT_AUDIO, with no change observed.

I have searched far and wide for instances of similar issues and have found very little information. I installed ARM's DS-5 debug capabilities for Eclipse/ADT. This tool was able to maintain a connection to the spinning app. Pausing shows the GC thread in an endless loop inside dlmalloc_inspect_all, perhaps trying to reclaim or consolidate native heap. The AudioTrack thread is in nanosleep when paused, called from usleep. There are only two instances of this in AudioTrack.cpp: one in processAudioBuffer, the other in tryLock. tryLock is called from stepServer and framesReady. I have not been able to get a stack dump from the hung app: kill -3 yields spin on suspend #1 threadid=2 (pcf=0), and the app is never declared ANR, hence no /data/anr/traces.txt.

My synopsis: the GC is doing its thing and having the native heap inspected. Per the documentation, the GC will not suspend a thread in the middle of a JNI call. It has to suspend the AudioTrack thread, though, and when the native heap inspection coincides with the execution of AudioTrack's processAudioBuffer, a deadlock occurs.

Questions: 1) I would certainly benefit from a stack trace of the native components. Short of a dev platform and JTAG debugger, are there any additional methods to try? 2) Has anyone seen GC and AudioTrack deadlock like this? 3) Is there any chance that the GC's dlmalloc_inspect_all call can be suppressed or otherwise synchronized to avoid this issue? 4) Any suggestions toward resolving this issue?

I'd be happy to post some code if helpful.

The resolution: Although the native 'clean up' function was already called in onDestroy(), I found I had to call it manually to get log messages from the debug libc in libc.debug.malloc 10 mode. There was a 'walk off buffer' error in a hand-written NEON assembly function. Corrupted data on the heap leads to undefined behavior, imagine that. When looking for native heap problems, make sure the clean-up code actually gets called; in this case, Android killing the app limited the allocation checks and reports. - samc, May 02 '13 at 19:53
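A 'walk off buffer' error like this can also be hunted from the Java side when the native code writes into a direct ByteBuffer. The sketch below is only an illustration of the guard-region idea, not the debug libc's implementation: the class GuardedBuffer, the 64-byte guard size, and the 0xA5 pattern are all hypothetical. It pads the payload with patterned bytes on both sides and checks them after the native call returns, so the check does not depend on any clean-up code running.

```java
import java.nio.ByteBuffer;

/** Hypothetical direct buffer with patterned guard regions before and after the payload. */
final class GuardedBuffer {
    static final int GUARD_BYTES = 64;          // assumed guard size, chosen arbitrarily
    static final byte PATTERN = (byte) 0xA5;    // assumed fill pattern

    // Package-private so a test can simulate an overrun; native code never sees this view.
    final ByteBuffer whole;
    private final ByteBuffer payload;

    GuardedBuffer(int payloadBytes) {
        whole = ByteBuffer.allocateDirect(payloadBytes + 2 * GUARD_BYTES);
        // Fill both guard regions with the pattern using absolute puts.
        for (int i = 0; i < GUARD_BYTES; i++) {
            whole.put(i, PATTERN);
            whole.put(whole.capacity() - 1 - i, PATTERN);
        }
        // Slice out the middle; the slice shares memory with 'whole'.
        whole.position(GUARD_BYTES);
        whole.limit(GUARD_BYTES + payloadBytes);
        payload = whole.slice();
        // Restore the limit so absolute reads of the trailing guard stay in range.
        whole.clear();
    }

    /** The view to hand to native code (for example through a JNI call). */
    ByteBuffer payload() {
        return payload;
    }

    /** Returns false if any guard byte changed, which suggests an overrun or underrun. */
    boolean guardsIntact() {
        for (int i = 0; i < GUARD_BYTES; i++) {
            if (whole.get(i) != PATTERN) return false;
            if (whole.get(whole.capacity() - 1 - i) != PATTERN) return false;
        }
        return true;
    }
}
```

Calling guardsIntact() right after each native processing call, and logging when it returns false, would narrow an overrun down to a single call. It only catches writes that land within the guard region, so it complements the debug libc rather than replacing it.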

score 0 · Answer 1 · edited May 23 '17 at 11:44

Faced the same issue (random hang: everything stops, no crash, no signal, the app consumes most of the CPU and does nothing). The problem was corrupted memory in JNI code (such as a buffer overrun).

This issue also seems very device dependent (100% repro on 2 of my devices, and not a single occurrence on the other 3). Other platforms (win32, iOS) didn't expose the corrupted memory either (I'm working on a cross-platform game). So Android's Dalvik memory manager is a nice "tool" for detecting memory bugs; we now test every daily build on a "memory corruption sensitive" device. It helps ensure the stability of new code.

For more info try

And an excellent article by Dianne Hackborn
