5

I aim to use Android's MediaCodec to decode a video stream and then use the output images for further image processing in native code.

Platform: ASUS TF700T, Android 4.1.1. Test stream: H.264 full HD @ 24 frm/s

With the Tegra 3 SoC inside, I am counting on hardware support for the video decoding. Functionally, my application behaves as expected: I can indeed access the decoder images and process them properly. However, I experience a very high decoder CPU load.

In the following experiments, process/thread load is measured with "top -m 32 -t" in an adb shell. To get reliable output from "top", all 4 CPU cores are forced active by running a few threads looping forever at the lowest priority. This is confirmed by repeatedly executing "cat /sys/devices/system/cpu/cpu[0-3]/online". To keep things simple, there is only video decoding, no audio, and there is no timing control, so the decoder runs as fast as it can.
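The keep-alive threads are nothing fancy; a minimal sketch (just busy loops at the lowest priority, so they should not distort the load of the threads being measured):

// Sketch: keep all 4 cores online during the measurement by running
// low-priority busy loops. At Thread.MIN_PRIORITY they should not distort
// the load of the decoder threads reported by "top".
for (int i = 0; i < 4; i++) {
    Thread spinner = new Thread(new Runnable() {
        @Override
        public void run() {
            while (!Thread.interrupted()) {
                // busy wait
            }
        }
    });
    spinner.setPriority(Thread.MIN_PRIORITY);
    spinner.setDaemon(true);
    spinner.start();
}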

First experiment: run the application, calling the JNI processing function, but with all further processing calls commented out. Results:

  • throughput: 25 frm/s
  • 1% load of thread VideoDecoder of the application
  • 24% load of thread Binder_3 of process /system/bin/mediaserver

It seems that the decoding speed is CPU limited (25% of a quad-core CPU)... When the output processing is enabled, the decoded images are correct and the application works. The only problem: the CPU load for decoding is too high.

After tons of experiments, I tried giving the MediaCodec a Surface to draw its result into. In all other respects, the code is identical. Results:

  • throughput 55 frm/s (nice!!)
  • 2% load of thread VideoDecoder of the application
  • 1% load of thread mediaserver of process /system/bin/mediaserver

Indeed, the video is shown on the provided Surface. Since there is hardly any CPU load, this must be hardware accelerated...

It seems that the MediaCodec only uses hardware acceleration if a Surface is provided?

So far, so good. I was already inclined to use the Surface as a workaround (not required, but in some cases even a nice-to-have). But when a Surface is provided, I cannot access the output images! The result is an access violation in the native code.

This really puzzles me! I did not see any mention of access limitations, or anything of the kind, in the documentation http://developer.android.com/reference/android/media/MediaCodec.html. Nothing in this direction was mentioned in the Google I/O presentation http://www.youtube.com/watch?v=RQws6vsoav8 either.

So: how do I use the hardware-accelerated Android MediaCodec decoder and access the images in native code? How do I avoid the access violation? Any help is appreciated, as is any explanation or hint.

I am pretty sure the MediaExtractor and MediaCodec are used properly, since the application is functionally OK (as long as I do not provide a Surface). It is still pretty experimental, and a good API design is on the todo list ;-)

Note that the only difference between the two experiments is the variable mSurface: null versus an actual Surface in "mDecoder.configure(mediaFormat, mSurface, null, 0);"

Initialization code:

mExtractor = new MediaExtractor();
mExtractor.setDataSource(mPath);

// Locate first video stream
for (int i = 0; i < mExtractor.getTrackCount(); i++) {
    mediaFormat = mExtractor.getTrackFormat(i);
    String mime = mediaFormat.getString(MediaFormat.KEY_MIME);
    Log.i(TAG, String.format("Stream %d/%d %s", i, mExtractor.getTrackCount(), mime));
    if (streamId == -1 && mime.startsWith("video/")) {
        streamId = i;
    }
}

if (streamId == -1) {
    Log.e(TAG, "Can't find video info in " + mPath);
    return;
}

mExtractor.selectTrack(streamId);
mediaFormat = mExtractor.getTrackFormat(streamId);

mDecoder = MediaCodec.createDecoderByType(mediaFormat.getString(MediaFormat.KEY_MIME));
mDecoder.configure(mediaFormat, mSurface, null, 0);

width = mediaFormat.getInteger(MediaFormat.KEY_WIDTH);
height = mediaFormat.getInteger(MediaFormat.KEY_HEIGHT);
Log.i(TAG, String.format("Image size: %dx%d format: %s", width, height, mediaFormat.toString()));
JniGlue.decoutStart(width, height);

Decoder loop (running in a separate thread):

ByteBuffer[] inputBuffers = mDecoder.getInputBuffers();
ByteBuffer[] outputBuffers = mDecoder.getOutputBuffers();
MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();

while (!isEOS && !Thread.interrupted()) {
    int inIndex = mDecoder.dequeueInputBuffer(10000);
    if (inIndex >= 0) {
        // Valid buffer returned
        int sampleSize = mExtractor.readSampleData(inputBuffers[inIndex], 0);
        if (sampleSize < 0) {
            Log.i(TAG, "InputBuffer BUFFER_FLAG_END_OF_STREAM");
            mDecoder.queueInputBuffer(inIndex, 0, 0, 0, MediaCodec.BUFFER_FLAG_END_OF_STREAM);
            isEOS = true;
        } else {
            mDecoder.queueInputBuffer(inIndex, 0, sampleSize, mExtractor.getSampleTime(), 0);
            mExtractor.advance();
        }
    }

    int outIndex = mDecoder.dequeueOutputBuffer(info, 10000);
    if (outIndex >= 0) {
        // Valid buffer returned
        ByteBuffer buffer = outputBuffers[outIndex];
        JniGlue.decoutFrame(buffer, info.offset, info.size);
        mDecoder.releaseOutputBuffer(outIndex, true);
    } else {
        // Some INFO_* value returned
        switch (outIndex) {
        case MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED:
            Log.i(TAG, "RunDecoder: INFO_OUTPUT_BUFFERS_CHANGED");
            outputBuffers = mDecoder.getOutputBuffers();
            break;
        case MediaCodec.INFO_OUTPUT_FORMAT_CHANGED:
            Log.i(TAG, "RunDecoder: New format " + mDecoder.getOutputFormat());
            break;
        case MediaCodec.INFO_TRY_AGAIN_LATER:
            // Timeout - simply ignore
            break;
        default:
            // Some other value, simply ignore
            break;
        }
    }

    if ((info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
        Log.d(TAG, "RunDecoder: OutputBuffer BUFFER_FLAG_END_OF_STREAM");
        isEOS = true;
    }
}
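For completeness, a sketch of the output handling that avoids the access violation when a Surface is configured. It is based on the observation (see the answers and comments below) that with a Surface, info.size is 0 and the ByteBuffer carries no pixel data; it avoids the crash but does not give access to the pixels:

if (outIndex >= 0) {
    if (mSurface == null && info.size > 0) {
        // ByteBuffer path: decoded data is available for native processing
        JniGlue.decoutFrame(outputBuffers[outIndex], info.offset, info.size);
    }
    // With a Surface configured, "true" renders the frame to that Surface;
    // the raw pixels remain inaccessible to the application.
    mDecoder.releaseOutputBuffer(outIndex, true);
}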
SubbaReddy PolamReddy
Bram
  • Still no solution. Any suggestion is still welcome, as are suggestions for experiments to increase understanding. Has anyone got hardware decoding working using MediaCodec? Maybe on another platform? – Bram Mar 25 '13 at 10:53
  • Bram, I'm trying to solve exactly the same issue. It looks like this slowdown isn't about multiple copies of the decoded buffer. When decoded data is meant for presentation on a native surface, there appears to be some direct data path that uses TILER (tiled rendering). When you need to access the full YUV frame (e.g. you want to access the decoded buffer), the decoder needs to do some extra tasks, like rendering all that data to a memory buffer and copying it, which makes it so slow. I literally wasted a week of my life trying to fix the issue, but it appears that there is nothing to fix. – Pavel P Jun 23 '13 at 21:20
  • Moreover, in my case I had 720p@30fps which I wasn't able to decode in real time, while the native player had no issues playing it. – Pavel P Jun 23 '13 at 21:21

2 Answers

4

If you configure an output Surface, the decoded data is written to a graphic buffer that can be used as an OpenGL ES texture (via the "external texture" extension). The various bits of hardware get to hand data around in a format they like, and the CPU doesn't have to copy the data.

If you don't configure a Surface, the output goes into a java.nio.ByteBuffer. There's at least one buffer copy to get the data from the MediaCodec-allocated buffer to your ByteBuffer, and presumably another copy to get the data back out into your JNI code. I expect what you're seeing is the overhead cost rather than software decoding cost.

You might be able to improve matters by sending the output to a SurfaceTexture, rendering into an FBO or pbuffer, and then using glReadPixels to extract the data. If you read into a "direct" ByteBuffer or call glReadPixels from native code, you reduce your JNI overhead. The downside to this approach is that your data will be in RGB rather than YCbCr. (OTOH, if your desired transformations can be expressed in a GLES 2.0 fragment shader, you can get the GPU to do the work instead of the CPU.)
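A rough sketch of the read-back step (assumptions: an EGL context is current on this thread, the decoder's Surface wraps a SurfaceTexture bound as a GL_TEXTURE_EXTERNAL_OES texture, and that texture has just been drawn into an FBO of the video size; the EGL and shader setup is omitted):

// Sketch only; EGL context, external-texture shader and FBO setup not shown.
surfaceTexture.updateTexImage();   // latch the latest decoded frame into the texture
// ... draw the external texture into the FBO here ...

// Read back into a direct ByteBuffer so native code can reach the data without
// an extra Java-side copy. Note: the result is RGBA, not YCbCr.
ByteBuffer pixels = ByteBuffer.allocateDirect(width * height * 4)
        .order(ByteOrder.nativeOrder());
GLES20.glReadPixels(0, 0, width, height,
        GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, pixels);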

As noted in another answer, the decoders on different devices output ByteBuffer data in different formats, so interpreting the data in software may not be viable if portability is important to you.

Edit: Grafika now has an example of using the GPU to do image processing. You can see a demo video here.

fadden
  • Thanks! So you, @fadden, state that using a Surface is mutually exclusive with accessing the ByteBuffer. Missing doc in [dequeueOutputBuffer](http://developer.android.com/reference/android/media/MediaCodec.html#dequeueOutputBuffer%28android.media.MediaCodec.BufferInfo,%20long%29)?!?! Well, for me, I need YCbCr in native code, so moving to GLES does not help. Further, the Surface was only for testing. About copying the data: would that take 40 ms (25 frm/s)? In my native code, I can do that in 11 ms. So, still way too much CPU load. Right? Btw. portability is not my main concern. – Bram Apr 09 '13 at 12:31
  • 1
    The `MediaCodec` docs could be better. It's hard to know where all the time is going without being able to see all of what your device is doing -- they're all a little different. It's possible the "native" format used by Surface is something crazy and they're transcoding it to a simpler YUV on the way to the ByteBuffer. Going from 25fps to 55fps is 40-18=22ms difference -- two of your buffer copies. Sometimes in the logcat output you can see it opening (or not) the hardware decoder device. In any event, I don't see anything wrong with what you're doing. – fadden Apr 09 '13 at 16:47
  • Checked the logcat output. It does not show any difference when operating with a surface compared to without a surface (while indeed some info is shown when starting and stopping the decoder). So this confirms your statement: our app does not suffer from decoding in software, rather from overhead. So this closes the current topic. Next topic obviously will be: how to reduce this overhead? If android itself can get the images to the display with almost 0 cpu load, how can I get them in my app with almost 0 cpu load? – Bram Apr 10 '13 at 13:24
  • 1
    The hardware and system software is geared toward moving buffers of data around and displaying them with as little overhead as possible. It's not so good at making the raw data accessible to apps. – fadden Apr 11 '13 at 00:54
  • FWIW, a bunch of `MediaCodec` samples (with some information on new features in Android 4.3) are available here: http://bigflake.com/mediacodec/ – fadden Jul 24 '13 at 20:26
  • Very interesting source of information and examples! My problem is related to decoding, so the new 4.3 feature to encode data from a Surface is not of direct help. However, one of the examples triggered another idea: can I use the decoder to output to a Surface, then read from the Surface and process that data? Might require an additional RGB -> YUV conversion, maybe done in a gfx shader? When I feel really bold, I'll start some experimenting... – Bram Aug 01 '13 at 09:32
0

I use the MediaCodec API on a Nexus 4 and get the output color format QOMX_COLOR_FormatYUV420PackedSemiPlanar64x32Tile2m8ka. I think this is a kind of hardware format that can only be rendered by hardware. Interestingly, I find that when I configure the MediaCodec with null and with an actual Surface, the output buffer length changes to an actual value and to 0, respectively. I don't know why. I suggest you do some experiments on different devices for more results. About hardware acceleration, you can see http://www.saschahlusiak.de/2012/10/hardware-acceleration-on-sgs2-with-android-4-0/
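A quick way to check which color format a given device reports is to query the output format once the decoder has produced one (sketch; the "color-format" key is MediaFormat.KEY_COLOR_FORMAT):

// Sketch: query the decoder's output color format, e.g. after
// INFO_OUTPUT_FORMAT_CHANGED or once the first frame has been decoded.
MediaFormat outFormat = mDecoder.getOutputFormat();
int colorFormat = outFormat.getInteger(MediaFormat.KEY_COLOR_FORMAT); // "color-format"
Log.i(TAG, "Output color format: " + colorFormat
        + " (0x" + Integer.toHexString(colorFormat) + ")");
// 19 = COLOR_FormatYUV420Planar; values above 0x7F000000 are vendor-specific
// (Qualcomm, TI, ...) and usually need vendor-specific handling in software.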

  • Thanks for the hint about color formats. Without a surface, my `mDecoder.getOutputFormat().getInteger("color-format")` is 19 (COLOR_FormatYUV420Planar in [link](http://developer.android.com/reference/android/media/MediaCodecInfo.CodecCapabilities.html)). With a surface, it is 256. No idea what that means... Further, with an actual Surface, info.size becomes 0. Obviously, I should not try to read a buffer with size 0 in my JniGlue.decoutFrame(). This may explain the crash. But still... providing a surface was just a workaround to get hardware decoding to run... – Bram Apr 04 '13 at 16:02
  • With output to a Surface, you don't get any data in the `ByteBuffer`. You still get an index back from `dequeueOutputBuffer()` so you can be notified that a frame is available, and choose whether or not to render it with the `render` arg to `releaseOutputBuffer()`. You can't touch the actual bits without rendering the Surface. – fadden Apr 04 '13 at 21:38
  • Whether I set decoder.configure(format, null, null, 0); OR decoder.configure(format, surface, null, 0); I get the color-format to be the same - COLOR_QCOM_FormatYUV420SemiPlanar - Constant Value: 2141391872 (0x7fa30c00) for Nexus 7 AND COLOR_TI_FormatYUV420PackedSemiPlanar - Constant Value: 2130706688 (0x7f000100) for Galaxy Nexus. Why? – Harkish Oct 14 '13 at 20:47