
I am currently writing a decoder for an H264 video stream. The target platform is Android, so I am using the MediaCodec API (Android OS >= 6.0).

I've tested my code on 4 devices (the same code on all 4):

  • It works nicely on Xiaomi Redmi 5 Plus (it's actually quite fast there).
  • It works slow as hell on the Nexus 7 and the Samsung Galaxy Tab A.
  • It fails on the Samsung Galaxy Tab S2 with the mysterious error code -10000 from AMediaCodec_dequeueOutputBuffer (configure and start return proper values (AMEDIA_OK)).

So my questions are:

  1. Can I optimize it somehow? I measured the time of each MediaCodec API call, and AMediaCodec_dequeueOutputBuffer is a huge bottleneck here (80%-90% of the time spent on each frame).
  2. Is there anything I can do about this -10000 error on the Galaxy Tab S2? I read the MediaCodec docs and it's not described there. I've only found in VLC's sources (modules/codec/omxil/mediacodec_ndk.c) that const AMEDIA_ERROR_UNKNOWN = -10000, roughly as paraphrased below (question 2.b: where did they find this constant?).
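
For reference, this is my paraphrase of the definition I found in VLC (not a verbatim copy of their source):

//modules/codec/omxil/mediacodec_ndk.c (paraphrased, not verbatim)
#define AMEDIA_ERROR_UNKNOWN (-10000)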

Device specifications (decoders from /etc/media_codecs.xml):

  • Xiaomi Redmi 5 Plus: Android 7.1.2, "video/avc" decoders: OMX.qcom.video.decoder.avc, OMX.qcom.video.decoder.avc.secure

  • Nexus 7 (tablet): Android 6.0.1, "video/avc" decoders: OMX.qcom.video.decoder.avc, OMX.qcom.video.decoder.avc.secure

  • Samsung Tab A: Android 7.1.1, "video/avc" decoders: OMX.qcom.video.decoder.avc, OMX.qcom.video.decoder.avc.secure, OMX.SEC.avc.sw.dec

  • Samsung Tab S2: Android 7.0, "video/avc" decoders: OMX.Exynos.avc.dec, OMX.Exynos.avc.dec.secure, OMX.SEC.avc.sw.dec

I can see that all devices where execution works properly (even if slowly) have the Qualcomm decoder in common.

My code:

//initialization (I omitted error checks; all initialization calls return without any errors):
//f contains pointers to functions from libmediandk.so

const char mime[] = "video/avc";
mDecoder = f.createDecoderByType(mime);
AMediaFormat* mFormat = f.createMediaFormat();

const int colorFormat = 19; //COLOR_FormatYUV420Planar
f.setString(mFormat, c.keyMime, mime);
f.setInt32(mFormat,  c.keyWidth, width);
f.setInt32(mFormat,  c.keyHeight, height);
f.setInt32(mFormat,  c.keyColorFormat, colorFormat);
f.setInt32(mFormat, "encoder", 0);
f.setInt32(mFormat, "max-input-size", 0);

//both sps and pps are extracted from the stream
f.setBuffer(mFormat, "csd-0", sps, sizeof(sps));
f.setBuffer(mFormat, "csd-1", pps, sizeof(pps));

media_status_t status = f.configure(mDecoder, mFormat, NULL, NULL, 0);
status = f.start(mDecoder);

f.deleteMediaFormat(mFormat);

lastOutputBufferIdx = -1;
//this is executed every loop
//data -> char* with this frame's H264 encoded data
//I omitted error checks for clarity

const int TIMEOUT_US = -1; //-1 -> blocking mode
AMediaCodecBufferInfo info;
char* buf = NULL;

if (lastOutputBufferIdx != -1){
    f.releaseOutputBuffer(mDecoder, lastOutputBufferIdx, false);
    lastOutputBufferIdx = -1;     
}
ssize_t iBufIdx = f.dequeueInputBuffer(mDecoder, TIMEOUT_US);
if (iBufIdx >= 0){
     buf = f.getInputBuffer(mDecoder, iBufIdx, &bufsize);
     int usedBufSize = 0;
     if (buf){
         usedBufSize = dataSize;
         memcpy(buf, data, usedBufSize);
     }
     media_status_t res = f.queueInputBuffer(mDecoder, iBufIdx, 0, usedBufSize, getTimestamp(), 0);
}

//here's my nemesis (this line is both bottleneck and -10000 generator):
ssize_t oBufIdx = f.dequeueOutputBuffer(mDecoder, &info, TIMEOUT_US);

//I am not interested in handling any of the info codes {-1,-2,-3}
//(INFO_TRY_AGAIN_LATER, INFO_OUTPUT_FORMAT_CHANGED, INFO_OUTPUT_BUFFERS_CHANGED)
while (oBufIdx == -1 || oBufIdx == -2 || oBufIdx == -3){
    oBufIdx = f.dequeueOutputBuffer(mDecoder, &info, TIMEOUT_US);
}

if (oBufIdx >= 0)
{
    buf = f.getOutputBuffer(mDecoder, oBufIdx, &bufsize);
    AMediaFormat* format = f.getOutputFormat(mDecoder);
    f.getInt32(format, "width", &width);
    f.getInt32(format, "height", &height);
    f.deleteMediaFormat(format);

    //yuv_ is struct returned by my function
    yuv_.data = buf + info.offset;

    yuv_.size = bufsize;
    yuv_.width = width;
    yuv_.height = height;

    yuv_.yPlane = yuv_.data; //info.offset is already applied above
    yuv_.uPlane = yuv_.yPlane + height * width;
    yuv_.vPlane = yuv_.uPlane + (height * width) / 4;

    yuv_.yStride = width;
    yuv_.uStride = width / 2;
    yuv_.vStride = width / 2;
}

lastOutputBufferIdx = oBufIdx;

I've seen that MediaCodec can be run in asynchronous mode (which could be a bit faster), but I am not sure whether I can use it, since I am decoding a live video stream rather than an .mp4 file from a hard drive. What I mean is that there is (probably) no way for me to run the decoding of several frames simultaneously.

Dyraton01

1 Answer


The big issue is that you are feeding only one packet to the decoder and then blocking, waiting for that single decoded frame to be returned.

Hardware decoders usually have a bit of latency; passing a single frame through the decoder takes longer than the interval between individual frames if you just keep feeding them.

So don't stop and wait for the output; keep feeding more input packets whenever you have them available. The time from the first packet in to the first decoded frame out will probably stay the same, but you should get each subsequent frame much sooner after that. And some decoders won't return anything at all, regardless of how long you wait, until you've given them at least a few input packets.
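
Roughly, the loop could be structured like the sketch below. This is untested and uses the raw NDK calls rather than your f.* wrappers; stream_is_running, have_packet, next_packet and display_frame are hypothetical placeholders for however your networking code delivers packets and however you consume decoded frames.

//Untested sketch: keep feeding input and only poll for output instead of
//blocking on a single decoded frame. Error handling and EOS flags omitted.

#include <media/NdkMediaCodec.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct { const uint8_t* data; size_t size; int64_t ptsUs; } Packet;
bool stream_is_running(void);                           //hypothetical
bool have_packet(void);                                 //hypothetical
Packet next_packet(void);                               //hypothetical
void display_frame(const uint8_t* data, size_t size);   //hypothetical

void decode_loop(AMediaCodec* codec)
{
    const int64_t kPollTimeoutUs = 0; //poll only, never block

    while (stream_is_running()) {
        //1) feed every packet we currently have, as long as input buffers are free
        while (have_packet()) {
            ssize_t inIdx = AMediaCodec_dequeueInputBuffer(codec, kPollTimeoutUs);
            if (inIdx < 0)
                break; //no free input buffer right now, retry on the next pass

            size_t capacity = 0;
            uint8_t* inBuf = AMediaCodec_getInputBuffer(codec, (size_t)inIdx, &capacity);
            Packet pkt = next_packet();
            memcpy(inBuf, pkt.data, pkt.size); //a real implementation should check pkt.size <= capacity
            AMediaCodec_queueInputBuffer(codec, (size_t)inIdx, 0, pkt.size,
                                         (uint64_t)pkt.ptsUs, 0);
        }

        //2) drain whatever output is already decoded, without waiting for it
        AMediaCodecBufferInfo info;
        ssize_t outIdx;
        while ((outIdx = AMediaCodec_dequeueOutputBuffer(codec, &info, kPollTimeoutUs)) >= 0) {
            size_t outSize = 0;
            uint8_t* outBuf = AMediaCodec_getOutputBuffer(codec, (size_t)outIdx, &outSize);
            display_frame(outBuf + info.offset, (size_t)info.size);
            AMediaCodec_releaseOutputBuffer(codec, (size_t)outIdx, false);
        }
        //outIdx is now one of the negative INFO_* codes (try again later,
        //format/buffers changed); ignore them here and go back to waiting for
        //more input from the network (block on the socket there, not here)
    }
}

The key point is that the only place this loop is allowed to block is on the network, never on dequeueOutputBuffer.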

See e.g. https://stackoverflow.com/a/37513916/3115956 for more discussion and further links on the same matter.

mstorsjo
  • Thanks for your answer. I cannot use async mode because my app requires immediate output for a given input frame (it's realtime). I realized one interesting thing: on the Redmi 5, with my current approach, FullHD frames are decoded faster than HD frames! I have one hypothesis: maybe the decoder doesn't return the decoded data immediately because it waits for future frames, and only after some time spent waiting does it decide to release the decoded frame. Is there a way to make it release the frame immediately? I've tried flush() from the MediaCodec API but it doesn't work (or I am doing something wrong). Thanks. – Dyraton01 Aug 29 '19 at 13:45
  • You don't need to use async mode, you can do it differently in synchronous mode as well. After feeding one input packet, check for output frames, but use a very small timeout instead, and just poll once. If the decoder doesn't have a frame to return to you yet, you go back to waiting for more input from the network again (and possibly later poll the decoder to check if there's any output yet). This should fix the "slow as hell" case. – mstorsjo Aug 29 '19 at 18:34
  • Think of the decoder (in the "slow as hell" cases) this way: It can have many frames in flight through the decoder at once. Decoding a single frame takes a fixed amount of time that you can't affect (this can in some cases be as much as 150-200 ms). If you feed one frame, wait for the output, feed another frame, wait for the output, you can at most reach 5-6 fps. But despite this, the decoder might be able to have 6 frames in flight at once. So for a 30 fps stream, after 33 ms, you might get the next frame to decode. – mstorsjo Aug 29 '19 at 18:37
  • When you get the next frame from the stream, you should give it to the decoder directly, even if it hasn't output the first frame yet. This way, after you've passed the decoder the first 6 frames with no output, it will output the first frame (200 ms after the first was given to the decoder), and after that will output the later frames 33 ms apart. – mstorsjo Aug 29 '19 at 18:38
  • So even if it's realtime and you want to display frames as soon as possible, you should not make waiting for one single output from the decoder block you from feeding more input to it. You need to regularly check for both more things to input to the decoder and whether the decoder has any output for you. – mstorsjo Aug 29 '19 at 18:40