I'm trying to improve the frame extraction of our app. Basically what I've done is combine the solution from Grafika's MoviePlayer for forward seeking with BigFlake's ExtractMpegFramesTest for extracting the frame. For the extraction, I seek back to the previous key frame, then decode forward and only save the last frame. Something like this (see my previous question for a more complete explanation):
    decoder.releaseOutputBuffer(decoderStatus, doRender);
    if (doRender) {
        if (VERBOSE) Log.d(TAG, "awaiting decode of frame " + decodeCount);
        outputSurface.awaitNewImage();
        outputSurface.drawImage(false);
        if (extractor.getSampleTime() == mPosition) {
            Log.d(TAG, "sampleTime: " + extractor.getSampleTime()
                    + " mPosition: " + mPosition + " ----- EXTRACTING FRAME");
            long startWhen = System.currentTimeMillis();
            outputSurface.saveFrame();
            long frameSaveTime = System.currentTimeMillis() - startWhen;
            Log.d(TAG, "sampleTime: frame saved in: " + frameSaveTime + " milliseconds");
            return;
        }
        decodeCount++;
    }
The problem is that the sample time retrieved from extractor.getSampleTime() when seeking backward and then decoding forward doesn't always match the one from straightforward forward seeking. I've included a log to make this clearer (position is the seeking position in microseconds):
sampleTime: 12112100 -- position: 12139000 ----- FORWARD
sampleTime: 12120441 -- position: 12139000 ----- FORWARD
sampleTime: 12128783 -- position: 12139000 ----- FORWARD
sampleTime: 12137125 -- position: 12139000 ----- FORWARD
sampleTime: 12012000 -- position: 12139000 ----- BACKWARD
sampleTime: 12020341 -- position: 12139000 ----- BACKWARD
sampleTime: 12028683 -- position: 12139000 ----- BACKWARD
sampleTime: 12037025 -- position: 12139000 ----- BACKWARD
sampleTime: 12045366 -- position: 12139000 ----- BACKWARD
sampleTime: 12053708 -- position: 12139000 ----- BACKWARD
sampleTime: 12062050 -- position: 12139000 ----- BACKWARD
sampleTime: 12070391 -- position: 12139000 ----- BACKWARD
sampleTime: 12078733 -- position: 12139000 ----- BACKWARD
sampleTime: 12087075 -- position: 12139000 ----- BACKWARD
sampleTime: 12095416 -- position: 12139000 ----- BACKWARD
sampleTime: 12103758 -- position: 12139000 ----- BACKWARD
sampleTime: 12112100 -- position: 12139000 ----- BACKWARD
sampleTime: 12120441 -- position: 12139000 ----- BACKWARD
sampleTime: 12128783 -- position: 12139000 ----- BACKWARD
As you can see, in forward seeking extractor.getSampleTime() can reach position 12137125, while seeking back and then decoding forward can only reach 12128783. I'm not sure why this happens, but it results in a mismatch between the displayed frame and the extracted frame. This method is also not very efficient: I have to set up an EGLSurface and decode to it every time I need to extract a frame. Depending on how far the required frame is from the previous key frame, this operation can take 3 to 5 seconds, which is definitely too long for multiple extractions.
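To put a rough number on that cost: the number of frames that must be decoded and thrown away is just (target position − previous sync frame) divided by the frame spacing. A minimal sketch in plain Java (not from the actual app; the sync-frame time and the ~8342 µs inter-frame spacing are read off the log above):

```java
// Illustrative sketch: rough count of frames that must be decoded and
// discarded when seeking to the previous sync (key) frame and decoding
// forward to a target position. All times are in microseconds; the sample
// values in main() are taken from the log in this post.
public class SeekCost {
    static long framesToDecode(long targetUs, long prevSyncUs, long frameUs) {
        return (targetUs - prevSyncUs) / frameUs;
    }

    public static void main(String[] args) {
        // previous sync frame at 12012000 us, target 12139000 us,
        // ~8342 us between samples (from the BACKWARD log lines)
        System.out.println(framesToDecode(12_139_000L, 12_012_000L, 8_342L)); // prints 15
    }
}
```

That matches the 15 BACKWARD lines in the log: every one of those frames has to go through the full decode path before the wanted frame comes out.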
What I would like to ask is: is it possible to make the decoder decode to both surfaces (a SurfaceView for displaying and an EGLSurface for frame retrieval) at the same time, so that I can potentially solve both the accuracy and the performance issue?
I've also tried using FFmpeg to retrieve frames before; the performance is about the same. If there's a better way to retrieve frames than using OpenGL, I'm very willing to try it.
EDIT: After further testing, I can now match the extractor.getSampleTime() from both methods, even though the retrieved frame can sometimes still mismatch the displayed frame.
EDIT 2: Regarding the mismatch between the displayed frame and the extracted frame: it's actually very simple, but quite confusing at first if you don't know how MediaCodec works. I had to reread every one of fadden's comments to understand the problem better (this is the one that gave me that "ah ha" moment).
In short, the decoder likes to consume multiple input buffers before it spits out any presentation buffer. So the frame that is currently displayed is not the same as the one at the current extractor.getSampleTime() position. The correct value to synchronize on between displaying and extracting is therefore the presentation time of the output buffer, something like this:

    mCurrentSampleTime = mBufferInfo.presentationTimeUs;

Understanding this helped resolve many mysterious questions (such as: why is the first frame not at position 0?). Hope this will help someone.
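That buffering behavior can be illustrated with a toy model in plain Java (this is not real MediaCodec, and the pipeline depth of 4 is a made-up number; real codecs buffer a device-dependent amount): a decoder that holds several input buffers in flight will emit output whose presentationTimeUs lags behind the extractor's current sample time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Toy model of a decoder pipeline that keeps DEPTH input buffers in flight
// before producing any output. It shows why extractor.getSampleTime() (the
// next input sample) runs ahead of the PTS of the buffer currently coming
// out, and why the output buffer's presentationTimeUs is the right value
// to synchronize display and extraction on.
public class PipelineDelay {
    static final int DEPTH = 4;        // hypothetical codec pipeline depth
    static final long FRAME_US = 8342; // inter-frame spacing from the log

    // Feeds `samples` input timestamps and returns the PTS of each output
    // buffer, in order.
    static List<Long> decode(int samples) {
        ArrayDeque<Long> inFlight = new ArrayDeque<>();
        List<Long> outputPts = new ArrayList<>();
        for (int i = 0; i < samples; i++) {
            inFlight.add(i * FRAME_US);           // queueInputBuffer(...)
            if (inFlight.size() > DEPTH) {
                outputPts.add(inFlight.remove()); // mBufferInfo.presentationTimeUs
            }
        }
        return outputPts;
    }

    public static void main(String[] args) {
        List<Long> pts = decode(8);
        // By now the extractor is already at 7 * FRAME_US, but the newest
        // frame actually coming out has pts = 3 * FRAME_US.
        System.out.println("latest output pts: " + pts.get(pts.size() - 1));
    }
}
```

In this model, comparing extractor.getSampleTime() with the target position says "we're there" several frames before the matching frame is actually rendered, which is exactly the display/extraction mismatch described above.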