I'm trying to improve the frame extraction of our app. Basically what I've done is combine the solution from Grafika's MoviePlayer for forward seeking with BigFlake's ExtractMpegFramesTest for extracting the frame. For the extraction, I seek back to the previous key frame, then decode forward and only save the last frame. Something like this (see my previous question for a more complete explanation):
    decoder.releaseOutputBuffer(decoderStatus, doRender);
    if (doRender) {
        if (VERBOSE) Log.d(TAG, "awaiting decode of frame " + decodeCount);
        outputSurface.awaitNewImage();
        outputSurface.drawImage(false);
        if (extractor.getSampleTime() == mPosition) {
            Log.d(TAG, "sampleTime: " + extractor.getSampleTime()
                    + " mPosition: " + mPosition + " ----- EXTRACTING FRAME");
            long startWhen = System.currentTimeMillis();
            outputSurface.saveFrame();
            long frameSaveTime = System.currentTimeMillis() - startWhen;
            Log.d(TAG, "sampleTime: frame saved in: " + frameSaveTime + " milliseconds");
            return;
        }
        decodeCount++;
    }
The problem is that the sample time retrieved from extractor.getSampleTime() when seeking backward and then decoding forward doesn't always match the one from straightforward forward seeking. I've included a log to make this clearer (position is the seeking position in microseconds):
sampleTime: 12112100 -- position: 12139000 ----- FORWARD
sampleTime: 12120441 -- position: 12139000 ----- FORWARD
sampleTime: 12128783 -- position: 12139000 ----- FORWARD
sampleTime: 12137125 -- position: 12139000 ----- FORWARD
sampleTime: 12012000 -- position: 12139000 ----- BACKWARD
sampleTime: 12020341 -- position: 12139000 ----- BACKWARD
sampleTime: 12028683 -- position: 12139000 ----- BACKWARD
sampleTime: 12037025 -- position: 12139000 ----- BACKWARD
sampleTime: 12045366 -- position: 12139000 ----- BACKWARD
sampleTime: 12053708 -- position: 12139000 ----- BACKWARD
sampleTime: 12062050 -- position: 12139000 ----- BACKWARD
sampleTime: 12070391 -- position: 12139000 ----- BACKWARD
sampleTime: 12078733 -- position: 12139000 ----- BACKWARD
sampleTime: 12087075 -- position: 12139000 ----- BACKWARD
sampleTime: 12095416 -- position: 12139000 ----- BACKWARD
sampleTime: 12103758 -- position: 12139000 ----- BACKWARD
sampleTime: 12112100 -- position: 12139000 ----- BACKWARD
sampleTime: 12120441 -- position: 12139000 ----- BACKWARD
sampleTime: 12128783 -- position: 12139000 ----- BACKWARD
As you can see, in forward seeking extractor.getSampleTime() can reach position 12137125, while seeking back and then decoding forward can only reach 12128783. I'm not sure why this happens, but it results in a mismatch between the displayed frame and the extracted frame. This method is also not very efficient: I have to set up an EGLSurface and decode to it every time I need to extract a frame. Depending on how far the required frame is from the previous key frame, this operation can take 3 to 5 seconds, which is definitely too long for multiple extractions.
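To put a rough number on that cost: the number of frames that must be decoded and thrown away is just (target position − previous sync frame) divided by the frame spacing. A minimal sketch in plain Java (not from the actual app; the sync-frame time and the ~8342 µs inter-frame spacing are read off the log above):

```java
// Illustrative sketch: rough count of frames that must be decoded and
// discarded when seeking to the previous sync (key) frame and decoding
// forward to a target position. All times are in microseconds; the sample
// values in main() are taken from the log in this post.
public class SeekCost {
    static long framesToDecode(long targetUs, long prevSyncUs, long frameUs) {
        return (targetUs - prevSyncUs) / frameUs;
    }

    public static void main(String[] args) {
        // previous sync frame at 12012000 us, target 12139000 us,
        // ~8342 us between samples (from the BACKWARD log lines)
        System.out.println(framesToDecode(12_139_000L, 12_012_000L, 8_342L)); // prints 15
    }
}
```

That matches the 15 BACKWARD lines in the log: every one of those frames has to go through the full decode path before the wanted frame comes out.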
What I would like to ask is: is it possible to make the decoder decode to both surfaces (a SurfaceView for displaying and an EGLSurface for frame retrieval) at the same time, so that I can potentially solve both the accuracy and the performance issue?
I've also tried using FFmpeg to retrieve frames before; the performance is about the same. If there's a better way to retrieve frames than using OpenGL, I'm very willing to try it.
EDIT: After further testing, I can now match the extractor.getSampleTime() from both methods, even though the retrieved frame can sometimes still mismatch the displayed frame.
EDIT 2: Regarding the mismatch between the displayed frame and the extracted frame: it's actually very simple, but quite confusing at first if you don't know how MediaCodec works. I had to reread every one of fadden's comments to understand the problem better (this is the one that gave me that "ah ha" moment).
In short, the decoder likes to consume multiple input buffers before it spits out any presentation buffer. So the frame that is currently displayed is not the same as the one at the current extractor.getSampleTime() position. The correct value to synchronize on between displaying and extracting is therefore the presentation time of the output buffer, something like this:

    mCurrentSampleTime = mBufferInfo.presentationTimeUs;

Understanding this helped resolve many mysterious questions (such as: why is the first frame not at position 0?). Hope this will help someone.
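That buffering behavior can be illustrated with a toy model in plain Java (this is not real MediaCodec, and the pipeline depth of 4 is a made-up number; real codecs buffer a device-dependent amount): a decoder that holds several input buffers in flight will emit output whose presentationTimeUs lags behind the extractor's current sample time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Toy model of a decoder pipeline that keeps DEPTH input buffers in flight
// before producing any output. It shows why extractor.getSampleTime() (the
// next input sample) runs ahead of the PTS of the buffer currently coming
// out, and why the output buffer's presentationTimeUs is the right value
// to synchronize display and extraction on.
public class PipelineDelay {
    static final int DEPTH = 4;        // hypothetical codec pipeline depth
    static final long FRAME_US = 8342; // inter-frame spacing from the log

    // Feeds `samples` input timestamps and returns the PTS of each output
    // buffer, in order.
    static List<Long> decode(int samples) {
        ArrayDeque<Long> inFlight = new ArrayDeque<>();
        List<Long> outputPts = new ArrayList<>();
        for (int i = 0; i < samples; i++) {
            inFlight.add(i * FRAME_US);           // queueInputBuffer(...)
            if (inFlight.size() > DEPTH) {
                outputPts.add(inFlight.remove()); // mBufferInfo.presentationTimeUs
            }
        }
        return outputPts;
    }

    public static void main(String[] args) {
        List<Long> pts = decode(8);
        // By now the extractor is already at 7 * FRAME_US, but the newest
        // frame actually coming out has pts = 3 * FRAME_US.
        System.out.println("latest output pts: " + pts.get(pts.size() - 1));
    }
}
```

In this model, comparing extractor.getSampleTime() with the target position says "we're there" several frames before the matching frame is actually rendered, which is exactly the display/extraction mismatch described above.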