
I am building the capability to frame-accurately trim video files on Android. Transcoding is implemented with MediaExtractor, MediaCodec, and MediaMuxer. I need help truncating arbitrary Audio frames in order to match their Video frame counterparts.

I believe the Audio frames must be trimmed in the Decoder output buffer, which is the logical place where uncompressed audio data is available for editing.

For in/out trims I am calculating the necessary offset and size adjustments to the raw Audio buffer to shoehorn it into the available endcap frames, and I am submitting the data with the following code:

MediaCodec.BufferInfo info = pendingAudioDecoderOutputBufferInfos.poll();
...
ByteBuffer decoderOutputBuffer = audioDecoder.getOutputBuffer(decoderIndex).duplicate();
decoderOutputBuffer.position(info.offset);
decoderOutputBuffer.limit(info.offset + info.size);
encoderInputBuffer.position(0);
encoderInputBuffer.put(decoderOutputBuffer);
info.flags |= MediaCodec.BUFFER_FLAG_END_OF_STREAM;
audioEncoder.queueInputBuffer(encoderIndex, info.offset, info.size, presentationTime, info.flags);
audioDecoder.releaseOutputBuffer(decoderIndex, false);

My problem is that the data adjustments appear to affect only the data copied into the output audio buffer, but they do not shorten the audio frame that gets written to the MediaMuxer. The output video either ends up with several milliseconds of missing audio at the end of the clip, or, if I write too much data, the final audio frame gets dropped from the clip entirely.

How do I properly trim an audio frame?

David Manpearl
  • If I understand correctly, it looks like there's a discrepancy with using `info.offset` when you call `queueInputBuffer`. Doesn't the encoder input buffer run from 0 to `info.size` instead of `info.offset` to (`info.offset` + `info.size`)? Perhaps the time differential you experience is exactly the amount of data between 0 and offset? – Dave Jun 16 '16 at 23:18
  • @Dave I believe you are correct. There is no guarantee that `decoderOutputBuffer`, `encoderInputBuffer`, and `audioEncoder.queueInputBuffer` will all use the same `size` and `offset` values. I did try several combinations. I also believe that I am only reducing the size of the data, but not the container. I am starting to consider that the solution may involve something akin to a configuration change with `MediaCodec.BUFFER_FLAG_CODEC_CONFIG`. – David Manpearl Jun 17 '16 at 00:13

1 Answer


There are a few things at play here:

  • As Dave pointed out, you should pass 0 instead of info.offset to audioEncoder.queueInputBuffer - you already took the decoder output buffer's offset into account when you set the buffer position with decoderOutputBuffer.position(info.offset). (Though perhaps you are already updating it somewhere.)

  • I'm not sure if MediaCodec audio encoders allow you to pass audio data in arbitrarily sized chunks, or if you need to send exactly full audio frames at a time. I think they might accept arbitrary chunks - then you're fine. If not, you need to buffer the audio up yourself and pass it to the encoder once you have a full frame (in case you trimmed some out at the start).

  • Keep in mind that audio is also frame based (for AAC it's frames of 1024 samples, unless you use the low-delay variants or HE-AAC), so at 44.1 kHz you can only adjust the audio duration with roughly 23 ms granularity. If you want your audio to end precisely after the right number of samples, you need to use container signaling to indicate this. I'm not sure if the MediaCodec audio encoder flushes whatever half frame you have at the end, or if you manually need to pad it with extra zeros in order to get the last few samples out when you aren't aligned to the frame size. It might not be needed, though.

  • Encoding AAC audio does introduce some delay into the audio stream; after decoding, you'll have a number of priming samples at the start of the decoded stream (the exact number of these depends on the encoder - for the software encoder in Android for AAC-LC, it's probably 2048 samples, but it might also vary). For the case of 2048 samples, it exactly lines up with 2 frames of audio, but it can also be something that isn't a whole number of frames. I don't think MediaCodec signals the exact amount of delay either. If you drop the 2 first output packets from the encoder (in case the delay is 2048 samples), you'll avoid the extra delay, but the actual decoded audio for the first few frames won't be exactly right. (The priming packets are necessary to be able to properly represent whatever samples your stream starts with, otherwise it will more or less converge towards your intended audio within 2048 samples.)
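To make the frame arithmetic from the points above concrete, here is a minimal plain-Java sketch. The class, the method names, and the priming constant are my own illustration (not part of the MediaCodec API), assuming AAC-LC with 1024 samples per frame and a 2048-sample encoder delay:

```java
// Frame-alignment math for trimming AAC-LC audio.
// All names here are illustrative, not Android framework APIs.
public class AacTrimMath {
    static final int SAMPLES_PER_FRAME = 1024; // AAC-LC frame size
    static final int PRIMING_SAMPLES = 2048;   // assumed encoder delay; varies by encoder

    // Duration of one AAC frame in microseconds at the given sample rate.
    static long frameDurationUs(int sampleRate) {
        return 1_000_000L * SAMPLES_PER_FRAME / sampleRate;
    }

    // A trim length in samples rounds down to this many whole frames; the
    // remainder must be zero-padded up to a full frame or signaled at the
    // container level (e.g. an edit list).
    static long wholeFrames(long trimSamples) {
        return trimSamples / SAMPLES_PER_FRAME;
    }

    static long remainderSamples(long trimSamples) {
        return trimSamples % SAMPLES_PER_FRAME;
    }

    // Encoder output packets to drop to skip the priming delay
    // (only exact when the delay is a whole number of frames).
    static long primingPacketsToDrop() {
        return PRIMING_SAMPLES / SAMPLES_PER_FRAME;
    }
}
```

At 44.1 kHz, `frameDurationUs(44100)` is 23219 µs, the ~23 ms granularity mentioned above. And per the first point, the encoder submission from the question becomes `audioEncoder.queueInputBuffer(encoderIndex, 0, info.size, presentationTime, info.flags)`: offset 0, because `decoderOutputBuffer.position(info.offset)` already accounted for the decoder-side offset.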

mstorsjo
  • Thank you. You also answered a question of mine earlier this year and provided great assistance to me here: http://stackoverflow.com/a/35885471/376829. I agree with @Dave and you about the offset. – David Manpearl Jun 17 '16 at 20:32
  • @mstorsjo @David Manpearl Hi, I use MediaCodec to encode raw PCM data to raw AAC data and to decode it back. I am trying to fix a bug where audio does not get processed right away until enough data has been input. E.g. a user recorded 'Hello, my name is kidfrom', but only 'Hello, my name' gets processed right away; 3 minutes later the user recorded 'Where are you from?', and again only 'Where are you' is processed right away. So the first thing the other user hears is 'Hello, my name is', then 3 minutes later 'kidfrom, Where are you'. Which is odd. Would you be able to help me? – Jason Rich Darmawan Jan 03 '21 at 18:45
  • Anyway, I read your 2nd point. I tried to set `android.media.AudioRecord`'s `bufferSizeInBytes` to 2048 (the `MediaCodec.BufferInfo.size` value), which I assume is the frame size, but it does not fix the bug. – Jason Rich Darmawan Jan 03 '21 at 18:47