
We’re attempting to save video and audio from an Android device into an encrypted file. Our current implementation pipes the output of the microphone and camera through MediaCodec encoders. As the data comes out of each encoder, we encrypt the contents of the byte buffer and write it to disk. This approach works; however, when we attempt to stitch the files back together with FFmpeg, the two streams drift out of sync somewhere mid-stream. It appears that a lot of important metadata is lost with this method, specifically presentation timestamps and frame-rate data, so FFmpeg has to do some guesswork to mux the files.
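To make concrete what gets lost: each encoded chunk comes with a `MediaCodec.BufferInfo` (size, flags, `presentationTimeUs`), and none of that survives a raw encrypt-and-write. A minimal sketch of a framed record format that would carry it alongside each chunk (hypothetical plain-Java code, not part of our current implementation; the record names and the fake payload are illustrative only):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical framing: [size:int][flags:int][ptsUs:long][payload] per sample.
// Writing this header with each chunk (before encryption) preserves the
// timing metadata a later mux step needs.
public class SampleFraming {

    static void writeSample(DataOutputStream out, long ptsUs, int flags, byte[] payload) throws IOException {
        out.writeInt(payload.length);
        out.writeInt(flags);
        out.writeLong(ptsUs);
        out.write(payload);
    }

    // Reads one record; returns the pts and fills payloadOut[0] with the payload.
    static long readSamplePts(DataInputStream in, byte[][] payloadOut) throws IOException {
        int size = in.readInt();
        in.readInt();            // flags (ignored in this sketch)
        long ptsUs = in.readLong();
        byte[] payload = new byte[size];
        in.readFully(payload);
        payloadOut[0] = payload;
        return ptsUs;
    }

    public static void main(String[] args) throws IOException {
        byte[] nalUnit = {0, 0, 0, 1, 0x65};        // fake H.264 payload bytes
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        writeSample(new DataOutputStream(baos), 333_333L, 1, nalUnit);

        byte[][] payloadOut = new byte[1][];
        long pts = readSamplePts(new DataInputStream(new ByteArrayInputStream(baos.toByteArray())), payloadOut);
        System.out.println(pts == 333_333L && Arrays.equals(payloadOut[0], nalUnit)); // prints "true"
    }
}
```

With headers like these, the original timestamps survive encryption and can be handed back to a muxer (or to FFmpeg) later.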

Are there techniques for keeping these streams in sync without using MediaMuxer? The video is encoded with H.264 and the audio with AAC.

Other Approaches: We attempted to use MediaMuxer to mux the output data to a file, but our use case requires that we encrypt the bytes before they are saved to disk, which eliminates the possibility of using the default constructor (it writes directly to a file path).

Additionally, we attempted to use the newly added (API 26) constructor that takes a FileDescriptor instead, pointing it at a ParcelFileDescriptor that wrapped an EncryptedDocument (https://android.googlesource.com/platform/development/+/master/samples/Vault/src/com/example/android/vault/EncryptedDocument.java). However, this approach crashed at the native layer, and we believe the cause is this comment in the source code (https://android.googlesource.com/platform/frameworks/base.git/+/master/media/java/android/media/MediaMuxer.java#353) about the native writer trying to memory-map the output file.

import android.graphics.YuvImage
import android.media.MediaCodec
import android.media.MediaCodecInfo
import android.media.MediaFormat
import android.media.MediaMuxer
import com.callyo.video_10_21.Utils.YuvImageUtils.convertNV21toYUV420Planar
import java.io.FileDescriptor
import java.util.*
import java.util.concurrent.atomic.AtomicReference
import kotlin.properties.Delegates

class VideoEncoderProcessor(
   private val fileDescriptor: FileDescriptor,
   private val width: Int,
   private val height: Int,
   private val frameRate: Int
): MediaCodec.Callback() {
   private lateinit var videoFormat: MediaFormat
   private var trackIndex by Delegates.notNull<Int>()
   private var mediaMuxer: MediaMuxer
   private val mediaCodec = createEncoder()
   private val pendingVideoEncoderInputBufferIndices = AtomicReference<LinkedList<Int>>(LinkedList())

   companion object {
       private const val VIDEO_FORMAT = "video/avc"
   }

   init {
       mediaMuxer = MediaMuxer(fileDescriptor, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)
       mediaCodec.setCallback(this)
       mediaCodec.start()
   }

   private fun createEncoder(): MediaCodec {
       videoFormat = MediaFormat.createVideoFormat(VIDEO_FORMAT, width, height).apply {
           setInteger(MediaFormat.KEY_FRAME_RATE, frameRate)
           setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420Flexible)
           setInteger(MediaFormat.KEY_BIT_RATE, width * height * 5)
           setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1)
       }

       return MediaCodec.createEncoderByType(VIDEO_FORMAT).apply {
           configure(videoFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
       }
   }

   override fun onInputBufferAvailable(codec: MediaCodec, index: Int) {
       // logic for handling stream end omitted for clarity

       /* Video frames come in asynchronously from input buffer availability
        * so we need to keep track of available buffers in queue */
       pendingVideoEncoderInputBufferIndices.get().add(index)
   }

   override fun onError(codec: MediaCodec, e: MediaCodec.CodecException) {}

   override fun onOutputFormatChanged(codec: MediaCodec, format: MediaFormat) {
       trackIndex = mediaMuxer.addTrack(format)
       mediaMuxer.start()
   }

   override fun onOutputBufferAvailable(codec: MediaCodec, index: Int, bufferInfo: MediaCodec.BufferInfo) {
       val buffer = mediaCodec.getOutputBuffer(index)
       buffer?.apply {
           if (bufferInfo.size != 0) {
               position(bufferInfo.offset)
               limit(bufferInfo.offset + bufferInfo.size)
               mediaMuxer.writeSampleData(trackIndex, this, bufferInfo)
           }
       }

       mediaCodec.releaseOutputBuffer(index, false)

       // BUFFER_FLAG_END_OF_STREAM is a bit flag, so test it with a bitwise AND
       if (bufferInfo.flags and MediaCodec.BUFFER_FLAG_END_OF_STREAM != 0) {
           mediaCodec.stop()
           mediaCodec.release()
           mediaMuxer.stop()
           mediaMuxer.release()
       }
   }

   // Public method that receives raw unencoded video data
   fun encode(yuvImage: YuvImage) {
       // logic for handling stream end omitted for clarity

       pendingVideoEncoderInputBufferIndices.get().poll()?.let { index ->
           val buffer = mediaCodec.getInputBuffer(index)
           buffer?.clear()
           // converting frame to correct color format
           val input =
                   yuvImage.convertNV21toYUV420Planar(ByteArray(yuvImage.yuvData.size), yuvImage.width, yuvImage.height)
           buffer?.put(input)
           buffer?.let {
               mediaCodec.queueInputBuffer(index, 0, input.size, System.nanoTime() / 1000, 0)
           }
       }
   }
}



Additional Info: I’m using MediaCodec.Callback() (https://developer.android.com/reference/kotlin/android/media/MediaCodec.Callback?hl=en) to handle the encoding asynchronously.

Robert
  • Do you want to encrypt the `video` and `audio` files first and then mux them together? You should do the opposite. – Darkman Feb 04 '21 at 08:30
  • The android muxer API allows you to specify where you want the output to be saved on disk, but it's important for my use case to be able to encrypt the data in-memory. @Darkman – Robert Feb 04 '21 at 21:53
  • It's likely hopeless without timestamps. Try using a ramdisk to make a fake disk location to write to, then you might be able to have your cake and eat it too! – Mark H Feb 08 '21 at 15:16

1 Answer


Introduction

I'm going to reference the following Q/A: sync audio and video with mediacodec and mediamuxer

Since the information is lost:

in order to sync audio and video you have to "calculate the number of audio samples that should play for each frame of video"

The author continued and provided an example, e.g.

It depends on the sample rate and the frame rate:

at 24 fps and 48000 Hz, every frame is (48000 Hz / 24 fps) = 2000 samples long

at 25 fps and 48000 Hz, every frame is (48000 Hz / 25 fps) = 1920 samples long
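That arithmetic is simply sample rate divided by frame rate; as a tiny sketch:

```java
public class SamplesPerFrame {
    // Audio samples that should play per video frame = sampleRate / frameRate
    static int samplesPerFrame(int sampleRateHz, int frameRate) {
        return sampleRateHz / frameRate;
    }

    public static void main(String[] args) {
        System.out.println(samplesPerFrame(48_000, 24)); // prints 2000
        System.out.println(samplesPerFrame(48_000, 25)); // prints 1920
    }
}
```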

Examples

Have a look at the following example that muxes a video file and an audio file; it sets the sample sizes and combines the video and audio (from: https://github.com/Docile-Alligator/Infinity-For-Reddit/blob/61c5682b06fb3739a9f980700e6602ae0f39d5a2/app/src/main/java/ml/docilealligator/infinityforreddit/services/DownloadRedditVideoService.java#L506):

private boolean muxVideoAndAudio(String videoFilePath, String audioFilePath, String outputFilePath) {
    try {
        File file = new File(outputFilePath);
        file.createNewFile();
        MediaExtractor videoExtractor = new MediaExtractor();
        videoExtractor.setDataSource(videoFilePath);
        MediaExtractor audioExtractor = new MediaExtractor();
        audioExtractor.setDataSource(audioFilePath);
        MediaMuxer muxer = new MediaMuxer(outputFilePath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);

        videoExtractor.selectTrack(0);
        MediaFormat videoFormat = videoExtractor.getTrackFormat(0);
        int videoTrack = muxer.addTrack(videoFormat);

        audioExtractor.selectTrack(0);
        MediaFormat audioFormat = audioExtractor.getTrackFormat(0);
        int audioTrack = muxer.addTrack(audioFormat);
        boolean sawEOS = false;
        int offset = 100;
        int sampleSize = 2048 * 1024;
        ByteBuffer videoBuf = ByteBuffer.allocate(sampleSize);
        ByteBuffer audioBuf = ByteBuffer.allocate(sampleSize);
        MediaCodec.BufferInfo videoBufferInfo = new MediaCodec.BufferInfo();
        MediaCodec.BufferInfo audioBufferInfo = new MediaCodec.BufferInfo();

        videoExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC);
        audioExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC);

        muxer.start();

        while (!sawEOS) {
            videoBufferInfo.offset = offset;
            videoBufferInfo.size = videoExtractor.readSampleData(videoBuf, offset);

            if (videoBufferInfo.size < 0) {
                sawEOS = true;
                videoBufferInfo.size = 0;
            } else {
                videoBufferInfo.presentationTimeUs = videoExtractor.getSampleTime();
                videoBufferInfo.flags = videoExtractor.getSampleFlags();
                muxer.writeSampleData(videoTrack, videoBuf, videoBufferInfo);
                videoExtractor.advance();
            }
        }

        boolean sawEOS2 = false;
        while (!sawEOS2) {
            audioBufferInfo.offset = offset;
            audioBufferInfo.size = audioExtractor.readSampleData(audioBuf, offset);

            if (audioBufferInfo.size < 0) {
                sawEOS2 = true;
                audioBufferInfo.size = 0;
            } else {
                audioBufferInfo.presentationTimeUs = audioExtractor.getSampleTime();
                audioBufferInfo.flags = audioExtractor.getSampleFlags();
                muxer.writeSampleData(audioTrack, audioBuf, audioBufferInfo);
                audioExtractor.advance();

            }
        }

        try {
            muxer.stop();
            muxer.release();
        } catch (IllegalStateException ignore) {}
    } catch (IOException e) {
        e.printStackTrace();
        return false;
    }

    return true;
}

Have a look at the following page: https://sisik.eu/blog/android/media/mix-audio-into-video

The section Muxing Frames Into MP4 With MediaMuxer has a good example that you could use to stitch the files back together. From there:

In my case, I want to get input from an MPEG-4 video and from an AAC/M4A audio file, and mux both inputs into one MPEG-4 output video file. To accomplish that, I created the following mux() method

fun mux(audioFile: String, videoFile: String, outFile: String) {

    // Init extractors which will get encoded frames
    val videoExtractor = MediaExtractor()
    videoExtractor.setDataSource(videoFile)
    videoExtractor.selectTrack(0) // Assuming only one track per file. Adjust code if this is not the case.
    val videoFormat = videoExtractor.getTrackFormat(0)

    val audioExtractor = MediaExtractor()
    audioExtractor.setDataSource(audioFile)
    audioExtractor.selectTrack(0) // Assuming only one track per file. Adjust code if this is not the case.
    val audioFormat = audioExtractor.getTrackFormat(0)

    // Init muxer
    val muxer = MediaMuxer(outFile, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)
    val videoIndex = muxer.addTrack(videoFormat)
    val audioIndex = muxer.addTrack(audioFormat)
    muxer.start()

    // Prepare buffer for copying
    val maxChunkSize = 1024 * 1024
    val buffer = ByteBuffer.allocate(maxChunkSize)
    val bufferInfo = MediaCodec.BufferInfo()

    // Copy Video
    while (true) {
        val chunkSize = videoExtractor.readSampleData(buffer, 0)

        if (chunkSize > 0) {
            bufferInfo.presentationTimeUs = videoExtractor.sampleTime
            bufferInfo.flags = videoExtractor.sampleFlags
            bufferInfo.size = chunkSize

            muxer.writeSampleData(videoIndex, buffer, bufferInfo)

            videoExtractor.advance()

        } else {
            break
        }
    }

    // Copy audio
    while (true) {
        val chunkSize = audioExtractor.readSampleData(buffer, 0)

        if (chunkSize >= 0) {
            bufferInfo.presentationTimeUs = audioExtractor.sampleTime
            bufferInfo.flags = audioExtractor.sampleFlags
            bufferInfo.size = chunkSize

            muxer.writeSampleData(audioIndex, buffer, bufferInfo)
            audioExtractor.advance()
        } else {
            break
        }
    }

    // Cleanup
    muxer.stop()
    muxer.release()

    videoExtractor.release()
    audioExtractor.release()
}

Update

Based on your comments, I think the main issue is the fileDescriptor. Specifically, the Java side only uses the RandomAccessFile to obtain a file descriptor; the native layer is what actually does the reading and writing.

My suggestion is to consider using a FileDescriptor that is in-memory rather than backed by a file on disk.

So: read the encrypted file and decrypt it in memory, then expose those bytes through a new in-memory fileDescriptor. Feed that in-memory fileDescriptor to the muxer and see what happens.

There's a great answer about this where they use secure private app-only sockets to create a file descriptor, see: Create an in-memory FileDescriptor

Check specifically the second part of that answer starting from:

A better, but more complicated solution is to create a socket in the filesystem namespace. Reference: https://stackoverflow.com/a/62651005/1688441

So in more detail:

  1. Read the encrypted file and decrypt it into bytes, keeping them in memory.
  2. Create a LocalSocket in your app's private data area, along with a server.
  3. Start the server listening and accepting the unencrypted bytes.
  4. Create a LocalSocket client and pump the unencrypted bytes to the server.
  5. Pass the client's fileDescriptor to the muxer.

As the answer states:

This does create a file on the filesystem, but none of the data that passes through the socket is ever written to disk, it is entirely in-memory. The file is just a name that represents the socket, similar to the files in /dev that represent devices. Because the socket is accessed through the filesystem, it is subject to the usual filesystem permissions, so it is easy to restrict access to the socket by placing the socket in your app's private data area.

Since this technique creates a file on the filesystem, it would be a good idea to delete the file after you're done, and also perhaps to check for and clean up old sockets every so often, in case your app crashes and leaves old files laying around.
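The steps above target Android's LocalSocket/LocalServerSocket, which won't run on a desktop JVM, but the core idea can be sketched with JDK 16's Unix-domain socket channels as a rough analogue (assumption: this is a plain-JVM illustration, not the Android API; the socket has a filesystem name, but the payload bytes stay in memory):

```java
import java.io.IOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class InMemorySocketSketch {
    public static void main(String[] args) throws Exception {
        // The socket is just a name on the filesystem; bytes written through
        // it never touch the disk.
        Path path = Path.of(System.getProperty("java.io.tmpdir"), "mux-" + System.nanoTime() + ".sock");
        UnixDomainSocketAddress address = UnixDomainSocketAddress.of(path);

        byte[] payload = "decrypted sample bytes".getBytes(StandardCharsets.UTF_8);

        try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            server.bind(address);

            // "Client" thread: pumps the decrypted bytes into the socket.
            Thread writer = new Thread(() -> {
                try (SocketChannel client = SocketChannel.open(StandardProtocolFamily.UNIX)) {
                    client.connect(address);
                    client.write(ByteBuffer.wrap(payload));
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            writer.start();

            // "Server" side: reads the bytes back, entirely in memory. In the
            // real setup, this end's file descriptor is what the muxer would get.
            ByteBuffer received = ByteBuffer.allocate(1024);
            try (SocketChannel conn = server.accept()) {
                while (conn.read(received) != -1) { /* drain until client closes */ }
            }
            writer.join();

            received.flip();
            byte[] bytes = new byte[received.remaining()];
            received.get(bytes);
            System.out.println(new String(bytes, StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(path); // clean up the socket's filesystem name
        }
    }
}
```

The `finally` block mirrors the cleanup advice above: the socket's name is deleted once the transfer is done.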

Menelaos
  • Hey @Menelaos thank you for providing an answer. This approach works well, however, it doesn't satisfy our constraint of not writing plaintext to disk. We need an approach that can either mux in-memory or add the meta-data necessary for ffmpeg to mux server-side. – Robert Feb 08 '21 at 17:55
  • hi @Robert, from what I understood you are reading 2 encrypted files (audio+video), decrypting and muxing right? You want to mux in memory... but what do you want to do with the result? I guess the problem in the one example is this right, and the fact that we have an output file? `MediaMuxer(outFile, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)` ? – Menelaos Feb 08 '21 at 20:50
  • @Robert yeh I see your problem. I think they are doing the reading on the native level. I saw they are only using the `RandomAccessFile` for the file descriptor. So the problem is actually with the `nativeSetup(fd, format)` possibly. Thinking about this... – Menelaos Feb 09 '21 at 09:33