
Background

I'm required to merge a video file and an audio file to a single video file, so that:

  1. The output video file will be of the same duration as the input video file.
  2. The audio in the output file will come only from the input audio file. If the audio is shorter than the video, it should loop until the video ends (and can be cut off at the end if needed). This means that once the audio has finished playing while the video hasn't, I should play it again and again until the video ends (a concatenation of the audio).

As I've read, the technical term for this merging operation is "muxing".

As an example, suppose we have an input video of 10 seconds and an audio file of 4 seconds. The output video would be 10 seconds long (always the same as the input video), and the audio would play 2.5 times (the first 2 plays cover the first 8 seconds, followed by 2 of the 4 seconds for the rest).
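The arithmetic in this example can be sketched as a tiny helper (a hypothetical function of my own, not part of any Android API), computing how many times the audio plays for a given video/audio duration pair:

```kotlin
// Hypothetical helper illustrating the looping arithmetic from the example above:
// how many times the audio plays (including the final partial play).
fun audioLoopCount(videoDurationUs: Long, audioDurationUs: Long): Double =
    videoDurationUs.toDouble() / audioDurationUs

fun main() {
    // 10-second video, 4-second audio -> the audio plays 2.5 times
    println(audioLoopCount(10_000_000L, 4_000_000L)) // 2.5
}
```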

The problems

While I have found a solution for how to mux a video and an audio file (here), I've come across multiple issues:

  1. I can't figure out how to loop the writing of the audio content when needed. It keeps giving me an error, no matter what I try.

  2. The input files must be of specific file formats; otherwise the muxer might throw an exception, or (in very rare cases) worse: create a video file with black content. Stranger still, sometimes a '.mkv' file (for example) is accepted and sometimes it isn't, even though both play fine in a video player app.

  3. The current code works with buffers rather than real durations. This means that in many cases I might stop muxing the audio even though I shouldn't, so the output video file ends up with shorter audio content than the original, even though the video is long enough.

What I've tried

  • I tried to make the MediaExtractor of the audio seek back to its beginning each time it reached the end, using:

            if (audioBufferInfo.size < 0) {
                Log.d("AppLog", "reached end of audio, looping...")
                audioExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
                audioBufferInfo.size = audioExtractor.readSampleData(audioBuf, 0)
            }
    
  • For checking the types of the files, I tried using MediaMetadataRetriever and then checking the mime type. I think the supported ones are listed in the docs (here) as those marked with "Encoder", but I'm not sure about this. I also don't know which mime type corresponds to which format mentioned there.

  • I also tried re-initializing everything related to the audio, but it didn't work either.
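Regarding the runtime format check: one possible direction (a sketch of my own, not verified against the sample project; the helper name is mine, and note that "has a decoder" does not by itself guarantee MediaMuxer will accept the track) is to ask MediaCodecList at runtime whether the device has a decoder for every track, instead of relying on the static table in the docs:

```kotlin
import android.media.MediaCodecList
import android.media.MediaExtractor
import android.media.MediaFormat

// Hedged sketch: check whether this device has a decoder for every track in a
// file. Caveat: MediaMuxer with MPEG-4 output additionally accepts only a few
// audio mime types (e.g. "audio/mp4a-latm" for AAC), so the track mime should
// be checked explicitly as well.
fun allTracksDecodable(filePath: String): Boolean {
    val extractor = MediaExtractor()
    return try {
        extractor.setDataSource(filePath)
        val codecList = MediaCodecList(MediaCodecList.REGULAR_CODECS)
        (0 until extractor.trackCount).all { i ->
            val format = extractor.getTrackFormat(i)
            // Known gotcha: findDecoderForFormat rejects formats carrying a
            // frame-rate entry on some API levels, so clear it first
            format.setString(MediaFormat.KEY_FRAME_RATE, null)
            codecList.findDecoderForFormat(format) != null
        }
    } finally {
        extractor.release()
    }
}
```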

Here's my current code for the muxing itself (full sample project available here):

object VideoAndAudioMuxer {
    //   based on:  https://stackoverflow.com/a/31591485/878126
    @WorkerThread
    fun joinVideoAndAudio(videoFile: File, audioFile: File, outputFile: File): Boolean {
        try {
            //            val videoMediaMetadataRetriever = MediaMetadataRetriever()
            //            videoMediaMetadataRetriever.setDataSource(videoFile.absolutePath)
            //            val videoDurationInMs =
            //                videoMediaMetadataRetriever.extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION).toLong()
            //            val videoMimeType =
            //                videoMediaMetadataRetriever.extractMetadata(MediaMetadataRetriever.METADATA_KEY_MIMETYPE)
            //            val audioMediaMetadataRetriever = MediaMetadataRetriever()
            //            audioMediaMetadataRetriever.setDataSource(audioFile.absolutePath)
            //            val audioDurationInMs =
            //                audioMediaMetadataRetriever.extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION).toLong()
            //            val audioMimeType =
            //                audioMediaMetadataRetriever.extractMetadata(MediaMetadataRetriever.METADATA_KEY_MIMETYPE)
            //            Log.d(
            //                "AppLog",
            //                "videoDuration:$videoDurationInMs audioDuration:$audioDurationInMs videoMimeType:$videoMimeType audioMimeType:$audioMimeType"
            //            )
            //            videoMediaMetadataRetriever.release()
            //            audioMediaMetadataRetriever.release()
            outputFile.delete()
            outputFile.createNewFile()
            val muxer = MediaMuxer(outputFile.absolutePath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)
            val sampleSize = 256 * 1024
            //video
            val videoExtractor = MediaExtractor()
            videoExtractor.setDataSource(videoFile.absolutePath)
            videoExtractor.selectTrack(0)
            videoExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
            val videoFormat = videoExtractor.getTrackFormat(0)
            val videoTrack = muxer.addTrack(videoFormat)
            val videoBuf = ByteBuffer.allocate(sampleSize)
            val videoBufferInfo = MediaCodec.BufferInfo()
//            Log.d("AppLog", "Video Format $videoFormat")
            //audio
            val audioExtractor = MediaExtractor()
            audioExtractor.setDataSource(audioFile.absolutePath)
            audioExtractor.selectTrack(0)
            audioExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
            val audioFormat = audioExtractor.getTrackFormat(0)
            val audioTrack = muxer.addTrack(audioFormat)
            val audioBuf = ByteBuffer.allocate(sampleSize)
            val audioBufferInfo = MediaCodec.BufferInfo()
//            Log.d("AppLog", "Audio Format $audioFormat")
            //
            muxer.start()
//            Log.d("AppLog", "muxing video&audio...")
            //            val minimalDurationInMs = Math.min(videoDurationInMs, audioDurationInMs)
            while (true) {
                videoBufferInfo.size = videoExtractor.readSampleData(videoBuf, 0)
                audioBufferInfo.size = audioExtractor.readSampleData(audioBuf, 0)
                if (audioBufferInfo.size < 0) {
                    //                    Log.d("AppLog", "reached end of audio, looping...")
                    //TODO somehow start from beginning of the audio again, for looping till the video ends
                    //                    audioExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
                    //                    audioBufferInfo.size = audioExtractor.readSampleData(audioBuf, 0)
                }
                if (videoBufferInfo.size < 0 || audioBufferInfo.size < 0) {
//                    Log.d("AppLog", "reached end of video")
                    videoBufferInfo.size = 0
                    audioBufferInfo.size = 0
                    break
                } else {
                    //                    val donePercentage = videoExtractor.sampleTime / minimalDurationInMs / 10L
                    //                    Log.d("AppLog", "$donePercentage")
                    // video muxing
                    videoBufferInfo.presentationTimeUs = videoExtractor.sampleTime
                    videoBufferInfo.flags = videoExtractor.sampleFlags
                    muxer.writeSampleData(videoTrack, videoBuf, videoBufferInfo)
                    videoExtractor.advance()
                    // audio muxing
                    audioBufferInfo.presentationTimeUs = audioExtractor.sampleTime
                    audioBufferInfo.flags = audioExtractor.sampleFlags
                    muxer.writeSampleData(audioTrack, audioBuf, audioBufferInfo)
                    audioExtractor.advance()
                }
            }
            muxer.stop()
            muxer.release()
//            Log.d("AppLog", "success")
            return true
        } catch (e: Exception) {
            e.printStackTrace()
//            Log.d("AppLog", "Error " + e.message)
        }
        return false
    }
}
  • I've also tried using the FFmpeg library (here and here) to see how to do it. It worked fine, but it has some possible issues: the library takes up a lot of space, has annoying licensing terms, and for some reason the sample couldn't play the output file I created unless I removed something from the command that makes the conversion much slower. I would really prefer to use the built-in API over this library, even though it's very powerful... Also, for some input files it didn't loop...
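For reference, if one does fall back to FFmpeg, the loop-and-cut behavior described above can be expressed in a single command (the file names are placeholders; `-shortest` with stream copy can overshoot the cut point by a fraction of a second):

```sh
# Loop the audio input indefinitely, copy the video stream untouched,
# re-encode the audio to AAC, and stop when the shorter output (the video) ends.
ffmpeg -i input_video.mp4 -stream_loop -1 -i input_audio.mp3 \
    -map 0:v -map 1:a -c:v copy -c:a aac -shortest output.mp4
```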

The questions

  1. How can I mux the video and audio files so that the audio loops in case it is shorter (in duration) than the video?

  2. How can I make the audio get cut precisely when the video ends (no remainder of either video or audio)?

  3. How can I check, before calling this function, whether the current device can handle the given input files and actually mux them? Is there a way to check at runtime which formats are supported for this kind of operation, instead of relying on a list in the docs that might change in the future?

android developer
  • Have you considered concatenating the audio file to itself, so the total duration will be equal to the video's duration? If the video's duration is 3 times the audio's - concatenate it 2 times and then mux it together. If the duration is 3.5 times - concatenate it 3 times and cut off the difference. – TDG Feb 23 '19 at 09:58
  • @TDG Yes, I have tried, but I've failed for some reason. That's the idea of what I'm supposed to do. I wrote about it in the "What I've tried". I tried to re-create its instance each time it ends. Didn't work. Also, the timing handling doesn't exist in my code, as I don't understand how should I do it correctly. It's based on buffer instead of time... :( – android developer Feb 23 '19 at 11:21
  • "tried to re-create its instance each time it ends" - do you mean that you've tried to do it in real-time? Because I'm talking about preparing the concatenated audio **before** starting to mux. Sorry if I did misunderstand you. – TDG Feb 23 '19 at 12:59
  • @TDG What do you mean? I wanted to mux the audio multiple times, so that it will be added to the end, till the video ends. In any case, it didn't work, and even if it did, I couldn't know how to make it sync with the video at the right time, so that it will end together with the video (because what I did isn't time based, but buffer based instead). Please, if you have a solution, write it. I've published a full project that you can try out... – android developer Feb 23 '19 at 13:24
  • My basic idea is this - suppose that the video's duration is 60 seconds and the audio's duration is 11 seconds. First concatenate the audio to itself 5 more times (you'll now have 66 seconds of audio) and then cut the extra 6 seconds, so both the audio's and video's durations are equal. Now you can mux both streams and they have the same length. I've never tried to mux audio and video before, but it's the first thing I thought about when I read your question. – TDG Feb 23 '19 at 16:15
  • @TDG That's the idea I'm trying to implement. Again, I failed to do it, for both the looping and the timing matters. If you know how to do it, please write an answer. There is no need that you explain my problem... – android developer Feb 23 '19 at 18:39
  • @TDG Since you didn't understand the problem, I've updated the question. I hope you will understand it now. – android developer Feb 24 '19 at 08:30
  • @androiddeveloper have you got any solution? – Nikunj Paradva Jul 11 '19 at 05:47
  • I got error "E/MPEG4Writer: Unsupported mime 'audio/mpeg'" with your code, any solution? – Nikunj Paradva Jul 11 '19 at 06:05
  • @NikunjParadva In my case, we decide on the file formats, so we chose those that work. – android developer Jul 11 '19 at 09:29
  • I used mp4 for video and mp3 for audio, but I still got this error. Do you have any solution for that? How can I mux it? – Nikunj Paradva Jul 11 '19 at 09:52
  • Use only files that are supported. I think aac instead of mp3. – android developer Jul 11 '19 at 13:53

1 Answer


I have the same scenario.

  • 1: When audioBufferInfo.size < 0, seek back to the start. But remember, you need to accumulate presentationTimeUs.

  • 2: Get the video duration; when the looping audio reaches that duration (tracked via presentationTimeUs as well), cut it off.

  • 3: The audio file needs to be MediaFormat.MIMETYPE_AUDIO_AMR_NB, MediaFormat.MIMETYPE_AUDIO_AMR_WB, or MediaFormat.MIMETYPE_AUDIO_AAC. On my test machines, it worked fine.

Here is the code:

private fun muxing(musicName: String) {
    val saveFile = File(DirUtils.getPublicMediaPath(), "$saveName.mp4")
    if (saveFile.exists()) {
        saveFile.delete()
        PhotoHelper.sendMediaScannerBroadcast(saveFile)
    }
    try {
        // get the video file duration in microseconds
        val duration = getVideoDuration(mSaveFile!!.absolutePath)

        saveFile.createNewFile()

        val videoExtractor = MediaExtractor()
        videoExtractor.setDataSource(mSaveFile!!.absolutePath)

        val audioExtractor = MediaExtractor()
        val afdd = MucangConfig.getContext().assets.openFd(musicName)
        audioExtractor.setDataSource(afdd.fileDescriptor, afdd.startOffset, afdd.length)

        val muxer = MediaMuxer(saveFile.absolutePath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)

        videoExtractor.selectTrack(0)
        val videoFormat = videoExtractor.getTrackFormat(0)
        val videoTrack = muxer.addTrack(videoFormat)

        audioExtractor.selectTrack(0)
        val audioFormat = audioExtractor.getTrackFormat(0)
        val audioTrack = muxer.addTrack(audioFormat)

        var sawEOS = false
        val offset = 100
        val sampleSize = 1000 * 1024
        val videoBuf = ByteBuffer.allocate(sampleSize)
        val audioBuf = ByteBuffer.allocate(sampleSize)
        val videoBufferInfo = MediaCodec.BufferInfo()
        val audioBufferInfo = MediaCodec.BufferInfo()

        videoExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
        audioExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)

        muxer.start()

        // NOTE: this assumes a constant frame rate; using videoExtractor.sampleTime
        // directly would be more robust for variable-frame-rate input
        val frameRate = videoFormat.getInteger(MediaFormat.KEY_FRAME_RATE)
        val videoSampleTime = 1000 * 1000 / frameRate

        while (!sawEOS) {
            videoBufferInfo.offset = offset
            videoBufferInfo.size = videoExtractor.readSampleData(videoBuf, offset)

            if (videoBufferInfo.size < 0) {
                sawEOS = true
                videoBufferInfo.size = 0

            } else {
                videoBufferInfo.presentationTimeUs += videoSampleTime
                videoBufferInfo.flags = videoExtractor.sampleFlags
                muxer.writeSampleData(videoTrack, videoBuf, videoBufferInfo)
                videoExtractor.advance()
            }
        }

        var sawEOS2 = false
        var sampleTime = 0L
        while (!sawEOS2) {

            audioBufferInfo.offset = offset
            audioBufferInfo.size = audioExtractor.readSampleData(audioBuf, offset)

            if (audioBufferInfo.presentationTimeUs >= duration) {
                // the audio has caught up with the video's duration - stop here
                // (continue so we don't write a zero-size sample below)
                sawEOS2 = true
                audioBufferInfo.size = 0
                continue
            }
            if (audioBufferInfo.size < 0) {
                // end of the audio file - remember the accumulated time offset
                // and seek back to the start, so the audio loops
                sampleTime = audioBufferInfo.presentationTimeUs
                audioExtractor.seekTo(0, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
                continue
            }
            audioBufferInfo.presentationTimeUs = audioExtractor.sampleTime + sampleTime
            audioBufferInfo.flags = audioExtractor.sampleFlags
            muxer.writeSampleData(audioTrack, audioBuf, audioBufferInfo)
            audioExtractor.advance()
        }

        muxer.stop()
        muxer.release()
        videoExtractor.release()
        audioExtractor.release()
        afdd.close()
    } catch (e: Exception) {
        LogUtils.e(TAG, "Mixer Error:" + e.message)
    }
}
lijia
  • I think you are missing some functions, such as "getVideoDuration". Also, I suggest avoiding file paths or File, because on Android Q they got a lot of restrictions... – android developer Sep 19 '19 at 08:36
  • getVideoDuration is just a function that gets the video file's duration; it is used when writing the audio sample data. – lijia Sep 19 '19 at 09:18
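For completeness, here is a minimal sketch of what the missing getVideoDuration helper could look like. It is an assumption of mine, based on the fact that the answer's code compares the result against presentationTimeUs (which is in microseconds), while MediaMetadataRetriever reports the duration in milliseconds:

```kotlin
import android.media.MediaMetadataRetriever

// Hypothetical implementation of the missing getVideoDuration helper,
// returning the duration in microseconds as the answer's code expects.
fun getVideoDuration(filePath: String): Long {
    val retriever = MediaMetadataRetriever()
    return try {
        retriever.setDataSource(filePath)
        val durationMs = retriever
            .extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION)
            ?.toLongOrNull() ?: 0L
        durationMs * 1000L // milliseconds -> microseconds
    } finally {
        retriever.release()
    }
}
```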