Only First Track Playing of AVMutableComposition()

Question

New Edit Below

I have already referenced

AVMutableComposition - Only Playing First Track (Swift)

but it is not providing the answer to what I am looking for.

I have an AVMutableComposition(). I am trying to add MULTIPLE AVCompositionTracks of a single type, AVMediaTypeVideo, to this one composition. This is because I am using 2 different AVMediaTypeVideo sources whose AVAssets have different CGSizes and preferredTransforms.

So, the only way to apply their specified preferredTransforms is to provide them in 2 different tracks. But, for whatever reason, only the first track will actually provide any video, almost as if the second track is never there.

So, I have tried

1) Using AVMutableVideoCompositionLayerInstructions and applying an AVVideoComposition along with an AVAssetExportSession. This works okay (I am still working on the transforms, but it is doable), but the processing time for the videos is WELL OVER 1 minute, which is unacceptable in my situation.

2) Using multiple tracks without AVAssetExportSession, in which case the 2nd track of the same type never appears. Now, I could put it all on 1 track, but all the videos would then get the same size and preferredTransform as the first video, which I absolutely do not want, as it stretches them on all sides.

So my question is, is it possible to

1) Apply instructions to just a track WITHOUT using AVAssetExportSession? //Preferred way BY FAR.

2) Decrease the export time? (I have tried using PresetPassthrough, but you cannot use that if you set exporter.videoComposition, which is where my instructions are. This is the only place I know I can put instructions; I am not sure whether I can place them somewhere else.)

Here is some of my code (without the exporter, as I don't need to export anything anywhere; I just want to do stuff after the AVMutableComposition combines the items).

func merge() {
    if let firstAsset = controller.firstAsset, secondAsset = self.asset {

        let mixComposition = AVMutableComposition()

        let firstTrack = mixComposition.addMutableTrackWithMediaType(AVMediaTypeVideo,
                                                                     preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
        do {
            //Don't need now according to not being able to edit first 14seconds.

            if(CMTimeGetSeconds(startTime) == 0) {
                self.startTime = CMTime(seconds: 1/600, preferredTimescale: Int32(600))
            }
            try firstTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, CMTime(seconds: CMTimeGetSeconds(startTime), preferredTimescale: 600)),
                                           ofTrack: firstAsset.tracksWithMediaType(AVMediaTypeVideo)[0],
                                           atTime: kCMTimeZero)
        } catch _ {
            print("Failed to load first track")
        }


        //This secondTrack never appears, no matter what is inserted here; the video is just blank from startTime to endTime (the time range of secondTrack)
        let secondTrack = mixComposition.addMutableTrackWithMediaType(AVMediaTypeVideo,
                                                                     preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
//            secondTrack.preferredTransform = self.asset.preferredTransform
        do {
            try secondTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, secondAsset.duration),
                                           ofTrack: secondAsset.tracksWithMediaType(AVMediaTypeVideo)[0],
                                           atTime: CMTime(seconds: CMTimeGetSeconds(startTime), preferredTimescale: 600))
        } catch _ {
            print("Failed to load second track")
        }

        //This part appears again at endTime, which is right after the 2nd track is supposed to end.
        do {
            try firstTrack.insertTimeRange(CMTimeRangeMake(CMTime(seconds: CMTimeGetSeconds(endTime), preferredTimescale: 600), firstAsset.duration-endTime),
                                           ofTrack: firstAsset.tracksWithMediaType(AVMediaTypeVideo)[0] ,
                                           atTime: CMTime(seconds: CMTimeGetSeconds(endTime), preferredTimescale: 600))
        } catch _ {
            print("failed")
        }
        if let loadedAudioAsset = controller.audioAsset {
            let audioTrack = mixComposition.addMutableTrackWithMediaType(AVMediaTypeAudio, preferredTrackID: 0)
            do {
                try audioTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, firstAsset.duration),
                                               ofTrack: loadedAudioAsset.tracksWithMediaType(AVMediaTypeAudio)[0] ,
                                               atTime: kCMTimeZero)
            } catch _ {
                print("Failed to load Audio track")
            }
        }
    }
}

Edit

Apple states that "Indicates instructions for video composition via an NSArray of instances of classes implementing the AVVideoCompositionInstruction protocol. For the first instruction in the array, timeRange.start must be less than or equal to the earliest time for which playback or other processing will be attempted (note that this will typically be kCMTimeZero). For subsequent instructions, timeRange.start must be equal to the prior instruction's end time. The end time of the last instruction must be greater than or equal to the latest time for which playback or other processing will be attempted (note that this will often be the duration of the asset with which the instance of AVVideoComposition is associated)."

This just states that the entire composition must be covered by instructions if you decide to use ANY instructions (this is what I am understanding). Why is this? How would I apply instructions to just, say, track 2 in this example without changing track 1 or 3 at all:

Track 1 from 0 - 10sec, Track 2 from 10 - 20sec, Track 3 from 20 - 30sec.
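
One reading of the quoted rule is that the instructions only have to tile the timeline without gaps; a track you do not want to alter can still get its own instruction with whatever transform you pass in (identity, for example). Below is a minimal sketch of that idea for the three-segment timeline above. It is written against current Swift naming (`CMTime.zero` rather than `kCMTimeZero`), and `Segment` and `makeInstructions(for:)` are hypothetical names invented for this illustration, not AVFoundation API.

```swift
import AVFoundation

// Hypothetical helper type: one composition track, the range it occupies on the
// composition timeline, and the transform to apply while it is on screen.
struct Segment {
    let track: AVCompositionTrack
    let range: CMTimeRange
    let transform: CGAffineTransform
}

// Builds one instruction per segment. Each instruction must start exactly where
// the previous one ended, which is the tiling rule from the quoted documentation.
func makeInstructions(for segments: [Segment]) -> [AVMutableVideoCompositionInstruction] {
    var cursor = CMTime.zero
    var instructions: [AVMutableVideoCompositionInstruction] = []
    for segment in segments.sorted(by: { $0.range.start < $1.range.start }) {
        precondition(segment.range.start == cursor,
                     "instructions must not leave gaps or overlap")
        let layer = AVMutableVideoCompositionLayerInstruction(assetTrack: segment.track)
        layer.setTransform(segment.transform, at: segment.range.start)

        let instruction = AVMutableVideoCompositionInstruction()
        instruction.timeRange = segment.range
        instruction.layerInstructions = [layer]
        instructions.append(instruction)
        cursor = segment.range.end
    }
    return instructions
}
```

The returned array would be assigned to the `instructions` property of an `AVMutableVideoComposition`; only the segment for track 2 needs a non-trivial transform.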

Any explanation on that would probably answer my question (if it is doable).

When you say _the second track is never there_ do you mean you see the composition's background instead or the playback stops right after the first track? — Max Pevsner, Sep 08 '16 at 11:51
I mean the first track plays, it goes BLANK, and when the 2nd track is done, it goes back to the first track — impression7vx, Sep 08 '16 at 14:09
What transform do you apply to the second track? Maybe it's just located outside the frame of the videoComposition. — Max Pevsner, Sep 08 '16 at 14:23
Well as it stands, if I just combine 2 tracks into an `AVMutableComposition()` then it doesn't even work. The code above just cuts out the 2nd track, as if it is not allowed to have 2 `AVMediaTypeVideo` tracks, make sense? In the code above I am not performing any transforms — impression7vx, Sep 08 '16 at 14:28
In the question you wrote that you use two different tracks with different sizes and different preferredTransforms, so I was wondering how you deal with the differences. — Max Pevsner, Sep 08 '16 at 14:31
Yea, still working on that. I have 2 ways of going about this. The above code (preferred way), uses `preferredTransforms` but the 2nd track is never showing. So, I can't use different `preferredTransforms` because the 2nd track never shows. Now, I can use `AVAssetExportSession` (I think, still working on it), but it takes about 60 seconds to merge everything. — impression7vx, Sep 08 '16 at 15:04
Are you using the simulator to work with this? I'm not sure if this helps but the simulator doesn't work with AVExportSession. — zsteed, Nov 14 '16 at 16:10
I actually have figured a partial answer. The project is on a slow turn right now, but when it is all completed, I will post as an answer. — impression7vx, Nov 14 '16 at 16:38

score 5 · Accepted Answer · answered May 05 '17 at 06:05

Ok, so for my exact problem, I had to apply specific CGAffineTransforms in Swift to get the result we wanted. The version I am posting works with any picture taken or obtained, as well as video.

//This method gets the orientation of the current transform. This method is used below to determine the orientation
func orientationFromTransform(_ transform: CGAffineTransform) -> (orientation: UIImageOrientation, isPortrait: Bool) {
    var assetOrientation = UIImageOrientation.up
    var isPortrait = false
    if transform.a == 0 && transform.b == 1.0 && transform.c == -1.0 && transform.d == 0 {
        assetOrientation = .right
        isPortrait = true
    } else if transform.a == 0 && transform.b == -1.0 && transform.c == 1.0 && transform.d == 0 {
        assetOrientation = .left
        isPortrait = true
    } else if transform.a == 1.0 && transform.b == 0 && transform.c == 0 && transform.d == 1.0 {
        assetOrientation = .up
    } else if transform.a == -1.0 && transform.b == 0 && transform.c == 0 && transform.d == -1.0 {
        assetOrientation = .down
    }

    //Returns the orientation as a variable
    return (assetOrientation, isPortrait)
}

//Method that lays out the instructions for each track I am editing and does the transformation on each individual track to get it lined up properly
func videoCompositionInstructionForTrack(_ track: AVCompositionTrack, _ asset: AVAsset) -> AVMutableVideoCompositionLayerInstruction {

    //Returns the layer instruction for the given composition track

    //Create the initial instruction
    let instruction = AVMutableVideoCompositionLayerInstruction(assetTrack: track)

    //This is whatever asset you are about to apply instructions to.
    let assetTrack = asset.tracks(withMediaType: AVMediaTypeVideo)[0]

    //Get the original transform of the asset
    var transform = assetTrack.preferredTransform

    //Get the orientation of the asset and determine whether it is portrait or landscape. Media taken with the camera or picked from the camera roll is stored in one fixed orientation (landscape, as far as I recall) and rotated by its preferredTransform, so this method accounts for that.
    let assetInfo = orientationFromTransform(transform)

    //You need a little background to understand this part. 
    /* MyAsset is my original video. I need to combine a lot of other segments, according to the user, into this original video. So I have to make all the other videos fit this size. 
      This is the width and height ratios from the original video divided by the new asset 
    */
    let width = MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.width/assetTrack.naturalSize.width
    var height = MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height/assetTrack.naturalSize.height

    //If it is in portrait
    if assetInfo.isPortrait {

        //We actually change the height variable to divide by the width of the old asset instead of the height. This is because of the flip since we determined it is portrait and not landscape. 
        height = MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height/assetTrack.naturalSize.width

        //We apply the transform and scale the image appropriately.
        transform = transform.scaledBy(x: height, y: height)

        //We also have to move the image or video appropriately. Since we scaled it, it could be way off to the side, outside the visible bounds.
        let movement = ((1/height)*assetTrack.naturalSize.height)-assetTrack.naturalSize.height

        //This lines it up dead center on the left side of the screen perfectly. Now we want to center it.
        transform = transform.translatedBy(x: 0, y: movement)

        //This calculates how much black there is. Cut it in half and there you go!
        let totalBlackDistance = MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.width-transform.tx
        transform = transform.translatedBy(x: 0, y: -(totalBlackDistance/2)*(1/height))

    } else {

        //Landscape! We don't need to change the variables, it is all defaulted that way (iOS prefers landscape items), so we scale it appropriately.
        transform = transform.scaledBy(x: width, y: height)

        //This is a ratio: the asset's width once it is scaled to the original height, relative to the original width, i.e. ((original height / asset height) * asset width) / original width. Combined with the scale above, the horizontal scale ends up equal to the vertical one, so the asset is not stretched.
        let scale:CGFloat = ((MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height/assetTrack.naturalSize.height)*(assetTrack.naturalSize.width))/MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.width
        transform = transform.scaledBy(x: scale, y: 1)

        //The asset can be way off the screen again, so we move it back to the center. Take the leftover width (original width minus the scaled asset width), halve it, and divide by the scale factor (original height / asset height), because translatedBy works in the already-scaled coordinate space. The ratio needs its own parentheses: 1/a/b would evaluate as 1/(a*b).
        let movement = ((MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.width-((MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height/assetTrack.naturalSize.height)*(assetTrack.naturalSize.width)))/2)*(1/(MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height/assetTrack.naturalSize.height))
        transform = transform.translatedBy(x: movement, y: 0)
    }

    //This creates the instruction and returns it so we can apply it to each individual track.
    instruction.setTransform(transform, at: kCMTimeZero)
    return instruction
}
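
For readers who only need "scale to fit and center", the same goal can be sketched as a generic aspect-fit transform. This is a swapped-in, simplified technique and not the exact arithmetic used above; `aspectFitTransform` and its parameter names are hypothetical, and it assumes you want letterboxing rather than cropping.

```swift
import CoreGraphics

// Applies the asset's preferredTransform, then scales the oriented frame
// uniformly so it fits inside renderSize, and finally centers it.
func aspectFitTransform(naturalSize: CGSize,
                        preferredTransform: CGAffineTransform,
                        renderSize: CGSize) -> CGAffineTransform {
    // Bounding box of the frame once the orientation transform is applied.
    let oriented = CGRect(origin: .zero, size: naturalSize).applying(preferredTransform)

    // Uniform scale: the smaller ratio guarantees both dimensions fit.
    let scale = min(renderSize.width / oriented.width,
                    renderSize.height / oriented.height)

    // concatenating(_:) applies the receiver first, then the argument.
    return preferredTransform
        .concatenating(CGAffineTransform(translationX: -oriented.minX, y: -oriented.minY))
        .concatenating(CGAffineTransform(scaleX: scale, y: scale))
        .concatenating(CGAffineTransform(
            translationX: (renderSize.width - oriented.width * scale) / 2,
            y: (renderSize.height - oriented.height * scale) / 2))
}
```

The result could be passed to `setTransform(_:at:)` on a layer instruction in place of the hand-derived transform.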

Now that we have those methods, we can now apply the correct and appropriate transformations to our assets appropriately and get everything fitting nice and clean.

func merge() {
    if let firstAsset = MyAsset, let newAsset = newAsset {

        //This creates our overall composition, our new video framework
        let mixComposition = AVMutableComposition()

        //One by one you create tracks (could use loop, but I just had 3 cases)
        let firstTrack = mixComposition.addMutableTrack(withMediaType: AVMediaTypeVideo,
                                                                     preferredTrackID: Int32(kCMPersistentTrackID_Invalid))

        //You have to use a try, so need a do
        do {

            //Insert a time range into the track. I already calculated my time and call it startTime; this is where you would put your own. The preferredTimescale doesn't have to be 600000 (I was experimenting with those numbers); it just sets the precision. The at: argument is not the position within the source track but the position on the composition's overall timeline, which is why my at: times differ below. You also need to tell it which source track to insert.
            try firstTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, CMTime(seconds: CMTimeGetSeconds(startTime), preferredTimescale: 600000)),
                                           of: firstAsset.tracks(withMediaType: AVMediaTypeVideo)[0],
                                           at: kCMTimeZero)
        } catch _ {
            print("Failed to load first track")
        }

        //Create the 2nd track
        let secondTrack = mixComposition.addMutableTrack(withMediaType: AVMediaTypeVideo,
                                                                      preferredTrackID: Int32(kCMPersistentTrackID_Invalid))

        do {

            //Apply the 2nd timeRange you have. Also apply the correct track you want
            try secondTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, self.endTime-self.startTime),
                                           of: newAsset.tracks(withMediaType: AVMediaTypeVideo)[0],
                                           at: CMTime(seconds: CMTimeGetSeconds(startTime), preferredTimescale: 600000))
            secondTrack.preferredTransform = newAsset.preferredTransform
        } catch _ {
            print("Failed to load second track")
        }

        //We are not sure we are going to use the third track in my case, because they can edit to the end of the original video, causing us not to use a third track. But if we do, it is the same as the others!
        var thirdTrack:AVMutableCompositionTrack!
        if(self.endTime != controller.realDuration) {
            thirdTrack = mixComposition.addMutableTrack(withMediaType: AVMediaTypeVideo,
                                                                      preferredTrackID: Int32(kCMPersistentTrackID_Invalid))

            //This part appears again at endTime, which is right after the 2nd track is supposed to end.
            do {
                try thirdTrack.insertTimeRange(CMTimeRangeMake(CMTime(seconds: CMTimeGetSeconds(endTime), preferredTimescale: 600000), self.controller.realDuration-endTime),
                                           of: firstAsset.tracks(withMediaType: AVMediaTypeVideo)[0] ,
                                           at: CMTime(seconds: CMTimeGetSeconds(endTime), preferredTimescale: 600000))
            } catch _ {
                print("failed")
            }
        }

        //Same thing with audio!
        if let loadedAudioAsset = controller.audioAsset {
            let audioTrack = mixComposition.addMutableTrack(withMediaType: AVMediaTypeAudio, preferredTrackID: 0)
            do {
                try audioTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, self.controller.realDuration),
                                               of: loadedAudioAsset.tracks(withMediaType: AVMediaTypeAudio)[0] ,
                                               at: kCMTimeZero)
            } catch _ {
                print("Failed to load Audio track")
            }
        }

        //So, now that we have all of these tracks we need to apply those instructions! If we don't, then they could be different sizes. Say my newAsset is 720x1080 and MyAsset is 1440x900 (These are just examples haha), then it would look a tad funky and possibly not show our new asset at all.
        let mainInstruction = AVMutableVideoCompositionInstruction()

        //Make sure the overall time range matches that of the individual tracks, if not, it could cause errors. 
        mainInstruction.timeRange = CMTimeRangeMake(kCMTimeZero, self.controller.realDuration)

        //For each track we made, we need an instruction. Could set loop or do individually as such.
        let firstInstruction = videoCompositionInstructionForTrack(firstTrack, firstAsset)
        //This hides the first track from startTime onward so it cannot cover the tracks listed after it (layerInstructions are ordered topmost first).
        firstInstruction.setOpacity(0.0, at: startTime)

        //Next Instruction
        let secondInstruction = videoCompositionInstructionForTrack(secondTrack, newAsset)

        //Again, not sure we need 3rd one, but if we do.
        var thirdInstruction:AVMutableVideoCompositionLayerInstruction!
        if(self.endTime != self.controller.realDuration) {
            secondInstruction.setOpacity(0.0, at: endTime)
            thirdInstruction = videoCompositionInstructionForTrack(thirdTrack, firstAsset)
        }

        //Okay, now that we have all these instructions, we tie them into the main instruction we created above.
        mainInstruction.layerInstructions = [firstInstruction, secondInstruction]
        if(self.endTime != self.controller.realDuration) {
            mainInstruction.layerInstructions += [thirdInstruction]
        }

        //We create a video framework now, slightly different than the one above.
        let mainComposition = AVMutableVideoComposition()

        //We apply these instructions to the framework
        mainComposition.instructions = [mainInstruction]

        //How long are our frames, you can change this as necessary
        mainComposition.frameDuration = CMTimeMake(1, 30)

        //This is your render size of the video. 720p, 1080p etc. You set it!
        mainComposition.renderSize = firstAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize

        //We create an export session (you can't use PresetPassthrough because we are manipulating the transforms of the videos and the quality, so I just set it to highest)
        guard let exporter = AVAssetExportSession(asset: mixComposition, presetName: AVAssetExportPresetHighestQuality) else { return }

        //Provide type of file, provide the url location you want exported to (I don't have mine posted in this example).
        exporter.outputFileType = AVFileTypeMPEG4
        exporter.outputURL = url

        //Then we tell the exporter to export the video according to our video framework, and it does the work!
        exporter.videoComposition = mainComposition

        //Asynchronous methods FTW!
        exporter.exportAsynchronously(completionHandler: {
            //Do whatever when it finishes!
        })
    }
}

There is a lot going on here, but it has to be done, for my example anyways! Sorry it took so long to post and let me know if you have questions.
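
The completion handler above is left empty. A minimal sketch of what could go there is shown below: it maps the session's status to a `Result` so failures are not silently ignored. `ExportError` and `exportResult(status:outputURL:error:)` are hypothetical names made up for this example, and it uses current Swift naming (`AVAssetExportSession.Status`).

```swift
import AVFoundation

// Hypothetical error type for cases where the session gives no error of its own.
enum ExportError: Error {
    case missingOutputURL
    case unknown
}

// Maps the exporter's state to a Result. Returns nil while the export is
// still waiting or in progress.
func exportResult(status: AVAssetExportSession.Status,
                  outputURL: URL?,
                  error: Error?) -> Result<URL, Error>? {
    switch status {
    case .completed:
        if let url = outputURL {
            return .success(url)
        }
        return .failure(ExportError.missingOutputURL)
    case .failed, .cancelled:
        return .failure(error ?? ExportError.unknown)
    default:
        return nil
    }
}
```

Inside the handler you could call `exportResult(status: exporter.status, outputURL: exporter.outputURL, error: exporter.error)` and hop to the main queue before touching any UI, since the handler is not guaranteed to run there.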

When the video's transform is (a = -1, b = 0, c = 0, d = 1), the video does not appear. How can I handle that? Any idea? — Rakesh Patel, May 02 '18 at 06:48
I keep seeing this `orientationFromTransform()` code over and over in SO answers, blog posts, etc., apparently copy-pasted, and keep wondering: Why does it need to return a **tuple** with a separate `isPortrait` boolean, knowing that it will **always** be true for `.left` and `.right` and **always** false for `.up` and `.down`... — Nicolas Miari, Feb 05 '20 at 03:54
It just makes it easier to package the entire orientation, including the portrait flag. As you stated, `isPortrait` is only true for `.left` and `.right`; thus, we can return `isPortrait` instead of having to check every time elsewhere. Let's say you call this method in 3 different locations: you'd have to check for `.left` or `.right` each time to determine `isPortrait`, whereas this method returns all of it. — impression7vx, Feb 05 '20 at 18:13
This is still quite slow when I use it :( It seems the passthrough preset is the only thing making the video export faster... — Kev Wats, Oct 05 '20 at 02:44
Yeah, it's not instant and does take time. That is the only preset that does it quickly enough. I'd recommend potentially creating your own video creator. This is complex but can be done (I've done it) using `AVCaptureSynchronizedDataCollection` or some variant of `AVCaptureData` to collect your data, capture frame by frame, and create a video that way. This ensures you control exactly what happens on a much lower level; it's not only quick but can be done asynchronously as the images are coming through. It may become tricky if you are also modifying each image. — impression7vx, Oct 07 '20 at 19:40

score 3 · Answer 2 · answered Feb 22 '17 at 11:10

Yes, you can totally apply an individual transform to each layer of an AVMutableComposition.

Here's an overview of the process. I've done this personally in Objective-C, so I can't give you the exact Swift code, but I know these same functions work just the same in Swift.

Create an AVMutableComposition.
Create an AVMutableVideoComposition.
Set the render size and frame duration of the Video Composition.
Now for each AVAsset :
- Get its video AVAssetTrack and its audio AVAssetTrack.
- Create an AVMutableCompositionTrack for each of those (one for video, one for audio) by adding each to the mutableComposition.

Here it gets more complicated (sorry, AVFoundation is not easy!).

Create an AVMutableVideoCompositionLayerInstruction from the AVMutableCompositionTrack (itself an AVAssetTrack) that holds each video. On each AVMutableVideoCompositionLayerInstruction you can set the transform. You can also do things like set a crop rectangle.
Add each AVMutableVideoCompositionLayerInstruction to an array of layer instructions. When all of them are created, set the array as the layerInstructions of an AVMutableVideoCompositionInstruction, and set that instruction on the AVMutableVideoComposition.

And finally, you will have an AVPlayerItem that you will use to play this back (on an AVPlayer). You create the AVPlayerItem using the AVMutableComposition, and then you set the AVMutableVideoComposition on the AVPlayerItem itself (setVideoComposition:).
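
The playback step can be sketched in Swift as follows. This is a minimal illustration under the assumption that the composition and video composition have already been built as described; `makePlayer` is a hypothetical helper name, not part of AVFoundation.

```swift
import AVFoundation

// Wraps a composition in a player item and attaches the video composition,
// so the layer instructions are applied during playback with no export step.
func makePlayer(composition: AVComposition,
                videoComposition: AVVideoComposition) -> AVPlayer {
    let item = AVPlayerItem(asset: composition)
    item.videoComposition = videoComposition
    return AVPlayer(playerItem: item)
}
```

The returned player can then be attached to an `AVPlayerLayer` or an `AVPlayerViewController` like any other `AVPlayer`.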

Easy eh?

It took me some weeks to get this stuff working well. It's totally unforgiving and, as you have mentioned, if you do something wrong, it doesn't tell you what you did wrong; it just doesn't appear.

But when you crack it, it totally works quickly and well.

Finally, all the stuff I have outlined is available in the AVFoundation docs. It's a lengthy tome, but you need to know it to achieve what you are trying to do.

Best of luck!

I appreciate your help, already found the answer. Just haven't posted it. Thank you tho! — impression7vx, Feb 23 '17 at 23:01
@impression7vx Any progress on this by any chance? Anything to help the community? Hit a road block with this and haven't found a good answer. Thanks! — simplexity, Apr 27 '17 at 20:50
Yea man. I had surgery yesterday so gimme some time to get home and post some code today or tomorrow. Cool? — impression7vx, Apr 28 '17 at 08:29
Hey Luke! In theory, using this method, would be able to filter (let's say black/white filter) single video from a multiple video composition? Meaning that if we play three videos at the same time (as overlays), we can filter one with X and second with Y etc'? — Roi Mulia, Nov 28 '18 at 00:37
I understand the need for separate composition tracks for video (e.g., different frame sizes), but: Why do you need separate audio tracks in the composition for each source asset? As long as they don't overlap in time, Can't they all fit in one audio track? — Nicolas Miari, Feb 05 '20 at 04:06
