I'm working on a project that requires sequencing a large number (the problem is visible at n = 30 or fewer) of short (1-5 second) AVAssets. All of the reference material and sample projects I can find point to using the range CMTimeRange(start: .zero, end: asset.duration) for insertion into composition tracks, so:
let audioTrack: AVMutableCompositionTrack = ... // from composition.addMutableTrack(withMediaType: .audio, ...)
let videoTrack: AVMutableCompositionTrack = ... // from composition.addMutableTrack(withMediaType: .video, ...)
var playhead = CMTime.zero
for asset in assets {
    let assetRange = CMTimeRange(start: .zero, end: asset.duration)
    let (sourceAudioTrack, sourceVideoTrack) = sourceTracks(from: asset)
    try! audioTrack.insertTimeRange(assetRange, of: sourceAudioTrack, at: playhead)
    try! videoTrack.insertTimeRange(assetRange, of: sourceVideoTrack, at: playhead)
    playhead = playhead + assetRange.duration
}
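For reference, sourceTracks(from:) is just a small helper along these lines (assumed here to have exactly one audio and one video track per asset):

private func sourceTracks(from asset: AVAsset) -> (AVAssetTrack, AVAssetTrack) {
    // Assumes each asset has exactly one track of each media type.
    let audio = asset.tracks(withMediaType: .audio).first!
    let video = asset.tracks(withMediaType: .video).first!
    return (audio, video)
}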
The problem is that this leads to the audio and video falling out of sync (the video appears to lag behind the audio). Some observations:
- The problem seems to go away or be less severe when I use fewer clips
- The clips don't exhibit this behavior when played back on their own
- Some assets have video and audio tracks whose time ranges differ. I think that this might be because of the priming frame issue discussed here
- Filtering out the assets whose tracks have different lengths doesn't resolve the issue
- The time ranges are all given by the system at a 44100 timescale, so the timescale mismatch / rounding discussed here would seem not to apply
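For completeness, this is roughly the diagnostic I've been using to check the observations above (a sketch only: it assumes one audio and one video track per asset and accesses track properties synchronously):

import AVFoundation

// Dumps each asset's audio/video track time ranges at their native timescales
// so starts, durations, and timescales can be compared across clips.
func logTrackRanges(for assets: [AVAsset]) {
    func describe(_ time: CMTime) -> String { "\(time.value)/\(time.timescale)" }
    for (index, asset) in assets.enumerated() {
        guard let audio = asset.tracks(withMediaType: .audio).first,
              let video = asset.tracks(withMediaType: .video).first else { continue }
        print("asset \(index):",
              "audio \(describe(audio.timeRange.start))..\(describe(audio.timeRange.end))",
              "video \(describe(video.timeRange.start))..\(describe(video.timeRange.end))",
              "duration \(describe(asset.duration))")
    }
}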
I've tested out a number of different strategies for computing the time range, none of which seem to solve the issue:
enum CompositionStrategy: Int, CaseIterable {
    case each           // Time range of source video track for video track, audio for audio
    case videoTimeRange // Time range of source video track for both
    case audioTimeRange // Time range of source audio track for both
    case intersection   // Intersection of source video and audio time ranges for both
    case assetDuration  // (start: .zero, end: asset.duration) for both
    case trim           // Apply audio trim from CoreMedia attachments: https://stackoverflow.com/a/33907747/266711
}
private static func calculateTimeRanges(strategy: CompositionStrategy,
                                        audioRange: CMTimeRange,
                                        videoRange: CMTimeRange,
                                        audioTrimFromStart: CMTime,
                                        audioTrimFromEnd: CMTime,
                                        assetDuration: CMTime) -> (video: CMTimeRange, audio: CMTimeRange) {
    switch strategy {
    case .each:
        return (video: videoRange, audio: audioRange)
    case .audioTimeRange:
        return (video: audioRange, audio: audioRange)
    case .videoTimeRange:
        return (video: videoRange, audio: videoRange)
    case .intersection:
        let startTime = max(audioRange.start, videoRange.start)
        let endTime = min(audioRange.end, videoRange.end)
        let range = CMTimeRange(start: startTime, end: endTime)
        return (video: range, audio: range)
    case .assetDuration:
        let range = CMTimeRange(start: .zero, duration: assetDuration)
        return (video: range, audio: range)
    case .trim:
        let audioStart = audioRange.start + audioTrimFromStart
        let audioEnd = audioRange.end - audioTrimFromEnd
        let trimmedAudio = CMTimeRange(start: audioStart, end: audioEnd)
        return (video: videoRange, audio: trimmedAudio)
    }
}
(When the computed audio and video ranges differ, the playhead in the earlier snippet is advanced by the longer of the two durations.)
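Concretely, the insertion loop for these experiments looks roughly like this (a sketch: strategy, audioTrimFromStart, and audioTrimFromEnd are supplied by the surrounding code, with the trim values taken from the CoreMedia attachments linked above and .zero for the other strategies):

var playhead = CMTime.zero
for asset in assets {
    let (sourceAudioTrack, sourceVideoTrack) = sourceTracks(from: asset)
    let ranges = Self.calculateTimeRanges(strategy: strategy,
                                          audioRange: sourceAudioTrack.timeRange,
                                          videoRange: sourceVideoTrack.timeRange,
                                          audioTrimFromStart: audioTrimFromStart,
                                          audioTrimFromEnd: audioTrimFromEnd,
                                          assetDuration: asset.duration)
    try! audioTrack.insertTimeRange(ranges.audio, of: sourceAudioTrack, at: playhead)
    try! videoTrack.insertTimeRange(ranges.video, of: sourceVideoTrack, at: playhead)
    // Advance by the longer of the two inserted ranges so neither track
    // overlaps the next clip when the ranges differ.
    playhead = playhead + max(ranges.audio.duration, ranges.video.duration)
}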
None of these strategies resolves the issue, and I'm about to reach out to Apple for code-level support, but I'm holding out hope that there's something simple I've missed. I also poked around iMovie on the Mac and it lines these clips up perfectly with no sync issues, but it doesn't look like it's using an AVComposition to back its preview player. I would greatly appreciate any help.