
I'm working on a project that requires sequencing a large number (the problem is visible at n = 30 or fewer) of short (1-5 second) AVAssets. All of the reference material and sample projects I can find point to using the range CMTimeRange(start: .zero, end: asset.duration) for insertion into composition tracks, so:

let audioTrack: AVMutableCompositionTrack = ...  // insertTimeRange(_:of:at:) is defined on AVMutableCompositionTrack, not AVAssetTrack
let videoTrack: AVMutableCompositionTrack = ...
var playhead = CMTime.zero

for asset in assets {
  let assetRange = CMTimeRange(start: .zero, end: asset.duration)
  let (sourceAudioTrack, sourceVideoTrack) = sourceTracks(from: asset)
  // Append each clip's full range at the current playhead, then advance it.
  try! audioTrack.insertTimeRange(assetRange, of: sourceAudioTrack, at: playhead)
  try! videoTrack.insertTimeRange(assetRange, of: sourceVideoTrack, at: playhead)
  playhead = playhead + assetRange.duration
}
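
For context, here's a minimal sketch of how those composition tracks can be created, along with one possible implementation of the sourceTracks(from:) helper (the force unwraps mirror the try! style above):

import AVFoundation

let composition = AVMutableComposition()

// Composition tracks that the loop inserts into; kCMPersistentTrackID_Invalid
// lets AVFoundation assign the track IDs.
let audioTrack = composition.addMutableTrack(
    withMediaType: .audio, preferredTrackID: kCMPersistentTrackID_Invalid)!
let videoTrack = composition.addMutableTrack(
    withMediaType: .video, preferredTrackID: kCMPersistentTrackID_Invalid)!

// One possible implementation of the sourceTracks(from:) helper used above.
func sourceTracks(from asset: AVAsset) -> (audio: AVAssetTrack, video: AVAssetTrack) {
    (asset.tracks(withMediaType: .audio).first!,
     asset.tracks(withMediaType: .video).first!)
}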

The problem is that this leads to the audio and video falling out of sync (the video appears to lag behind the audio). Some observations:

  • The problem seems to go away or be less severe when I use fewer clips
  • The clips don't exhibit this behavior when played back on their own
  • Some assets have video and audio tracks whose time ranges differ. I think that this might be because of the priming frame issue discussed here (see the diagnostic sketch after this list)
  • Filtering out the assets whose tracks have different lengths doesn't resolve the issue
  • The time ranges are all given by the system at a 44100 timescale, so the timescale mismatch / rounding discussed here would seem not to apply
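
Here's a minimal diagnostic sketch (my addition, assuming synchronous track loading as in the snippets above) for logging each asset's audio and video track time ranges so the per-clip discrepancies are visible:

import AVFoundation

// Diagnostic sketch: log each asset's audio and video track time ranges so
// per-clip start/duration discrepancies (e.g. AAC priming) become visible.
func logTrackRanges(for assets: [AVAsset]) {
    for (index, asset) in assets.enumerated() {
        let audio = asset.tracks(withMediaType: .audio).first?.timeRange
        let video = asset.tracks(withMediaType: .video).first?.timeRange
        print("asset \(index):",
              "audio start \(audio?.start.seconds ?? .nan) duration \(audio?.duration.seconds ?? .nan),",
              "video start \(video?.start.seconds ?? .nan) duration \(video?.duration.seconds ?? .nan)")
    }
}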

I've tested out a number of different strategies for computing the time range, none of which seem to solve the issue:

enum CompositionStrategy: Int, CaseIterable {
    case each   // Time range of source video track for video track, audio for audio
    case videoTimeRange // Time range of source video track for both
    case audioTimeRange // Time range of source audio track for both
    case intersection   // Intersection of source video and audio time ranges for both
    case assetDuration  // (start: .zero, end: asset.duration) for both
    case trim           // Apply audio trim from CoreMedia attachments: https://stackoverflow.com/a/33907747/266711
}

private static func calculateTimeRanges(strategy: CompositionStrategy, audioRange: CMTimeRange, videoRange: CMTimeRange, audioTrimFromStart: CMTime, audioTrimFromEnd: CMTime, assetDuration: CMTime) -> (video: CMTimeRange, audio: CMTimeRange) {
    switch strategy {
    case .each:
        return (video: videoRange, audio: audioRange)
    case .audioTimeRange:
        return (video: audioRange, audio: audioRange)
    case .videoTimeRange:
        return (video: videoRange, audio: videoRange)
    case .intersection:
        let startTime = max(audioRange.start, videoRange.start)
        let endTime = min(audioRange.end, videoRange.end)
        let range = CMTimeRange(start: startTime, end: endTime)
        return (video: range, audio: range)
    case .assetDuration:
        let range = CMTimeRange(start: .zero, duration: assetDuration)
        return (video: range, audio: range)
    case .trim:
        let audioStart = audioRange.start + audioTrimFromStart
        let audioEnd = audioRange.end - audioTrimFromEnd
        let trimmedAudio = CMTimeRange(start: audioStart, end: audioEnd)
        return (video: videoRange, audio: trimmedAudio)
    }
}

(In the earlier snippet, when the computed audio and video ranges differ, the playhead is incremented by the longer of the two durations.)
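
Concretely, the loop body with a strategy applied looks roughly like this sketch (assuming strategy, asset, the source tracks, and the trim values are in scope; calculateTimeRanges is the function above):

let (videoRange, audioRange) = calculateTimeRanges(
    strategy: strategy,
    audioRange: sourceAudioTrack.timeRange,
    videoRange: sourceVideoTrack.timeRange,
    audioTrimFromStart: audioTrimFromStart,
    audioTrimFromEnd: audioTrimFromEnd,
    assetDuration: asset.duration)
try! videoTrack.insertTimeRange(videoRange, of: sourceVideoTrack, at: playhead)
try! audioTrack.insertTimeRange(audioRange, of: sourceAudioTrack, at: playhead)
// Advance by the longer of the two computed durations when they differ.
playhead = playhead + max(audioRange.duration, videoRange.duration)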

None of these strategies resolves the issue and I'm about to reach out to Apple for code-level support, but am holding out hope that there's something simple I missed. I also poked around iMovie on the Mac and it's able to line these clips up perfectly with no sync issues, but it doesn't look like it's using an AVComposition to back its preview player. I would greatly appreciate any help.

jefflovejapan

1 Answer


@Jeff, I don't know if you've checked the Apple documentation, but it mentions a delay of exactly 2112 samples that can occur if the silent priming samples are not removed during processing, and it suggests removing them manually in the playback system in two places:

  • When playback first begins.
  • When the playback position is moved to another location, for example when the user skips ahead or back to another part of the media and begins playback from that new location.

https://developer.apple.com/library/archive/technotes/tn2258/_index.html
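
If it helps, here's a sketch of reading that trim from the CoreMedia sample-buffer attachments (the same mechanism your .trim strategy refers to), falling back to the 2112-sample figure from TN2258; the helper name and the 44100 fallback timescale are my assumptions:

import AVFoundation
import CoreMedia

// Sketch: read the trim-at-start attachment (the AAC priming duration) from
// the first audio sample buffer of an asset. Falls back to TN2258's
// 2112-sample figure at 44100 Hz if no attachment is present.
func primingTrim(for asset: AVAsset) -> CMTime {
    let fallback = CMTime(value: 2112, timescale: 44100)
    guard let track = asset.tracks(withMediaType: .audio).first,
          let reader = try? AVAssetReader(asset: asset) else { return fallback }
    let output = AVAssetReaderTrackOutput(track: track, outputSettings: nil)
    reader.add(output)
    guard reader.startReading(),
          let buffer = output.copyNextSampleBuffer(),
          let attachment = CMGetAttachment(buffer,
                                           key: kCMSampleBufferAttachmentKey_TrimDurationAtStart,
                                           attachmentModeOut: nil) else { return fallback }
    // The attachment is a CFDictionary representation of a CMTime.
    return CMTimeMakeFromDictionary(attachment as! CFDictionary)
}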

If it doesn't help, please give me more details on the technology you're using and the type of implementation, so I can help you.

Luan Naufal
    Hey @Luan, this is what I'm doing with the "trim" composition strategy above. Unfortunately, it doesn't resolve the issue. There's not much more to the implementation than what you see above: after inserting the ranges into the composition's audio and video tracks, I create an AVPlayerItem using the composition and use it to back an AVPlayerLayer. – jefflovejapan May 21 '20 at 13:36
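
For reference, the playback setup described in that comment amounts to something like this sketch (with composition being the AVMutableComposition built earlier):

import AVFoundation

// Playback setup from the comment: an AVPlayerItem built from the composition,
// backing an AVPlayerLayer for preview.
let item = AVPlayerItem(asset: composition)
let player = AVPlayer(playerItem: item)
let playerLayer = AVPlayerLayer(player: player)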