
I'm very new to video processing and I am now stuck decoding my H.264 RTSP stream with FFmpeg and VideoToolbox in Swift.

Currently I am a bit overwhelmed by extracting the SPS and PPS.

-> Where are they stored? I have the following options for getting the data:

 - AVFrame.data
 - AVFrame.extended_data
 - AVFrame.metadata
 - AVPacket.data
 - AVPacket.side_data
 - AVCodecContext.extradata

.. and so on

For now I am working with AVCodecContext.extradata, but this seems a bit different from the example from here.
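
One thing I have read (this is only my understanding, and the isAnnexB helper name is mine) is that H.264 extradata can come in two layouts: an avcC configuration record (length-prefixed NALUs, first byte is the configuration version 0x01) or Annex B (plain 0x00 0x00 0x00 0x01 start codes, which is what my dump further down looks like). A minimal sketch of how I would tell them apart:

// Sketch: distinguish Annex-B extradata (start codes) from an avcC
// configuration record (starts with configurationVersion == 1).
// `extradata` and `extradataSize` come from the AVCodecContext.
private func isAnnexB(extradata: UnsafePointer<UInt8>, extradataSize: Int) -> Bool {
    guard extradataSize >= 4 else { return false }
    // avcC begins with 0x01; Annex B begins with a 0x000001 / 0x00000001 start code
    return extradata[0] == 0x00 && extradata[1] == 0x00
}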

My code for getting the SPS and PPS is this:

private func receiveRawFrame(frame: AVFrame, codecContext: AVCodecContext) {
    // Get the extradata, where the SPS and the PPS are (presumably) stored
    let codecContextExtraData: UnsafeMutablePointer<UInt8> = codecContext.extradata

    let startCodeIndex = 0
    var secondStartCodeIndex = 0
    var thirdStartCodeIndex = 0 // never gets assigned, see below

    var naluType = self.getNaluType(naluTypeRaw: codecContextExtraData[startCodeIndex + 4] & 0x1F)

    if naluType == .sps {
        print("Yeah SPS")
        for i in (startCodeIndex + 4)...(startCodeIndex + 40) {
            if codecContextExtraData[i] == 0x00 && codecContextExtraData[i + 1] == 0x00 && codecContextExtraData[i + 2] == 0x00 && codecContextExtraData[i + 3] == 0x01 {
                secondStartCodeIndex = i
                spsSize = i // spsSize is a property of the class
                break
            }
        }
        let secondNaluTypeRaw = codecContextExtraData[secondStartCodeIndex + 4] & 0x1F
        naluType = self.getNaluType(naluTypeRaw: secondNaluTypeRaw)
    }
    if naluType == .pps {
        print("Yeah PPS")
        for i in (spsSize + 4)..<(spsSize + 30) {
            if codecContextExtraData[i] == 0x00 && codecContextExtraData[i + 1] == 0x00 && codecContextExtraData[i + 2] == 0x00 && codecContextExtraData[i + 3] == 0x01 {
                print("Never gets here")
                thirdStartCodeIndex = i
                break
            }
        }
    } else {
        print("other -> TBD")
    }
}
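
Instead of the fixed 40/30-byte windows I am also considering a generic scan over the whole extradata_size; this is just a sketch, the findStartCodes name is mine:

// Sketch: collect the offsets of all four-byte 0x00000001 start codes,
// scanning the full extradata buffer instead of a fixed window.
private func findStartCodes(extradata: UnsafePointer<UInt8>, extradataSize: Int) -> [Int] {
    var offsets: [Int] = []
    var i = 0
    while i + 3 < extradataSize {
        if extradata[i] == 0x00, extradata[i + 1] == 0x00,
           extradata[i + 2] == 0x00, extradata[i + 3] == 0x01 {
            offsets.append(i)
            i += 4
        } else {
            i += 1
        }
    }
    return offsets
}

For the dump further down this should give the offsets 0 and 25, so the SPS would be the bytes between the two start codes and the PPS everything after the second one.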

This is the helper function to get the naluType:

private func getNaluType(naluTypeRaw: UInt8) -> NaluType {
    switch naluTypeRaw {
    case 1: return .pframe // coded slice, non-IDR (NALU type 0 is "unspecified")
    case 5: return .iframe // coded slice, IDR
    case 7: return .sps
    case 8: return .pps
    default:
        return .unknown
    }
}

With this custom enum:

enum NaluType {
    case sps
    case pps
    case pframe
    case iframe
    case unknown
}
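
As a quick sanity check of the bitmask (the lower five bits of the NALU header byte are the type), applied to the two NALU header bytes that show up in the extradata dump below:

// Sanity check of the & 0x1F mask on the two NALU header bytes from the dump
let spsHeader: UInt8 = 103 // 0x67
let ppsHeader: UInt8 = 104 // 0x68
print(spsHeader & 0x1F) // 7 -> .sps
print(ppsHeader & 0x1F) // 8 -> .pps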

As you can see from the comment in the receiveRawFrame function, I never find a third NALU. When I print AVCodecContext.extradata from [0] to [50] I get the following output:

0 0 0 1 103 66 192 30 217 3 197 104 64 0 0 3 0 64 0 0 12 3 197 139 146 0 0 0 1 104 203 140 178 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Now it makes sense that I never find a third NALU, because there are only two start codes, but where is the rest?
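
For context, what I ultimately want to do with those two parameter sets is build a CMVideoFormatDescription for VideoToolbox; a minimal sketch, assuming spsData and ppsData already hold the raw SPS/PPS payloads with the start codes stripped:

import CoreMedia

// Sketch: create a CMVideoFormatDescription from the extracted SPS and PPS.
// spsData / ppsData are assumed to be the raw NALU payloads, start codes stripped.
func makeFormatDescription(spsData: [UInt8], ppsData: [UInt8]) -> CMVideoFormatDescription? {
    var formatDescription: CMVideoFormatDescription?
    spsData.withUnsafeBufferPointer { spsBuffer in
        ppsData.withUnsafeBufferPointer { ppsBuffer in
            let pointers: [UnsafePointer<UInt8>] = [spsBuffer.baseAddress!, ppsBuffer.baseAddress!]
            let sizes: [Int] = [spsData.count, ppsData.count]
            let status = CMVideoFormatDescriptionCreateFromH264ParameterSets(
                allocator: kCFAllocatorDefault,
                parameterSetCount: 2,
                parameterSetPointers: pointers,
                parameterSetSizes: sizes,
                nalUnitHeaderLength: 4,
                formatDescriptionOut: &formatDescription)
            if status != 0 { // 0 == noErr
                print("Creating the format description failed: \(status)")
            }
        }
    }
    return formatDescription
}

My guess is that the actual slice NALUs (the I- and P-frames) arrive in AVPacket.data rather than in the extradata, which would explain why only those two start codes show up, but I would appreciate confirmation.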

  • Did you ever get anywhere on this or did you give up? – Ryan Jul 01 '21 at 02:04
  • Gave up on this approach, but succeeded with the following tutorial to get my RTSP stream running in Swift: https://medium.com/liveop-x-team/accelerating-h264-decoding-on-ios-with-ffmpeg-and-videotoolbox-1f000cb6c549 – Chris Jul 02 '21 at 05:33

0 Answers