I'm very new to video processing, and I am stuck decoding my H.264 RTSP stream with FFmpeg and VideoToolbox in Swift.
Currently I am a bit overwhelmed by extracting the SPS and PPS.
Where are they stored? I have the following options for getting data:
- AVFrame.data
- AVFrame.extended_data
- AVFrame.metadata
- AVPacket.data
- AVPacket.side_data
- AVCodecContext.extradata
... and so on
For now I am working with AVCodecContext.extradata, but this seems a bit different from the example from here.
This is my code for getting the SPS and PPS:
private func receiveRawFrame(frame: AVFrame, codecContext: AVCodecContext) {
    //Get the extradata, where the SPS and the PPS is stored?
    let codecContextExtraData: UnsafeMutablePointer<UInt8> = codecContext.extradata
    let startCodeIndex = 0
    var secondStartCodeIndex = 0
    var thirdStartCodeIndex = 0
    var naluType = self.getNaluType(naluTypeRaw: codecContextExtraData[startCodeIndex + 4] & 0x1F)
    if naluType == .sps {
        print("Yeah SPS")
        for i in startCodeIndex+4...startCodeIndex + 40 {
            if (codecContextExtraData[Int(i)] == 0x00 && codecContextExtraData[Int(i)+1] == 0x00 && codecContextExtraData[Int(i)+2] == 0x00 && codecContextExtraData[Int(i)+3] == 0x01) {
                secondStartCodeIndex = i
                spsSize = i
                break
            }
        }
        let secondNaluTypeRaw = (codecContextExtraData[Int(secondStartCodeIndex) + 4] & 0x1F)
        naluType = self.getNaluType(naluTypeRaw: secondNaluTypeRaw)
    }
    if naluType == .pps {
        print("Yeah PPS")
        for i in (spsSize+4)..<(spsSize+30) {
            if (codecContextExtraData[Int(i)] == 0x00 && codecContextExtraData[Int(i)+1] == 0x00 && codecContextExtraData[Int(i)+2] == 0x00 && codecContextExtraData[Int(i)+3] == 0x01) {
                print("Never gets here")
                break
            }
        }
    }
    else {
        print("other -> TBD")
    }
}
}
A further function to get the NALU type:
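private func getNaluType(naluTypeRaw: UInt8) -> NaluType {
    switch naluTypeRaw {
    case 1: return .pframe   // coded slice of a non-IDR picture (type 0 is unspecified)
    case 5: return .iframe   // coded slice of an IDR picture
    case 7: return .sps      // sequence parameter set
    case 8: return .pps      // picture parameter set
    default:
        return .unknown
    }
}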
With this custom enum:
enum NaluType {
    case sps
    case pps
    case pframe
    case iframe
    case unknown
}
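As a side note on the `& 0x1F` mask used above: the first byte after a start code is the NAL unit header, and the type is only its lowest five bits. The following is a minimal sketch of how that byte could be taken apart; `NalHeader` is a hypothetical helper name, not something from FFmpeg or VideoToolbox.

```swift
/// The three fields packed into the one-byte H.264 NAL unit header.
/// `NalHeader` is a hypothetical helper type used for illustration only.
struct NalHeader {
    let forbiddenZeroBit: UInt8  // bit 7, expected to be 0
    let refIdc: UInt8            // bits 6-5 (nal_ref_idc)
    let unitType: UInt8          // bits 4-0 (nal_unit_type)

    init(byte: UInt8) {
        forbiddenZeroBit = byte >> 7
        refIdc = (byte >> 5) & 0x03
        unitType = byte & 0x1F
    }
}

// 103 (0x67) and 104 (0x68) are the header bytes that show up in my extradata dump.
let spsHeader = NalHeader(byte: 103)
let ppsHeader = NalHeader(byte: 104)
print(spsHeader.unitType, ppsHeader.unitType)  // prints "7 8"
```

The raw `unitType` value is what gets passed to `getNaluType(naluTypeRaw:)`.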
As you can see from the "Never gets here" print in the receiveRawFrame function, I never get the third NALU. When I print AVCodecContext.extradata from [0] to [50], I get the following output:
0 0 0 1 103 66 192 30 217 3 197 104 64 0 0 3 0 64 0 0 12 3 197 139 146 0 0 0 1 104 203 140 178 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Now it makes sense that I never get the third NALU, because there are only two start codes, but where is the rest?
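To double-check my reading of the dump, here is a minimal sketch that splits a buffer at its start codes instead of scanning fixed index ranges. `splitAnnexB` is a hypothetical helper name, and the sketch assumes the bytes have already been copied into a plain `[UInt8]` holding only the first `extradata_size` bytes. The zeros at the end of my dump are presumably buffer padding rather than NALU data, so they are left out here.

```swift
/// Splits a start-code-delimited (Annex B style) buffer into NAL units,
/// with the start codes removed. Accepts 3-byte (00 00 01) and
/// 4-byte (00 00 00 01) start codes. Hypothetical helper for illustration.
func splitAnnexB(_ bytes: [UInt8]) -> [[UInt8]] {
    var starts: [(codeStart: Int, payloadStart: Int)] = []
    var i = 0
    while i + 3 <= bytes.count {
        if bytes[i] == 0, bytes[i + 1] == 0, bytes[i + 2] == 1 {
            // A zero right before 00 00 01 belongs to a 4-byte start code.
            let codeStart = (i > 0 && bytes[i - 1] == 0) ? i - 1 : i
            starts.append((codeStart: codeStart, payloadStart: i + 3))
            i += 3
        } else {
            i += 1
        }
    }
    var units: [[UInt8]] = []
    for (n, s) in starts.enumerated() {
        // Each unit runs up to the next start code, or to the end of the buffer.
        let end = n + 1 < starts.count ? starts[n + 1].codeStart : bytes.count
        units.append(Array(bytes[s.payloadStart..<end]))
    }
    return units
}

// The first 33 bytes of the dump above (everything before the trailing zeros).
let dump: [UInt8] = [0, 0, 0, 1, 103, 66, 192, 30, 217, 3, 197, 104, 64, 0, 0, 3, 0,
                     64, 0, 0, 12, 3, 197, 139, 146, 0, 0, 0, 1, 104, 203, 140, 178]
let units = splitAnnexB(dump)
for unit in units {
    print("type \(unit[0] & 0x1F), \(unit.count) bytes")
}
// prints "type 7, 21 bytes" and then "type 8, 4 bytes"
```

So with this input there are exactly two units, a 21-byte SPS and a 4-byte PPS, which matches what my start-code search finds.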