When initiating an RTSP session, the server normally starts the RTP stream with the config data followed by the first I-frame.
It is possible that your Axis camera is set to "always multicast" - in this case the RTSP communication yields an SDP description which tells the client all the network and streaming details it needs to receive the multicast stream.
Since the multicast stream is always running, you join it somewhere in the middle of a GOP and will most probably receive some P- or B-frames first (how many depends on the GOP size).
You can detect these P/B-frames in your RTP client the same way you were detecting the I-frames, as suggested by Ralf: identify them via the NAL unit type. Simply skip all frames in the RTP client until you receive the first I-frame.
Now you can forward all following frames to the decoder.
Or you have to change your camera settings!
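
A rough sketch of the skip-until-I-frame idea, assuming the stream is H.264 and that you already have complete NAL units in hand (RTP header stripped, fragments reassembled). The class name WaitForKeyframe is made up for illustration. It treats an IDR slice (NAL unit type 5) as "the I-frame" and lets SPS/PPS (types 7 and 8) through as config data; if your camera sends non-IDR I-frames, this check alone will not catch them.

```python
# NAL unit types from the H.264 spec (only the ones needed here).
NAL_SLICE_NON_IDR = 1  # P/B slice (or a non-IDR I slice)
NAL_SLICE_IDR = 5      # IDR slice, used here as "the I-frame"
NAL_SPS = 7            # sequence parameter set (config data)
NAL_PPS = 8            # picture parameter set (config data)


def nal_unit_type(nal: bytes) -> int:
    """The NAL unit type is the low 5 bits of the first NAL byte."""
    return nal[0] & 0x1F


class WaitForKeyframe:
    """Hypothetical helper: drop everything until the first IDR slice."""

    def __init__(self) -> None:
        self.synced = False

    def accept(self, nal: bytes) -> bool:
        """Return True if this NAL unit should go to the decoder."""
        if not nal:
            return False
        if self.synced:
            return True
        t = nal_unit_type(nal)
        if t in (NAL_SPS, NAL_PPS):
            return True  # config data is always useful to the decoder
        if t == NAL_SLICE_IDR:
            self.synced = True  # from here on, forward everything
            return True
        return False  # P/B slice before the first IDR: skip it
```

Call accept() on every NAL unit and hand only the accepted ones to the decoder.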
jens.
PS: Don't forget that you have fragmentation in your RTP stream - that means that besides the RTP header there is some fragmentation information in each packet. Before you can identify a frame you have to reassemble it.
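
A minimal sketch of the reassembly, assuming H.264 with FU-A fragmentation as described in RFC 6184 (payload type 28) and that the RTP header has already been stripped from each payload. The class name FuaReassembler is hypothetical. It ignores packet loss and reordering (no sequence-number checks) and does not unpack aggregation packets such as STAP-A, so take it as a starting point only.

```python
from typing import Optional

NAL_FU_A = 28  # fragmentation unit type A (RFC 6184)


class FuaReassembler:
    """Hypothetical helper: rebuild NAL units from RTP payloads."""

    def __init__(self) -> None:
        self._buf: Optional[bytearray] = None

    def push(self, payload: bytes) -> Optional[bytes]:
        """Feed one RTP payload; return a complete NAL unit or None."""
        if not payload:
            return None
        t = payload[0] & 0x1F
        if 1 <= t <= 23:
            # Single NAL unit packet: nothing to reassemble.
            self._buf = None
            return bytes(payload)
        if t != NAL_FU_A or len(payload) < 2:
            return None  # other packet types are not handled here
        indicator, fu_header = payload[0], payload[1]
        start = bool(fu_header & 0x80)  # S bit: first fragment
        end = bool(fu_header & 0x40)    # E bit: last fragment
        if start:
            # Rebuild the original NAL header: F/NRI bits from the
            # FU indicator, NAL unit type from the FU header.
            self._buf = bytearray([(indicator & 0xE0) | (fu_header & 0x1F)])
        elif self._buf is None:
            return None  # joined in the middle of a fragmented NAL unit
        self._buf += payload[2:]
        if end:
            nal = bytes(self._buf)
            self._buf = None
            return nal
        return None
```

The first byte of whatever push() returns carries the NAL unit type, so the I-frame check can be done on the reassembled unit.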