You can check the NAL unit type: type 5 indicates an IDR frame, which is always an I-frame. Inside the 'mdat' box the video is stored as length-prefixed NAL units:
<size><NAL><size><NAL>...<size><NAL>
The lower 5 bits of the first byte of each NAL unit indicate the type.
Skip over types 6 (SEI), 7 (SPS), 8 (PPS) and 9 (access unit delimiter), if present, until you find type 1 (non-IDR frame) or type 5 (IDR frame).
MP4 files should not contain start codes ([00] 00 00 01) or access unit delimiters.
MPEG-2 Transport Streams and raw *.h264 files contain start codes ([00] 00 00 01) and access unit delimiters.
The size field in MP4 is 4 bytes most of the time, but to get the correct value you have to parse the codec private data (the 'avcC' box, which also carries the SPS/PPS): its lengthSizeMinusOne field holds the size field length minus one.
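As a rough illustration of the scan described above, here is a minimal Python sketch. The function names `iter_nal_units` and `first_slice_type` are made up for this example, and `length_size` defaults to the common 4 bytes; for a real file, take it from the codec private data instead of assuming it.

```python
NAL_NON_IDR = 1  # coded slice of a non-IDR picture
NAL_IDR = 5      # coded slice of an IDR picture

def iter_nal_units(sample: bytes, length_size: int = 4):
    """Yield (nal_type, nal_bytes) for each <size><NAL> pair in an MP4 sample."""
    pos = 0
    while pos + length_size <= len(sample):
        size = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        if size == 0 or pos + size > len(sample):
            break  # truncated or corrupt data: stop rather than guess
        nal = sample[pos:pos + size]
        yield nal[0] & 0x1F, nal  # lower 5 bits of the first byte = type
        pos += size

def first_slice_type(sample: bytes, length_size: int = 4):
    """Skip SEI/SPS/PPS/AUD and return 1 or 5 for the first slice, else None."""
    for nal_type, _ in iter_nal_units(sample, length_size):
        if nal_type in (NAL_NON_IDR, NAL_IDR):
            return nal_type
    return None
```

A sample is an I-frame candidate when `first_slice_type(sample) == NAL_IDR`.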
In short, H.264 comes in two formats:
Annex-B (MPEG-2 TS, or *.264 raw file):
<[00] 00 00 01> <NAL> <[00] 00 00 01> <NAL> ... <[00] 00 00 01> <NAL>
MP4 (mdat):
<size><NAL><size><NAL>...<size><NAL>
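To make the difference between the two layouts concrete, here is a small Python sketch that splits an Annex-B stream on its start codes and re-emits it in the length-prefixed layout. The function names are hypothetical, and the 4-byte length is an assumption. Trailing zero bytes are stripped from each NAL unit because a NAL unit never ends in 00; those bytes belong to the next 4-byte start code or to padding.

```python
START_CODE = b"\x00\x00\x01"

def split_annexb(stream: bytes):
    """Return the NAL units found between start codes in an Annex-B stream."""
    nals = []
    i = stream.find(START_CODE)
    while i != -1:
        start = i + len(START_CODE)
        nxt = stream.find(START_CODE, start)
        end = len(stream) if nxt == -1 else nxt
        # Drop the optional leading 00 of the next start code (and padding).
        nal = stream[start:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
        i = nxt
    return nals

def annexb_to_length_prefixed(stream: bytes, length_size: int = 4) -> bytes:
    """Rewrite <start code><NAL>... as <size><NAL>... (the mdat layout)."""
    return b"".join(
        len(nal).to_bytes(length_size, "big") + nal
        for nal in split_annexb(stream)
    )
```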
Your file at https://drive.google.com/file/d/1Vwcz8WsTuRLJie8SFzGspizyTc-caGjc/view?usp=sharing has video and audio in the same mdat.
So to make the I-frame detection reliable, you have to parse a little more.
This gives you the offset at which the video starts in the mdat:
moof[i]->traf[0]->trun[0]->dataOffset
The audio starts here, so stop parsing video at this offset:
moof[i]->traf[1]->trun[0]->dataOffset
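A minimal Python sketch of reading those two values, assuming the whole file is already in memory as `buf`. The helper names `iter_boxes` and `trun_data_offsets` are made up for this example. It only looks at the first trun of each traf and returns `None` when the data-offset-present flag (0x000001) is not set.

```python
import struct

def iter_boxes(buf: bytes, start: int = 0, end=None):
    """Yield (type, payload_start, box_end) for each box in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", buf, pos)
        header = 8
        if size == 1:  # 64-bit largesize follows the type field
            if pos + 16 > end:
                break
            size = struct.unpack_from(">Q", buf, pos + 8)[0]
            header = 16
        elif size == 0:  # box runs to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            break  # corrupt size: stop rather than guess
        yield box_type, pos + header, pos + size
        pos += size

def trun_data_offsets(buf: bytes):
    """For each moof, list the data_offset of the first trun in every traf."""
    result = []
    for box_type, start, end in iter_boxes(buf):
        if box_type != b"moof":
            continue
        offsets = []
        for traf_type, t_start, t_end in iter_boxes(buf, start, end):
            if traf_type != b"traf":
                continue
            offset = None
            for child, c_start, _ in iter_boxes(buf, t_start, t_end):
                if child == b"trun":
                    # Payload: version(1) flags(3) sample_count(4) data_offset(4)
                    flags = int.from_bytes(buf[c_start + 1:c_start + 4], "big")
                    if flags & 0x000001:  # data-offset-present
                        offset = struct.unpack_from(">i", buf, c_start + 8)[0]
                    break
            offsets.append(offset)
        result.append(offsets)
    return result
```

For a file laid out like this one, `trun_data_offsets(buf)[i]` would be `[video_offset, audio_offset]` for fragment `i`. Note that data_offset is relative to a base defined by the tfhd box (often the start of the enclosing moof), so check the tfhd flags before turning it into an absolute file position.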