Here's how a general algorithm might operate if you were absolutely determined to make this approach work:
- download the first kilobyte of the MP4 file
- make sure the moov atom is up front; determine the moov atom's length and make another request to fetch the rest of the atom
- dig through the moov atom to find the video trak atom; dig into that atom (the sample tables sit under mdia, minf, stbl) to find the following atoms: stsd, stss, stco/co64, and stsz
- the stsd atom will give you the initialization information required to feed into the H.264 video decoder
- the stss atom gives you a list of all the sync samples (keyframes); these can be decoded independently and are ideal for thumbnailing; if a trak has no stss atom, every sample is a sync sample
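- when you know which frames are keyframes, courtesy of the stss atom, you can cross-reference with the stco or co64 atom (a trak will have one or the other) to find the absolute file location, and with the stsz atom, which tells you exactly how many bytes are in the frame; note that stco/co64 store chunk offsets, not per-sample offsets, so you also need the stsc (sample-to-chunk) atom to determine which chunk holds a given sample and how far into the chunk it starts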
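The first two steps can be sketched roughly as follows. This is a minimal illustration, not a complete parser: the function names (scan_top_level_atoms, locate_moov, remaining_range) are hypothetical, and it assumes you already hold the first chunk of the file in memory as bytes. It only relies on the standard atom header layout: a 32-bit big-endian size, a four-character type, and an optional 64-bit size when the 32-bit field is 1.

```python
import struct


def scan_top_level_atoms(data: bytes):
    """Yield (type, offset, size) for each top-level atom whose header fits in data."""
    pos = 0
    while pos + 8 <= len(data):
        size, kind = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if pos + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the atom runs to the end of the file.
            yield kind.decode("latin-1"), pos, None
            break
        if size < header:
            break  # malformed header; stop rather than loop forever
        yield kind.decode("latin-1"), pos, size
        pos += size


def locate_moov(prefix: bytes):
    """Return (offset, size) of moov if it appears before mdat, else None."""
    for kind, offset, size in scan_top_level_atoms(prefix):
        if kind == "moov":
            return offset, size
        if kind == "mdat":
            return None  # media data comes first, so moov is not up front
    return None


def remaining_range(prefix_len: int, offset: int, size: int) -> str:
    """HTTP Range header value for the part of moov not yet downloaded."""
    return f"bytes={prefix_len}-{offset + size - 1}"
```

If locate_moov returns None, the moov atom is most likely at the end of the file and this approach needs a different starting request. Otherwise, the value from remaining_range can be sent as the Range header of the second request.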
With all of this information combined, you should be able to download and decode (and thus resize and re-compress for thumbnailing) just the keyframes of an MP4 video.
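To make the cross-referencing concrete, here is a rough sketch of how the byte range of each keyframe could be computed. It assumes the tables have already been parsed out of the atoms into plain Python lists, and every name in it is hypothetical. Because stco/co64 hold chunk offsets rather than per-sample offsets, the sketch also takes the stsc (sample-to-chunk) table, reduced here to (first_chunk, samples_per_chunk) pairs. It further assumes stsz has been expanded to one size per sample.

```python
def sample_offsets(sample_to_chunk, chunk_offsets, sample_sizes):
    """Return the absolute file offset of every sample, in sample order.

    sample_to_chunk: (first_chunk, samples_per_chunk) pairs from stsc, 1-based chunks
    chunk_offsets:   absolute chunk offsets from stco or co64
    sample_sizes:    one size per sample from stsz
    """
    offsets = []
    sample = 0  # 0-based index of the next sample to place
    for i, (first_chunk, per_chunk) in enumerate(sample_to_chunk):
        # Each stsc entry applies until the next entry's first_chunk,
        # or through the last chunk if it is the final entry.
        if i + 1 < len(sample_to_chunk):
            end_chunk = sample_to_chunk[i + 1][0]
        else:
            end_chunk = len(chunk_offsets) + 1
        for chunk in range(first_chunk, end_chunk):
            pos = chunk_offsets[chunk - 1]
            for _ in range(per_chunk):
                if sample >= len(sample_sizes):
                    return offsets
                offsets.append(pos)
                pos += sample_sizes[sample]  # samples are packed back to back
                sample += 1
    return offsets


def keyframe_ranges(sync_samples, sample_to_chunk, chunk_offsets, sample_sizes):
    """Return (offset, size) for each sync sample; stss numbers are 1-based."""
    offsets = sample_offsets(sample_to_chunk, chunk_offsets, sample_sizes)
    return [(offsets[n - 1], sample_sizes[n - 1]) for n in sync_samples]
```

Each (offset, size) pair maps directly to one ranged download, for example bytes=offset through offset + size - 1, whose payload can then be handed to the decoder together with the initialization data from stsd.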