I am using the FFmpeg APIs directly to capture audio and video to a file with the ismv muxer.
Before writing the file header, I set the time_base of the AAC audio stream to 1/44100 and that of the H.264 video stream to 1/30. Despite these settings, when I call avformat_write_header(oc, options), FFmpeg internally forces the time_base of both streams to 1/10000000. Looking at the source of avformat_write_header, the AVFormatContext is lazily initialized at that point. For both mp4 and ismv, this initialization calls mov_init. However, because ismv has mov->mode == MODE_ISM, mov_init overwrites every stream time_base with 1/10000000, as can be seen on line 6230 in mov_init. mp4, on the other hand, lets the streams keep a time base consistent with their codec configuration.
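As a rough illustration of the behaviour described above, here is a simplified model of how I read the per-track timescale selection in mov_init. It is a paraphrase, not FFmpeg's actual code: the enum and function names are hypothetical, and it assumes default muxer options (no video_track_timescale) and ignores codec-specific special cases.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names: a simplified paraphrase of the per-track timescale
 * selection in mov_init (libavformat/movenc.c), not the real code. */
typedef enum { SKETCH_MODE_MP4, SKETCH_MODE_ISM } sketch_mode;
typedef enum { SKETCH_VIDEO, SKETCH_AUDIO } sketch_media;

/* Returns the timescale (ticks per second) the muxer would pick; the stream
 * time_base then becomes 1/timescale. */
int64_t sketch_track_timescale(sketch_mode mode, sketch_media media,
                               int64_t time_base_den, int64_t sample_rate)
{
    int64_t timescale;

    if (media == SKETCH_VIDEO) {
        assert(time_base_den > 0);
        /* Video starts from the stream time_base denominator and is
         * doubled until it is at least 10000. */
        timescale = time_base_den;
        while (timescale < 10000)
            timescale *= 2;
    } else {
        assert(sample_rate > 0);
        /* Audio uses the codec sample rate. */
        timescale = sample_rate;
    }

    /* ISM mode overrides every track with a fixed 10 MHz clock. */
    if (mode == SKETCH_MODE_ISM)
        timescale = 10000000;

    return timescale;
}
```

In this model, sketch_track_timescale(SKETCH_MODE_MP4, SKETCH_VIDEO, 30, 0) returns 15360, so plain MP4 keeps a multiple of the 1/30 frame rate rather than exactly 1/30, while any ISM track returns 10000000.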
The logic that forces this single time base was added together with ISMV support in FFmpeg. Does anyone know why it is necessary (other than to support the mp4split tooling, as the code comment states)?
I find this confusing and problematic when it comes to writing pts (presentation timestamp) values. I'm relatively new to this space, but my understanding is that:
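- The time base is the duration of one tick in seconds (equivalently, the timescale is the number of ticks per second). For ISMV, a time base of 1/10000000 means pts=1 is 100 ns, or 0.1 microseconds.
- pts values in an ISMV are limited to 33 bits, i.e. they must stay below 2^33 (8589934592). At this time base, that limits pts to about 859 seconds (roughly 14.3 minutes).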
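The arithmetic behind these two points, as a small sketch. pts_to_seconds and max_pts_for_bits are hypothetical helpers, and the 33-bit width is my assumption from above, not something taken from the ISMV specification.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a pts expressed in a time base of 1/timescale to seconds. */
double pts_to_seconds(int64_t pts, int64_t timescale)
{
    assert(timescale > 0);
    return (double)pts / (double)timescale;
}

/* Largest value that fits in an unsigned field of the given bit width. */
int64_t max_pts_for_bits(int bits)
{
    assert(bits > 0 && bits < 63);
    return (INT64_C(1) << bits) - 1;
}
```

With a timescale of 10000000, pts_to_seconds(max_pts_for_bits(33), 10000000) comes out at about 858.99 seconds.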
Since I rescale my packet pts before writing them, using av_packet_rescale_ts(packet, codec.time_base, stream.time_base), the resulting pts values are large. I have read references to letting pts roll over at 2^33. Is this the correct way to deal with the ISMV time base, or is there something else I am missing?
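To show the size of the values involved, here is a plain C sketch of the rescaling and of what a 33-bit rollover would produce. rescale_ts and wrap_pts are hypothetical helpers: rescale_ts rounds to the nearest tick with halves away from zero, which I believe matches the rounding av_rescale_q uses, but it is not FFmpeg's implementation and omits its overflow handling. It does not settle whether ISMV actually needs the rollover.

```c
#include <assert.h>
#include <stdint.h>

/* Rescale ts from time base src_num/src_den to dst_num/dst_den, rounding to
 * the nearest tick (halves away from zero). Similar in spirit to
 * av_rescale_q(), but the product is computed directly in 64 bits, so very
 * large inputs can overflow. */
int64_t rescale_ts(int64_t ts, int64_t src_num, int64_t src_den,
                   int64_t dst_num, int64_t dst_den)
{
    assert(src_den > 0 && dst_num > 0);
    int64_t num = ts * src_num * dst_den;
    int64_t den = src_den * dst_num;
    int64_t half = den / 2;
    return num >= 0 ? (num + half) / den : -((-num + half) / den);
}

/* Fold a timestamp into an unsigned field of `bits` bits (a rollover). */
int64_t wrap_pts(int64_t pts, int bits)
{
    assert(bits > 0 && bits < 63);
    return pts & ((INT64_C(1) << bits) - 1);
}
```

For example, video frame 27000 at 1/30 (900 seconds) rescales to 9000000000 ticks at 1/10000000, which wraps to 410065408 in 33 bits, and one 1024-sample AAC frame at 1/44100 becomes 232200 ticks.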
Thanks in advance!