Several months ago I wrote this question regarding buffer starvation in a DirectShow graph.
The starvation issue was solved by implementing a custom allocator that expands in size when starved. However, this merely mitigated the real problem: given enough time, the number of samples held in the graph becomes excessive, and the ever-expanding pool eventually causes an out-of-memory situation.
Here are some facts I have managed to gather:
The graph basically transcodes an MPEG2-TS stream to an MP4 file, while also extracting audio and video data for some real-time DSP processing.
The stream arrives as a UDP multicast carrying 14 different SD programmes.
I am reading the UDP stream using a custom filter derived from the DsNetwork example. Following that example, a media sample (with NO timestamps) is created around each received UDP data block (8 KiB) and passed to Microsoft's MPEG2 Demultiplexer filter, which is configured to select the programme of interest. (Should I be timestamping the samples?)
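For reference, the delivery path looks roughly like this (a sketch built on the DirectShow base classes; `CNetOutputPin::DeliverBlock` is an illustrative name, not the actual DsNetwork code):

```cpp
// Sketch: wrap a received UDP block in a media sample and push it
// downstream with no timestamps set, as in the DsNetwork example.
// The pin derives from CBaseOutputPin (streams.h).
HRESULT CNetOutputPin::DeliverBlock(const BYTE* pData, long cb)
{
    IMediaSample* pSample = nullptr;
    HRESULT hr = GetDeliveryBuffer(&pSample, nullptr, nullptr, 0);
    if (FAILED(hr))
        return hr;

    BYTE* pDst = nullptr;
    pSample->GetPointer(&pDst);
    memcpy(pDst, pData, cb);              // cb is at most 8 KiB here,
    pSample->SetActualDataLength(cb);     // matching the allocator buffers

    pSample->SetTime(nullptr, nullptr);   // NO timestamps
    pSample->SetSyncPoint(FALSE);

    hr = Deliver(pSample);                // hands the sample to the demuxer
    pSample->Release();
    return hr;
}
```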
The filter that requires an expandable allocator is the MPEG2 Demultiplexer; specifically, it is needed for the samples delivered by its video output pin. The audio output pin works fine with a default allocator, and no samples are retained by the audio decoder or the demuxer.
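As an aside, a deeper pool can also be requested up front via `IAMBufferNegotiation` instead of expanding at run time. A sketch, assuming the demuxer's video output pin honours the interface (the buffer count is arbitrary):

```cpp
// Sketch: suggest a deeper sample pool on the demux video output pin
// before connecting it (only effective if the pin supports the interface).
IAMBufferNegotiation* pNeg = nullptr;
if (SUCCEEDED(pVideoOutPin->QueryInterface(IID_IAMBufferNegotiation,
                                           (void**)&pNeg)))
{
    ALLOCATOR_PROPERTIES props;
    props.cBuffers = 256;   // arbitrary deep pool
    props.cbBuffer = -1;    // -1 = no preference
    props.cbAlign  = -1;
    props.cbPrefix = -1;
    pNeg->SuggestAllocatorProperties(&props);
    pNeg->Release();
}
```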
The video samples are being decoded by LAV Video Decoder. Swapping the LAV filter for ffdshow has no positive effect; the accumulation is still present. I have found no setting in either LAV or ffdshow (including the sample queue settings) that alleviates the accumulation problem.
The problem is directly related to the quality of the received stream. The more discontinuities detected in the stream (as flagged on the MPEG demuxer's output samples), the more samples tend to accumulate. Incidentally, a VLC instance running in parallel and consuming the same stream logs the same discontinuities, so they don't seem to be induced by buggy network code on my part.
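The discontinuity flags are easy to observe with a trivial pass-through probe inserted after the demuxer. A sketch based on `CTransInPlaceFilter`; `CProbeFilter`, `m_cDiscontinuities` and `LogTimestamp` are hypothetical names:

```cpp
// Sketch: count discontinuity flags and log timestamps as samples pass by.
HRESULT CProbeFilter::Receive(IMediaSample* pSample)
{
    if (pSample->IsDiscontinuity() == S_OK)        // S_OK = flag is set
        InterlockedIncrement(&m_cDiscontinuities); // LONG member, dumped later

    REFERENCE_TIME tStart = 0, tStop = 0;
    if (SUCCEEDED(pSample->GetTime(&tStart, &tStop)))
        LogTimestamp(tStart, tStop);               // hypothetical logger

    return CTransInPlaceFilter::Receive(pSample);  // pass the sample through
}
```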
The lingering samples are not lost; they are eventually processed by the graph. I wrote some watchdog logic to detect potentially lost samples, and every sample is eventually released properly and returned to the pool.
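The watchdog boils down to counting samples that have left the pool and not yet come back. A minimal sketch of the idea, built on `CMemAllocator` from the base classes (the class name is illustrative):

```cpp
// Sketch: allocator that tracks outstanding samples. If Outstanding()
// stays near the pool size, a downstream filter is hoarding samples.
class CCountingAllocator : public CMemAllocator
{
    LONG m_outstanding = 0;
public:
    CCountingAllocator(LPUNKNOWN pUnk, HRESULT* phr)
        : CMemAllocator(NAME("counting allocator"), pUnk, phr) {}

    STDMETHODIMP GetBuffer(IMediaSample** ppSample, REFERENCE_TIME* pStart,
                           REFERENCE_TIME* pEnd, DWORD dwFlags) override
    {
        HRESULT hr = CMemAllocator::GetBuffer(ppSample, pStart, pEnd, dwFlags);
        if (SUCCEEDED(hr))
            InterlockedIncrement(&m_outstanding);  // sample left the pool
        return hr;
    }

    STDMETHODIMP ReleaseBuffer(IMediaSample* pSample) override
    {
        InterlockedDecrement(&m_outstanding);      // sample came back
        return CMemAllocator::ReleaseBuffer(pSample);
    }

    LONG Outstanding() const { return m_outstanding; }
};
```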
The lag is not related to CPU starvation. If I stop delivering samples to the demuxer, the demuxer stops delivering samples to its output pins; I NEED to push new samples into the demuxer for the lingering samples to be released and returned to the pool.
I tried removing the clock from the capture graph, as well as from the muxer graphs (bridged by a GDCL bridge filter). This does not fix the problem and can actually block the data flow.
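For reference, "removing the clock" here means the standard no-sync-source call, applied to each graph:

```cpp
// Run a graph without a reference clock: renderers and other clocked
// filters then process samples as fast as they arrive.
IMediaFilter* pMediaFilter = nullptr;
if (SUCCEEDED(pGraph->QueryInterface(IID_IMediaFilter,
                                     (void**)&pMediaFilter)))
{
    pMediaFilter->SetSyncSource(nullptr);  // no clock
    pMediaFilter->Release();
}
```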
I have no idea whether the samples are being held by the demultiplexer or by the video decoder. The truth is that I am completely clueless about how to debug and, hopefully, fix this situation, and any pointers or suggestions are more than welcome.
Addendum:
I have some additional information:
- The transcoded video is lagging relative to the audio.
- The lag time is proportional to the number of lingering samples.
So I think that at some point in the graph, the decoded audio and video sample timestamps get out of sync, and the muxer endpoint of the graph is probably blocking the video decoding thread while it waits for the corresponding audio to arrive.
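One way to confirm this would be to log the audio/video timestamp skew right at the muxer inputs. A hypothetical sketch (`LogSkew` is an illustrative helper; real code needs a lock around the two shared variables, since each pin is fed by its own streaming thread):

```cpp
// Sketch: track the last start time seen on each muxer input and flag
// excessive skew between the audio and video legs.
REFERENCE_TIME g_rtLastAudio = 0;
REFERENCE_TIME g_rtLastVideo = 0;

void OnMuxerInputSample(bool isAudio, IMediaSample* pSample)
{
    REFERENCE_TIME tStart = 0, tStop = 0;
    if (FAILED(pSample->GetTime(&tStart, &tStop)))
        return;

    if (isAudio) g_rtLastAudio = tStart; else g_rtLastVideo = tStart;

    REFERENCE_TIME skew = g_rtLastVideo - g_rtLastAudio;
    if (skew > 10000000 || skew < -10000000)   // over 1 s, in 100 ns units
        LogSkew(skew);                         // hypothetical logger
}
```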
Any hints on how I can detect the offending filter, or perhaps on how I can "rebase" the syncing?
Addendum 2:
As you can see in the comments on Roman's answer, I had actually found a bug that induced false discontinuities in the stream. Fixing that bug reduced how often the problem occurred, yet it did not fix the root cause!
It turns out that the root of the problem was the Monogram AAC encoder filter (at least the version I managed to get, as the project seems to be unmaintained now).
The encoder computes its output timestamps incrementally, by dividing the running count of received samples by the input sampling frequency. It assumes the data flow is always continuous and does not even examine the incoming samples for discontinuities! Fixing it was easy once I identified the problem, but this was by far the hardest problem I have had to debug in my entire life as a developer, because all the evidence pointed at the MPEG2 demuxer: the timestamps drifted between the encoded audio and video output pins, and it was this filter that was running out of pooled samples in the first place. Yet all of this was caused indirectly by the video output pin's worker thread being blocked at the end of the graph by the MPEG4 muxer, which was receiving wildly out-of-sync audio and video samples and was throttling its video input to try to keep things in sync.
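Conceptually, the fix is to rebase the incremental clock whenever an input discontinuity arrives. A sketch of the idea (the names are illustrative, not Monogram's actual code; `UNITS` = 10,000,000 comes from the DirectShow base-class headers):

```cpp
// Illustrative state (members of the encoder filter in real code):
REFERENCE_TIME m_rtBase       = 0;     // output time base after last rebase
LONGLONG       m_llSamplesOut = 0;     // PCM samples emitted since the base
int            m_nSampleRate  = 48000; // input sampling rate

void OnInputSample(IMediaSample* pIn)
{
    if (pIn->IsDiscontinuity() == S_OK)
    {
        REFERENCE_TIME tStart = 0, tStop = 0;
        if (SUCCEEDED(pIn->GetTime(&tStart, &tStop)))
        {
            m_rtBase = tStart;         // restart the clock at the input time
            m_llSamplesOut = 0;
        }
    }
    // ... encode and deliver, advancing m_llSamplesOut ...
}

REFERENCE_TIME NextOutputStartTime()
{
    // samples-to-time conversion: count / rate, scaled to 100 ns units
    return m_rtBase + (m_llSamplesOut * UNITS) / m_nSampleRate;
}
```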
Indeed, the illusion that filters are "black boxes" needs to be taken with caution: threads flow along the graph, so a problem in a downstream filter may manifest as a false problem in an upstream filter.