I'm encoding a live stream with VP9 via libvpx and want to stream it to an HTML5 player. I've read the Matroska specification and the W3C WebM Byte Stream Format, and examined a couple of WebM files generated by the vpxenc tool from libvpx. Everything seems clear, but I could not find any strict rules or guidelines on how to pack the encoded video frames inside the media segments described in the W3C specification.
As far as I understand, I have to emit media segments that contain clusters with block elements inside. Since each frame I get from the encoder has a single timestamp, a SimpleBlock element per frame should be sufficient. But how should I organize the clusters? To me it makes sense to emit a single cluster for each frame, with a single SimpleBlock entry, to reduce buffering and lag. Is such an approach considered normal, or are there drawbacks that mean I should instead buffer frames over some time interval and then emit a cluster containing multiple SimpleBlock elements covering the buffered period?
UPDATE
So I implemented the described approach (emitting clusters with a single SimpleBlock entry each), and the video lags a lot, so presumably this is not the way to go.
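For reference, here is roughly the kind of per-frame cluster emission I'm doing, as a simplified Python sketch rather than my actual muxer code. The element IDs (Cluster `0x1F43B675`, Timecode `0xE7`, SimpleBlock `0xA3`) are from the Matroska spec; the helper names, the hardcoded track number 1, and the zero relative timecode are my own simplifications:

```python
import struct

def ebml_size(n: int) -> bytes:
    """Encode n as an EBML variable-length size (1-8 bytes)."""
    for length in range(1, 9):
        if n < (1 << (7 * length)) - 1:
            # Set the length-marker bit, then emit big-endian.
            return (n | (1 << (7 * length))).to_bytes(length, "big")
    raise ValueError("size too large for EBML vint")

def ebml_element(element_id: bytes, payload: bytes) -> bytes:
    """An EBML element is: ID bytes, vint-encoded size, payload."""
    return element_id + ebml_size(len(payload)) + payload

def simple_block(track: int, rel_ts: int, frame: bytes, keyframe: bool) -> bytes:
    header = ebml_size(track)            # track number is itself a vint
    header += struct.pack(">h", rel_ts)  # signed 16-bit timecode, relative to cluster
    header += bytes([0x80 if keyframe else 0x00])  # flags: top bit = keyframe
    return ebml_element(b"\xA3", header + frame)

def cluster_for_frame(ts_ms: int, frame: bytes, keyframe: bool) -> bytes:
    """One cluster per frame: a Timecode element plus a single SimpleBlock."""
    ts_bytes = ts_ms.to_bytes((ts_ms.bit_length() + 7) // 8 or 1, "big")
    timecode = ebml_element(b"\xE7", ts_bytes)
    block = simple_block(1, 0, frame, keyframe)  # track 1, rel timecode 0 (assumed)
    return ebml_element(b"\x1F\x43\xB6\x75", timecode + block)
```

Each encoded frame from vpxenc goes through `cluster_for_frame` and the result is appended to the MSE `SourceBuffer` as its own media segment.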