I'm trying to send only a specific truncated portion of a WebM file, starting from an arbitrary keyframe, from a Node.js server to be played back by a client using MediaSource buffering, but I'm not sure if this is possible or how to go about doing it.
So far this is what I'm trying:
- find the byte offsets and sizes of the init segment and keyframe clusters using `mse_json_manifest` from https://github.com/acolwell/mse-tools
- concat streams for the init segment and a randomly chosen media segment
- send the streams through either an HTTP request or a socket event to the client
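For context on step one: `fs.createReadStream`'s `start` and `end` options are both inclusive byte offsets, so a manifest entry's `offset`/`size` pair maps to a read range like this (a sketch; `toByteRange` is just my own helper name):

```javascript
// Map a manifest entry ({ offset, size }) to the inclusive byte
// range that fs.createReadStream's start/end options expect.
function toByteRange(segment) {
  return {
    start: segment.offset,
    // `end` is inclusive, so the segment's last byte is at
    // offset + size - 1, not offset + size.
    end: segment.offset + segment.size - 1,
  };
}

// e.g. for the init segment reported by mse_json_manifest:
toByteRange({ offset: 0, size: 526 }); // → { start: 0, end: 525 }
```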
Sending the init segment always seems to work: the client's HTML5 video player shows the original file's duration, but it never buffers the concatenated media segments sent afterwards.
Here's the relevant server code:
const merge = (...streams: ReadStream[]) => {
  let pass = new PassThrough();
  let waiting = streams.length;
  for (const stream of streams) {
    pass = stream.pipe(pass, { end: false });
    stream.once("end", () => --waiting === 0 && pass.emit("end"));
  }
  return pass;
};
io.on("connection", (socket) => {
  const streams = [
    fs.createReadStream(file, {
      start: audioJson.init.offset,
      end: audioJson.init.size,
    }),
    fs.createReadStream(file, {
      start: audioJson.media[150].offset,
    }),
  ];
  merge(...streams).on("data", (data) => socket.emit("audio-data", data));
});
The client:
const streamVideo = document.getElementById("webmStream");
const mediaSource = new MediaSource();
const streamSource = URL.createObjectURL(mediaSource);
streamVideo.src = streamSource;
const audioMime = `audio/webm; codecs="opus"`;
const videoMime = `video/webm; codecs="vp9"`;
mediaSource.addEventListener("sourceopen", () => {
  const audioBuffer = mediaSource.addSourceBuffer(audioMime);
  const audioChunks = [];

  function appendOrQueueChunk(chunk) {
    if (!audioBuffer.updating && !audioChunks.length) {
      audioBuffer.appendBuffer(chunk);
    } else {
      audioChunks.push(chunk);
    }
  }

  socket.on("audio-data", appendOrQueueChunk);

  audioBuffer.addEventListener("updateend", () => {
    if (audioChunks.length) audioBuffer.appendBuffer(audioChunks.shift());
  });
});
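To convince myself the append-or-queue ordering itself is sound, I factored the same logic into a small class I could exercise against a stubbed buffer (names like `ChunkQueue` are mine; the stub only mimics the `updating`/`appendBuffer` surface of a real SourceBuffer):

```javascript
// Append directly when the buffer is idle and nothing is queued;
// otherwise queue, and drain one chunk per "updateend".
class ChunkQueue {
  constructor(buffer) {
    this.buffer = buffer; // expects { updating, appendBuffer(chunk) }
    this.chunks = [];
  }
  push(chunk) {
    if (!this.buffer.updating && this.chunks.length === 0) {
      this.buffer.appendBuffer(chunk);
    } else {
      this.chunks.push(chunk);
    }
  }
  onUpdateEnd() {
    if (this.chunks.length) this.buffer.appendBuffer(this.chunks.shift());
  }
}
```

Driving it with a stub that flips `updating` on every append shows the chunks come out in arrival order, so the client-side queueing at least looks correct in isolation.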
And a snippet of the JSON:
{
  "type": "audio/webm;codecs=\"opus\"",
  "duration": 93100.000000,
  "init": { "offset": 0, "size": 526 },
  "media": [
    { "offset": 526, "size": 10941, "timecode": 0.000000 },
    { "offset": 11467, "size": 10382, "timecode": 0.260000 },
    { "offset": 21849, "size": 10301, "timecode": 0.520000 },
    { "offset": 32150, "size": 10495, "timecode": 0.780000 },
    ...
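One sanity check I can run on the manifest is that the segments tile the file with no gaps or overlaps, i.e. each cluster starts exactly where the previous segment ends (`isContiguous` is just a throwaway helper of mine):

```javascript
// Verify the manifest's byte ranges are contiguous: every cluster
// must begin at the byte immediately after the previous segment.
function isContiguous(init, media) {
  let next = init.offset + init.size;
  for (const seg of media) {
    if (seg.offset !== next) return false;
    next = seg.offset + seg.size;
  }
  return true;
}
```

The four entries shown above pass this check (526 + 10941 = 11467, 11467 + 10382 = 21849, and so on), so the offsets themselves look trustworthy.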
The socket streaming works fine as long as I emit socket events directly from a single fs.ReadStream of the entire WebM file, so the problem may have something to do with sending the streams in sequence, but I feel completely out of my depth and suspect I'm missing something conceptually.