8

I would like to be able to robustly stop a video when it arrives on certain specified frames, in order to give oral presentations based on videos made with Blender, Manim...

I'm aware of this question, but the problem is that the video does not stop exactly at the right frame. Sometimes it continues forward by one frame, and when I force it to come back to the intended frame we see the video going backward, which is weird. Even worse, if the next frame is completely different (different background...), this is very visible.

To illustrate my issues, I created a demo project here (just click "next" and notice that when the video stops, it sometimes goes backward). The full code is here.

The important part of the code I'm using is:

      var video = VideoFrame({
          id: 'video',
          frameRate: 24,
          callback: function(curr_frame) {
              // Stop the video when we arrive on a frame we should stop at.
              if (stopFrames.includes(curr_frame)) {
                  console.log("Automatic stop: found stop frame.");
                  pauseMyVideo();
                  // Ensure we are on the proper frame.
                  video.seekTo({frame: curr_frame});
              }
          }
      });

So far, I avoid this issue by stopping one frame early and then using seekTo (I'm not sure how sound this is), as demonstrated here. But as you can see, sometimes when going to the next frame it "freezes" a bit: I guess this happens when the stop occurs right before the seekTo.
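
Concretely, the workaround looks roughly like this (same stopFrames and pauseMyVideo as in the snippet above):

    var video = VideoFrame({
        id: 'video',
        frameRate: 25,
        callback: function(curr_frame) {
            // Pause one frame *before* the target frame...
            if (stopFrames.includes(curr_frame + 1)) {
                pauseMyVideo();
                // ...then seek to the exact frame we actually want to show.
                video.seekTo({frame: curr_frame + 1});
            }
        }
    });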

PS: if you know a reliable way in JS to get the number of frames of a given video, I'm also interested.

Concerning the idea of cutting the video beforehand on the desktop: this could be used... but I had bad experiences with that in the past, notably because changing videos sometimes produces glitches. It is also more complicated to use, as it means the video must be manually cut many times and re-encoded...

EDIT Is there any solution based, for instance, on WebAssembly (more compatible with old browsers) or WebCodecs (more efficient, but not yet widespread)? WebCodecs seems to allow pretty amazing things, but I'm not sure how to use it for that. I would love to hear solutions based on both of them, since Firefox does not handle WebCodecs yet. Note that it would be great if audio is not lost in the process. Bonus if I can also make the controls appear on request.

EDIT: I'm not sure I understand what's happening here (source)... But it seems to do something close to what I need (using WebAssembly, I think), since it manages to play a video in a canvas, frame by frame... Here is another website that does something close to what I need using WebCodecs. But I'm not sure how to reliably synchronize sound and video with WebCodecs.
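
From what I understand so far, the WebCodecs part would look roughly like this (just my rough sketch, Chromium only; the codec string is only an example and the EncodedVideoChunks would have to come from a separate demuxer such as mp4box.js, which is not shown):

    const canvas = document.getElementById('canvas');   // hypothetical canvas element
    const ctx = canvas.getContext('2d');

    const decoder = new VideoDecoder({
        output: (frame) => {
            // Each decoded frame is drawn (and kept) on the canvas,
            // which gives exact control over which frame is on screen.
            ctx.drawImage(frame, 0, 0, canvas.width, canvas.height);
            frame.close();                               // frames must be closed to free memory
        },
        error: (e) => console.error(e),
    });

    decoder.configure({ codec: 'avc1.42E01E' });         // codec string depends on the file
    // The EncodedVideoChunks would come from a demuxer (e.g. mp4box.js):
    // demuxer.onChunk = (chunk) => decoder.decode(chunk);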

EDIT: answer to the first question

Concerning the video frame rate: indeed, I chose my frame rate poorly, it was 25, not 24. But even using a frame rate of 25, I still don't get a frame-precise stop, in both Firefox and Chromium. For instance, here is a recording (made with OBS) of your demo (I see the same with mine when I use 25 instead of 24):

[screenshot: the frame on which the video stops]

one frame later, see that the butterfly "flies backward" (this is maybe not very visible with still screenshots, but see for instance the position of the lower-left wing relative to the flowers):

[screenshot: one frame later]

I can see three potential reasons. First (I think it is the most likely one), I have heard that video.currentTime does not always report the time accurately; maybe that could explain why it fails here? It seems accurate enough to change the current frame (I can go forward and backward by one frame quite reliably as far as I can see), but people reported here that in Chromium video.currentTime is computed from the audio clock rather than the video clock, leading to some inconsistencies (I observe similar inconsistencies in Firefox), and here that it may correspond either to the time at which the frame is sent to the compositor or to the time at which the frame is actually displayed by the compositor (if it is the latter, that could explain the delay we sometimes see). This would also explain why requestVideoFrameCallback is better, as it also provides the current media time.

The second reason that could explain the problem is that setInterval may not be precise enough... However, requestAnimationFrame is not really better (requestVideoFrameCallback is not available in Firefox), even though it should fire 60 times per second, which should be quick enough.

The third option I can see is that maybe the .pause() function takes a while to complete... and that by the end of the call the video has already played another frame. On the other hand, your example using requestVideoFrameCallback https://mvyom.csb.app/requestFrame.html seems to work pretty reliably, and it uses .pause()! Unfortunately it only works in Chromium, not in Firefox. I see that you use metadata.mediaTime instead of currentTime; maybe this is more precise than currentTime.

The last option is that there is maybe something subtle concerning vsync, as explained on this page. It also reports that expectedDisplayTime may help to solve this issue when using requestVideoFrameCallback.
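
To make this concrete, here is a minimal sketch of what I have in mind (Chromium only; fps and stopFrames are assumed to be known):

    const fps = 25;                  // known frame rate of the video
    const stopFrames = [94, 187];    // example stop frames
    const video = document.getElementById('video');

    function onFrame(now, metadata) {
        // mediaTime is the presentation timestamp of the frame being composited.
        const frame = Math.round(metadata.mediaTime * fps);
        // expectedDisplayTime says when the frame should actually hit the screen.
        console.log('frame', frame, 'displayed in', metadata.expectedDisplayTime - now, 'ms');

        if (stopFrames.includes(frame)) {
            video.pause();
        } else {
            video.requestVideoFrameCallback(onFrame);
        }
    }
    video.requestVideoFrameCallback(onFrame);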

tobiasBora
  • That's maybe not even possible depending on how the video is encoded: https://en.wikipedia.org/wiki/Key_frame#Video_compression. If the frame you try to stop on is not a key frame, the browser will have to render it out on its own, giving this slight stuttering effect. – Luca Kiebel Jan 07 '22 at 10:56
  • There's probably a better solution to your problem than using JS, or even just a video for that matter. You could split the video at the frames you want it stopped at, display the frame, talk about it, and continue the next part of the video from another file after that – Luca Kiebel Jan 07 '22 at 10:58
  • @LucaKiebel Well, I understand that the browser may have issues rendering a non-key frame quickly. But here I actually want to stop at a given keyframe, so I don't see why it would be problematic. For now I'm solving the problem slightly differently, by basically just stopping one frame before and then I `seekTo` the right frame (this creates, as you say, a stuttering effect that I can understand, but it's better than going too far in the video). I'm using JS so that I can easily export the animations online, move them easily between computers, and integrate mathjax/JS diagrams. – tobiasBora Jan 07 '22 at 16:43
  • 1
    You may find [this rather lengthy discussion](https://github.com/w3c/media-and-entertainment/issues/4) informative over the historical/current issues with frame-accurate seeking. Using the new `requestVideoFrameCallback` seems to be the cleanest solution (but as you noted isn't supported by all browsers). Another approach could be to precache the stopped frames in a canvas or image, then when you stop the video (maybe a frame early), you can display the canvas exactly on top of the video pretty quickly. – Steve Jan 24 '22 at 18:33
  • For testing I used the third pause. With 25fps, the 94th and 95th frames have completely different backgrounds, so it was easy to debug. – the Hutt Jan 24 '22 at 19:56
  • @onkarruikar Actually, the initial frames were made to target this specific time, but changing the framerate also changes the time at which it arrives ^^ Actually, is there any reason you used `metadata.presentedFrames - doneCount` and not `Math.floor(metadata.mediaTime.toFixed(5) * this.framerate)`? It seems quite hard to maintain the `doneCount` number reliably. – tobiasBora Jan 24 '22 at 20:07
  • I've explained this in a comment below my answer. Surprisingly, metadata.mediaTime has a precision of only 2 decimal places. Also, there is a caveat regarding the time: I don't know if the fractions of a second are frames or milliseconds. SMPTE time uses frames as the fraction of a second, e.g. at 25fps the time `00:00:5:26` means 6 seconds, not 26 milliseconds after 5. That is why in my [demo](https://mvyom.csb.app/requestFrame.html) I was trying to calculate fps when the time was a whole number, using `if ((elapsed - (elapsed | 0)) == 0)` – the Hutt Jan 24 '22 at 20:23
  • @tobiasBora You say this is for presentations (_eg:_ is it as a public speaker?) so with that in mind... I might have a useful Answer for you, but **(1)** Are you opening the video files from local storage? I mean do you have (or can get) access to some segment of bytes (_ie:_ for the file header) **(2)** For getting **total frames**, it'll be different for each format. I can guide you how to from MP4 bytes so you'd have to cover other formats (webM, Ogg, etc) yourself **(3)** Does your accepted solve your problem? Or are you open to other solutions? – VC.One Jan 25 '22 at 21:08
  • @vc-one : sorry, given the few responses I had, I thought it would be better to award it directly before I forget. 3) So far, the answer works nicely in Chrome, but not in Firefox since Firefox does not implement it yet. There I stop one frame earlier and seek to the right frame, and it kind of works, but the last frame arrives with a bit of delay. – tobiasBora Jan 26 '22 at 07:39
  • 1
    @vc-one : In any case, I'm definitely interested in alternatives; notably, I'm interested in playing the video smoothly backward when going to the previous slide by re-encoding the video in reverse, but I guess I'll need to use a canvas to prevent glitches when changing the source and time of the video. 1) Typically this is stored locally, but ideally I'd prefer not to rely too much on that since I may publish my "slides" online later. 2) OK, why not! For now I get the total number of frames by taking the media length and dividing by the fps, but I'm not sure if it's the most reliable solution. – tobiasBora Jan 26 '22 at 07:39

2 Answers

5

The video has a frame rate of 25fps, not 24fps: [screenshot: video metadata showing 25fps]

After putting in the correct value it works OK: demo
The VideoFrame API relies heavily on the FPS value you provide. You can determine the FPS of your videos offline and send it, along with the stop frames, as metadata from the server.
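
For example, something like this (just a sketch; stops.json is a hypothetical file you would generate offline, and pauseMyVideo is your existing helper):

    // stops.json, generated offline, e.g.: { "frameRate": 25, "stopFrames": [94, 187] }
    fetch('stops.json')
        .then((response) => response.json())
        .then(({ frameRate, stopFrames }) => {
            const video = VideoFrame({
                id: 'video',
                frameRate: frameRate,      // exact FPS measured offline
                callback: function(curr_frame) {
                    if (stopFrames.includes(curr_frame)) {
                        pauseMyVideo();
                    }
                }
            });
            video.listen('frame');         // start polling (VideoFrame's listen(), as far as I remember)
        });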


The site videoplayer.handmadeproductions.de uses window.requestAnimationFrame() to get the callback.


There is a new, better alternative to requestAnimationFrame: requestVideoFrameCallback() lets us perform per-video-frame operations on a video.
The same functionality you demoed in the OP can be achieved like this:

   // startTime, doneCount, stopFrames, updateStats() and pauseMyVideo()
   // are defined elsewhere in the demo.
   const callback = (now, metadata) => {
      if (startTime == 0) {
        startTime = now;
      }
      elapsed = metadata.mediaTime;
      // presentedFrames counts every frame presented so far,
      // so subtract the frames already counted before the last pause.
      currentFrame = metadata.presentedFrames - doneCount;

      fps = (currentFrame / elapsed).toFixed(3);
      fps = !isFinite(fps) ? 0 : fps;

      updateStats();
      if (stopFrames.includes(currentFrame)) {
        pauseMyVideo();
      } else {
        // The callback fires only once, so re-register it for the next frame.
        video.requestVideoFrameCallback(callback);
      }
   };
   video.requestVideoFrameCallback(callback);

And here is what the demo looks like.
The API works on Chromium-based browsers like Chrome, Edge, Brave, etc.
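
If you also need Firefox, you can feature-detect the API and fall back to the VideoFrame approach from your question (a rough sketch):

    if ('requestVideoFrameCallback' in HTMLVideoElement.prototype) {
        // Chromium: use the frame-accurate callback from the snippet above.
        video.requestVideoFrameCallback(callback);
    } else {
        // Firefox: fall back to polling currentTime, e.g. with the VideoFrame
        // library from the question (videoFrameFallback is a hypothetical instance of it).
        videoFrameFallback.listen('frame');
    }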


There is a JS library named mediainfo.js that can read the frame rate from the video's binary file.
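
If I remember its documented usage correctly, it can be used roughly like this (just a sketch; the exact result fields, such as FrameRate, may vary between versions):

    // Assumes the mediainfo.js UMD build is loaded and `file` comes from an <input type="file">.
    MediaInfo({ format: 'object' }, (mediainfo) => {
        const getSize = () => file.size;
        const readChunk = (chunkSize, offset) =>
            new Promise((resolve, reject) => {
                const reader = new FileReader();
                reader.onload = (e) => resolve(new Uint8Array(e.target.result));
                reader.onerror = reject;
                reader.readAsArrayBuffer(file.slice(offset, offset + chunkSize));
            });
        mediainfo.analyzeData(getSize, readChunk).then((result) => {
            const videoTrack = result.media.track.find((t) => t['@type'] === 'Video');
            console.log('Frame rate:', videoTrack.FrameRate);
        });
    });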

the Hutt
  • Thanks a lot for your help. So as I reported, updating 24 to 25 does not solve the problem (but thanks for pointing that out). Concerning requestVideoFrameCallback, it seems to work quite reliably in your demo (too bad it's not working in Firefox)... However, I'm not sure I understand: is there any reason to use presentedFrames instead of mediaTime? Notably, I'm afraid `doneCount` could get out of sync if the user uses the video's controls directly. – tobiasBora Jan 24 '22 at 19:44
  • In the demo I was trying to figure out FPS using presented frames and elapsed time. I was hoping to get sharp 25 fps number. That is why I didn't mix them. Unfortunately, the `mediaTime`, provided by the API, has precision of only 2 decimal places. So the fps number gets accurate only towards the end. Sadly you have to keep track of doneCount for such calculations. And we need to listen to every play pause event. I think user interactions must also fire these events, so there shouldn't be any issue regarding the counting. – the Hutt Jan 24 '22 at 20:09
  • Ok, thanks. If I'm only concerned about stopping at the right frame and don't care about detecting FPS, `mediaTime` is enough, right? `2` decimals are precise enough for a 30 or even 60 fps video, if I'm not wrong. – tobiasBora Jan 25 '22 at 16:46
  • Yeah, mediaTime should be good enough. You may find a difference between real time and the video time (mediaTime). – the Hutt Jan 25 '22 at 18:10
  • `mediaTime` can be ahead in time? But if I pause, it is paused instantaneously? – tobiasBora Jan 25 '22 at 18:37
  • No, I meant if you play at slow speed or your hardware is slow. Then the video time will lag behind the time counted using the system clock (new Date()). – the Hutt Jan 26 '22 at 06:19
  • Has Chromium changed something recently? While my example was working fine before, I can't make it work anymore. Similarly, your link also fails now (I have Chromium 95.0.4638.54); I can't even start the video in yours. – tobiasBora Mar 18 '22 at 17:12
  • (I needed to use this https://stackoverflow.com/a/36898221/4987648 to ensure I pause always when I'm allowed to) – tobiasBora Mar 18 '22 at 18:03
  • I am on `Version 99.0.4844.51 (Official Build) (64-bit)` and your and my demos are working fine. I see you have numbered the frames in your video. – the Hutt Mar 19 '22 at 07:23
  • Ok strange... I'll try to upgrade when I've more time. Anyway, I managed to make my own code work (and more resilient). Yes, I've numbered the frames in the video, it's easier to check if javascript is accurate. – tobiasBora Mar 19 '22 at 08:41
  • This solution seems to work nicely for stopping, but it seems that if I want to jump randomly to a precise position, changing `video.currentTime` does not always reliably seek to the right frame... I'm wondering if I won't need to find a solution based on WebCodecs to really control this. – tobiasBora Mar 25 '22 at 12:59
1

See if this helps you. I will expand on it later if it's useful to you.

You can test it online via: https://www.w3schools.com/tags/tryit.asp?filename=tryhtml5_video

  • It will count the total frames in an MP4 file.
  • It will estimate the current frame (as the video plays).

Let me know if this is useful towards a solution for your problem and I will expand it to do reverse playback etc., and also to deal with stopping at specific frames (when set in the "stopping points" box).

<!DOCTYPE html>
<html>
<body>

<h1 style="position: absolute; top: 10px; left: 10px" > Demo // Stop Video at Specific Frame(s) : </h1>
<br>

<div style="z-index: 1; overflow:hidden; position: absolute; top: 60px; left: 10px; font-family: OpenSans; font-size: 14px">
<p> <b> Choose an .MP4 video file... </b> </p>
<input type="file" id="choose_media" accept=".mov, .mp4" />
</div>

<video id="myVideo" width="640" height="480" controls muted playsinline 
style="position: absolute; top: 80px; left: 10px" >
<source src="vc_timecode3.mp4" type="video/mp4">
</video>

<div id="cont_texts" style="position: absolute; top: 80px; left: 700px" >

<span> Current Time : </span> <span id="txt_curr_time"> 00:00:00 </span> 
<br><br>
<span> Estimated Frame Num : </span> <span id="txt_est_frame"> 0 </span> 
<br><br>
<span> Total Frames (video) : </span> <span id="txt_total_frame"> -999 </span> 
<br><br>

<span onclick="check_points()" > Stopping Points Array : </span> <input type="text" id="stopPointsArray" value="" > 

</div>

</body>


<script>


////////////////////////////////

//# VARS
var myVideo = document.getElementById( 'myVideo' );
var video_duration;

var h; var m; var s;
var h_time; var m_time; var s_time;

var vid_curr_time = document.getElementById( 'txt_curr_time' );
var vid_est_frame = document.getElementById( 'txt_est_frame' );
var vid_total_frame = document.getElementById( 'txt_total_frame' );

var reader; //to get bytes from file into Array
var bytes_MP4; //updated as Array

//# MP4 related vars
var got_FPS = false; var video_FPS = -1; 
var temp_Pos = 0;  var sampleCount = -1;
var temp_int = -1; var temp_int_1 = -1; var temp_int_2 = -1;

                    
var array_points = document.getElementById("stopPointsArray");
array_points.addEventListener('change', check_points );

//# EVENTS
document.getElementById('choose_media').addEventListener('change', onFileSelected, false);

myVideo.addEventListener("timeupdate", video_timeUpdate);           
myVideo.addEventListener("play", handle_Media_Events );
myVideo.addEventListener("pause", handle_Media_Events );
myVideo.addEventListener("ended", handle_Media_Events );

//# LET'S GO...
        
function onFileSelected ( evt )
{
    file = evt.target.files[0];
    path = (window.URL || window.webkitURL).createObjectURL(file);
    
    reader = new FileReader();
    reader.readAsArrayBuffer(file);
    
    
    reader.onloadend = function(evt) 
    {
        //alert( " file is selected ... " );
        
        if (evt.target.readyState == FileReader.DONE) 
        {
            bytes_MP4 = new Uint8Array( evt.target.result );
            get_MP4_info( bytes_MP4 );
            
            //# use bytes Array as video tag source
            let path_MP4 = (window.URL || window.webkitURL).createObjectURL(new Blob([bytes_MP4], { type: 'video/mp4' })); //'image/png' //'application/octet-stream'
            myVideo.src = path_MP4;
            myVideo.load();
            
            video_duration = myVideo.duration; //# note: may be NaN until "loadedmetadata" fires
            txt_total_frame.innerHTML = ( sampleCount );
            //alert("video FPS : " + video_FPS);
        }
        
    }
    
}

function check_points (e)
{
    alert( "Array Points are : " + e.target.value );
}

function handle_Media_Events( event )
{
    if ( event.type == "ended" )
    { 
        myVideo.currentTime = 0; myVideo.play(); myVideo.pause(); myVideo.play();
    }
    
    //{ myVideo.currentTime = 8; btn_ctrl.src = "ico_vc_play.png"; vid_isPlaying = false; bool_isPlaying = true; }
    
    if ( event.type == "play" )
    {
        if ( myVideo.nodeName == "VIDEO" )
        {

        }
    
    }
    
    else if ( event.type == "pause" )
    {
        
        
    }
    
    else if ( event.type == "seeked" )
    {
        
        
    }
    
}

function video_timeUpdate()
{
    vid_curr_time.innerHTML = ( convertTime ( myVideo.currentTime ) );
    
    vid_est_frame.innerHTML = Math.round ( video_FPS * myVideo.currentTime );
    
}

function convertTime ( input_Secs ) 
{
    h = Math.floor(input_Secs / 3600);
    m = Math.floor(input_Secs % 3600 / 60);
    s = Math.floor(input_Secs % 3600 % 60);

    h_time = h < 10 ? ("0" + h) : h ;
    m_time = m < 10 ? ("0" + m) : m ;
    s_time = s < 10 ? ("0" + s) : s ;
    
    if ( (h_time == 0) && ( video_duration < 3600) ) 
    { return ( m_time + ":" + s_time ); }
    else 
    { return ( h_time + ":" + m_time + ":" + s_time ); }
     
}

function get_MP4_info( input ) //# "input" is Array of Byte values
{
    //# console.log( "checking MP4 frame-rate..." );
    
    got_FPS = false;
    temp_Pos = 0; //# our position inside bytes of MP4 array
     
    let hdlr_type = "-1"; 
    
    while(true)
    {
        //# Step 1) Prepare for when metadata pieces are found  
        //# When VIDEO HANDLER Atom is found in MP4
        
        //# if STSZ ... 73 74 73 7A  
        if (input[ temp_Pos+0 ] == 0x73)
        {
            if ( ( input[temp_Pos+1] == 0x74 ) && ( input[temp_Pos+2] == 0x73 ) && ( input[temp_Pos+3] == 0x7A ) )
            {
                if ( hdlr_type == "vide" ) //# only IF it's the "video" track
                {
                    temp_Pos += 12;
                    sampleCount = ( ( input[temp_Pos+0] << 24) | (input[temp_Pos+1] << 16) | (input[temp_Pos+2] << 8) | input[temp_Pos+3] );
                    console.log( "found VIDEO sampleCount at: " + sampleCount );
                    
                    video_FPS = ( ( sampleCount * temp_int_1 ) / temp_int_2 );
                    console.log( "FPS of MP4 ### : " +  video_FPS );
                    got_FPS = true; //# stop scanning once the FPS is known
                }
                
            }
            
        }
        
        //# Step 2) Find the pieces of metadata info
        //# Find other Atoms with data needed by above VIDEO HANDLER code.
        
        
        //# for MOOV and MDAT
        if (input[ temp_Pos ] == 0x6D) //0x6D
        {
            //# if MDAT ... 6D 64 61 74
            if ( ( input[temp_Pos+1] == 0x64 ) && ( input[temp_Pos+2] == 0x61 ) && ( input[temp_Pos+3] == 0x74 ) )
            {
                temp_int = ( ( input[temp_Pos-4] << 24) | (input[temp_Pos-3] << 16) | (input[temp_Pos-2] << 8) | input[temp_Pos-1] );
                temp_Pos = (temp_int-1);
                if ( temp_Pos >= (input.length-1) ) { break; }
            }
            
            //# if MOOV ... 6D 6F 6F 76
            if ( ( input[temp_Pos+1] == 0x6F ) && ( input[temp_Pos+2] == 0x6F ) && ( input[temp_Pos+3] == 0x76 ) )
            {
                temp_int = ( ( input[temp_Pos-4] << 24) | (input[temp_Pos-3] << 16) | (input[temp_Pos-2] << 8) | input[temp_Pos-1] );
            }
            
            //# if MDHD ... 6D 64 68 64
            if ( ( input[temp_Pos+1] == 0x64 ) && ( input[temp_Pos+2] == 0x68 ) && ( input[temp_Pos+3] == 0x64 ) )
            {
                temp_Pos += 32;
                
                //# if HDLR ... 68 64 6C 72
                if (  input[temp_Pos+0] == 0x68 )
                {
                    if ( ( input[temp_Pos+1] == 0x64 ) && ( input[temp_Pos+2] == 0x6C ) && ( input[temp_Pos+3] == 0x72 ) )
                    {
                        temp_Pos += 12;
                        hdlr_type = String.fromCharCode(input[temp_Pos+0], input[temp_Pos+1], input[temp_Pos+2], input[temp_Pos+3] );
                    }
                }
            }
            
            //# if MVHD ... 6D 76 68 64
            if ( ( input[temp_Pos+1] == 0x76 ) && ( input[temp_Pos+2] == 0x68 ) && ( input[temp_Pos+3] == 0x64 ) )
            {
                temp_Pos += (12 + 4);
                
                //# get timescale
                temp_int_1 = ( ( input[temp_Pos+0] << 24) | (input[temp_Pos+1] << 16) | (input[temp_Pos+2] << 8) | input[temp_Pos+3] );
                ///console.log( "MVHD timescale at: " + temp_int_1 );
                
                //# get duration
                temp_int_2 = ( ( input[temp_Pos+4+0] << 24) | (input[temp_Pos+4+1] << 16) | (input[temp_Pos+4+2] << 8) | input[temp_Pos+4+3] );
                ///console.log( "MVHD duration at: " + temp_int_2 );
            }
            
        }
        
        if( temp_Pos >= (input.length-1) ) { break; }
        if( got_FPS == true) { break; }
        
        temp_Pos++;
    }
    
}

</script>

</html>
VC.One
  • Thanks a lot for your answer, that sounds like a lot of work! The part on the binary decoding is particularly cryptic to someone like me who knows nothing about the MP4 format, but it's good to know that it's possible to parse a file like that! Is there anything that prevents it from working when the file is loaded without using "browse"? (btw, the online version does not work) Also, I'm curious, why do you get the FPS by parsing the number of frames? Is there no way to access the FPS directly? – tobiasBora Jan 28 '22 at 19:51
  • Concerning the reverse playback, I have to admit that I'm quite curious to see how it would be possible. I guess it's possible using WebCodecs by decoding and re-encoding the video, is that how you would proceed? Also, to stop the video at a precise frame, do you have any better method than those proposed by onkar ruikar? Notably, I see that you use `media.currentTime`, which does not provide super reliable data for obtaining the precise frame number. – tobiasBora Jan 28 '22 at 19:51