17

I have a directory full of images following the pattern <timestamp>.png, where <timestamp> represents milliseconds elapsed since the first image. input.txt contains a list of the interesting images:

file '0.png'
file '97.png'
file '178.png'
file '242.png'
file '296.png'
file '363.png'
...

I am using ffmpeg to concatenate these images into a video:

ffmpeg -r 15 -f concat -i input.txt output.webm

How do I tell ffmpeg to place each frame at its actual position in time instead of using a constant framerate?

piedar

3 Answers

19

Following LordNeckbeard's suggestion to supply the duration directive to ffmpeg's concat demuxer using the time duration syntax, input.txt looks like this:

file '0.png'
duration 0.097
file '97.png'
duration 0.081
file '178.png'
duration 0.064
file '242.png'
duration 0.054
file '296.png'
duration 0.067
file '363.png'

Now ffmpeg handles the variable framerate.

ffmpeg -f concat -i input.txt output.webm

Here is the C# snippet that constructs input.txt:

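// Requires: using System.Globalization;
Frame previousFrame = null;

foreach (Frame frame in frames)
{
    if (previousFrame != null)
    {
        // The duration of the previous image is the gap until this one.
        TimeSpan diff = frame.ElapsedPosition - previousFrame.ElapsedPosition;
        // Format with the invariant culture so the decimal separator is always '.'.
        writer.WriteLine("duration {0}", diff.TotalSeconds.ToString("0.######", CultureInfo.InvariantCulture));
    }

    writer.WriteLine("file '{0}'", frame.FullName);
    previousFrame = frame;
}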
piedar
  • I'm using ffmpeg 4 and this method gives a very slowed down video. During the process I can clearly see the frame rate starting at the right level (around 10) and then dropping constantly and slowly to 3! – Francesco Gabbrielli Sep 27 '18 at 20:53
  • 2
    Despite the high votes, this doesn't actually work as expected. When generating an image sequence using concat, ffmpeg uses a [hard-coded 25fps](https://superuser.com/a/1337998/416032) setting that gives the slideshow a maximum granularity of 1/25=0.04s. Thus, the durations in the concat file shown in the answer are mostly ignored, and images are displayed 0.08s (the closest rounding) apart, resulting in a (nearly) constant framerate that fails to meet the requirements of the question. – Arnon Weinberg Feb 22 '22 at 22:35
5

Variable frame-rate (VFR) video can be generated using a concat file with durations.

However, one catch is that concat files use a hard-coded frame-rate of 25fps, and therefore all frame durations must be multiples of (1/25=)0.04s. This is a problem if the VFR video segment requires higher granularity.
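
To illustrate the granularity problem, here is a rough sketch of what happens to the durations from the question if each one snaps to the nearest multiple of 1/25s. The snap_to_25fps helper is hypothetical, and nearest-multiple rounding is an assumed, simplified model rather than ffmpeg's actual code.

```shell
# Hypothetical helper: snap a duration in seconds to the nearest multiple of
# 1/25 s. This is a simplified model of the rounding, not ffmpeg's own logic.
snap_to_25fps() {
  LC_ALL=C awk -v d="$1" 'BEGIN { printf "%.2f\n", int(d * 25 + 0.5) / 25 }'
}

# Durations taken from the concat file in the question's example.
for d in 0.097 0.081 0.064 0.054 0.067; do
  printf '%s -> %s\n' "$d" "$(snap_to_25fps "$d")"
done
```

Under that assumption, four of the five durations collapse to 0.08s and one to 0.04s, which is why the result looks close to a constant frame rate.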

To work around this, multiply the durations in the concat file by a factor that makes every one of them a multiple of 0.04s, and then divide the same factor back out when generating the video.

For the example in the question, the durations are whole milliseconds, so 1000/25=40 works as the factor (a duration of n ms multiplied by 40 becomes n x 0.04s). As long as the durations are small, though, x1000 is a much simpler factor to use, because each duration then becomes its plain millisecond count. The resulting concat file looks like:

ffconcat version 1.0
file 0.png
duration 97
file 97.png
duration 81
file 178.png
duration 64
file 242.png
duration 54
file 296.png
duration 67
file 363.png
...
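
A concat file in this form could be produced from the question's input.txt with a short shell function. This is only a sketch: make_scaled_concat is a hypothetical name, and it assumes every entry looks like file 'N.png' with N being the timestamp in whole milliseconds.

```shell
# Hypothetical helper: read "file 'N.png'" lines on stdin and write an ffconcat
# file whose durations are the millisecond gaps between consecutive timestamps
# (that is, the real durations in seconds multiplied by 1000).
make_scaled_concat() {
  awk -F"'" '
    BEGIN { print "ffconcat version 1.0" }
    /^file / {
      ts = $2
      sub(/\.png$/, "", ts)                    # "97.png" -> "97"
      if (seen) print "duration " (ts - prev)  # gap since the previous image
      print "file " $2
      prev = ts
      seen = 1
    }
  '
}

# Usage (assumes input.txt from the question is in the current directory):
#   make_scaled_concat < input.txt > concat.txt
```

Lines that do not start with "file " are skipped, and the last image gets no duration line, matching the listing above.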

The VFR video can then be generated using:

ffmpeg -f concat -i concat.txt -vf "settb=1/1000,setpts=PTS/1000" -vsync vfr -r 1000 output.webm

Notice how the factor is divided back out in the setpts expression. The settb filter also sets the video's timebase to 1/1000, matching the factor, so timestamps keep millisecond precision. Now let's check the result:

ffmpeg -i output.webm -vf showinfo -f null /dev/null 2>&1 | sed 's/\r/\n/g' | egrep '^\[Parsed_showinfo_'
[Parsed_showinfo_0 @ 0x10343c0] config in time_base: 1/1000, frame_rate: 1000/1
[Parsed_showinfo_0 @ 0x10343c0] config out time_base: 0/0, frame_rate: 0/0
[Parsed_showinfo_0 @ 0x10343c0] n:   0 pts:      0 pts_time:0       pos:      653 fmt:yuv420p sar:12/11 s:176x144 i:P iskey:1 type:I checksum:5A42C170 plane_checksum:[86E40008 29FD60B4 29FD60B4] mean:[116 128 128] stdev:[44.7 0.0 0.0]
[Parsed_showinfo_0 @ 0x10343c0] n:   1 pts:     97 pts_time:0.097   pos:     1011 fmt:yuv420p sar:12/11 s:176x144 i:P iskey:0 type:P checksum:C85CAAEF plane_checksum:[27D3E978 29FD60B4 29FD60B4] mean:[116 128 128] stdev:[45.9 0.0 0.0]
[Parsed_showinfo_0 @ 0x10343c0] n:   2 pts:    178 pts_time:0.178   pos:     1304 fmt:yuv420p sar:12/11 s:176x144 i:P iskey:0 type:P checksum:B68690E0 plane_checksum:[4813CF69 29FD60B4 29FD60B4] mean:[116 128 128] stdev:[46.0 0.0 0.0]
[Parsed_showinfo_0 @ 0x10343c0] n:   3 pts:    242 pts_time:0.242   pos:     1523 fmt:yuv420p sar:12/11 s:176x144 i:P iskey:0 type:P checksum:EC979DAF plane_checksum:[527EDC38 29FD60B4 29FD60B4] mean:[116 128 128] stdev:[46.6 0.0 0.0]
[Parsed_showinfo_0 @ 0x10343c0] n:   4 pts:    296 pts_time:0.296   pos:     1744 fmt:yuv420p sar:12/11 s:176x144 i:P iskey:0 type:P checksum:6E93A05B plane_checksum:[0AE8DCA0 C00C62F8 29FD60B4] mean:[116 128 128] stdev:[47.1 0.7 0.0]
[Parsed_showinfo_0 @ 0x10343c0] n:   5 pts:    363 pts_time:0.363   pos:     2018 fmt:yuv420p sar:12/11 s:176x144 i:P iskey:0 type:P checksum:49A89113 plane_checksum:[C164C95D 06FE66F3 29FD60B4] mean:[116 128 128] stdev:[47.5 1.1 0.0]
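
To compare this output with the original timestamps without reading every line, the gaps between consecutive pts_time values can be extracted. A minimal sketch, assuming the showinfo line format shown above; pts_gaps_ms is a hypothetical helper name.

```shell
# Hypothetical helper: read showinfo lines on stdin and print the gap between
# consecutive frames in whole milliseconds.
pts_gaps_ms() {
  LC_ALL=C awk '
    match($0, /pts_time:[0-9.]+/) {
      # Skip the 9-character "pts_time:" prefix and keep the number.
      t = substr($0, RSTART + 9, RLENGTH - 9) + 0
      if (seen) printf "%d\n", (t - prev) * 1000 + 0.5  # round to nearest ms
      prev = t
      seen = 1
    }
  '
}

# Usage (assumes output.webm was produced by the command above):
#   ffmpeg -i output.webm -vf showinfo -f null /dev/null 2>&1 | pts_gaps_ms
```

For the six frames listed above, the gaps should come out as 97, 81, 64, 54 and 67, the same values as the durations in the concat file.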
Arnon Weinberg
2

It appears your images do not follow a standard frame rate. One option is to duplicate each image once per millisecond until the next image is due. For example, given

 file '0.png'
 file '97.png'

duplicate 0.png 96 times, so that it becomes 0.png, 1.png, 2.png, and so on up to 96.png (or use symlinks, if on Linux).

Then you can combine them using the normal image sequence input, with an input rate of 1 ms per frame (1000 fps). https://trac.ffmpeg.org/wiki/Create%20a%20video%20slideshow%20from%20images
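
The symlink variant might look roughly like the following. It is a sketch under assumptions: expand_to_ms and the directory names are hypothetical, and the source directory is assumed to contain only files named <milliseconds>.png.

```shell
# Hypothetical helper: fill dst_dir with one symlink per millisecond, each
# pointing at the most recent real frame in src_dir.
expand_to_ms() {
  src_dir=$1
  dst_dir=$2
  mkdir -p "$dst_dir"
  src_abs=$(cd "$src_dir" && pwd)   # absolute path so the links resolve anywhere
  prev=""
  # Walk the timestamps in numeric order.
  for ts in $(ls "$src_abs" | sed -n 's/^\([0-9][0-9]*\)\.png$/\1/p' | sort -n); do
    if [ -n "$prev" ]; then
      # Repeat the previous frame for every millisecond up to this one.
      i=$prev
      while [ "$i" -lt "$ts" ]; do
        ln -sf "$src_abs/$prev.png" "$dst_dir/$i.png"
        i=$((i + 1))
      done
    fi
    prev=$ts
  done
  # The last frame has no successor, so it gets a single link.
  if [ -n "$prev" ]; then
    ln -sf "$src_abs/$prev.png" "$dst_dir/$prev.png"
  fi
}

# Usage (directory names are placeholders):
#   expand_to_ms frames frames_ms
#   ffmpeg -framerate 1000 -i frames_ms/%d.png output.webm
```

The trade-off is one link per millisecond of video, so long recordings produce a very large directory.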

rogerdpack