
My goal: concatenate an image (foo.jpg) with a video (bar.mp4, 3 seconds long), showing foo.jpg for 2 seconds only. The output video should be about 5 seconds long.

I used:

ffmpeg -loop 1 -t 2 -framerate 1 -i foo.jpg -f lavfi -t 2 -i anullsrc -i bar.mp4 -filter_complex "[0][1][2:v][2:a] concat=n=2:v=1:a=1 [vpre][a];[vpre]fps=24,scale=32:24[v]" -map "[v]" -map "[a]" out.mp4

As I understand it, the command means: loop foo.jpg for 2 seconds at 1 frame per second and pair it with a 2-second silent audio track. Then concatenate that with bar.mp4, set the final output frame rate to 24 fps, and scale it to 32x24 (intentionally tiny for testing).

Expected: the output is about 5 seconds long in total.

Reality: the output is 3 minutes and 38 seconds long. The first 5 seconds are perfect. After that, the video stays silent, with the frame from the 5-second mark frozen until the end.
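
One way to check the result without playing the file is to read the container duration with ffprobe and compare it against the expected 5 seconds. This is only a rough sketch: the helper name duration_close and the 0.5-second tolerance are assumptions made for this example, and out.mp4 is the output file from the command above.

```shell
# Succeed if "actual" is within "tol" seconds of "expected".
# duration_close is a made-up helper name, not an ffmpeg tool.
duration_close() {
  awk -v a="$1" -v e="$2" -v t="$3" \
    'BEGIN { d = a - e; if (d < 0) d = -d; exit !(d <= t) }'
}

expected=5   # 2 s of image plus the 3 s video, as stated in the question

# Only query the file when ffprobe is installed and out.mp4 exists.
if command -v ffprobe >/dev/null 2>&1 && [ -f out.mp4 ]; then
  actual=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 out.mp4)
  if duration_close "$actual" "$expected" 0.5; then
    echo "duration OK: $actual s"
  else
    echo "unexpected duration: $actual s"
  fi
fi
```

With the over-long output described here (3 minutes 38 seconds is 218 seconds), the check would take the "unexpected duration" branch.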

My research suggests that it might be related to -video_track_timescale.

This command, with -video_track_timescale 600 added, fails with the same over-long result:

ffmpeg -loop 1 -t 2 -framerate 1 -i foo.jpg -f lavfi -t 2 -i anullsrc -i bar.mp4 -filter_complex "[0][1][2:v][2:a] concat=n=2:v=1:a=1 [vpre][a];[vpre]fps=24,scale=32:24[v]" -map "[v]" -map "[a]" -video_track_timescale 600 out.mp4

Additional info about file bar.mp4:

$ ffmpeg -i bar.mp4 
ffmpeg version 4.3.1 Copyright (c) 2000-2020 the FFmpeg developers
  built with Apple clang version 11.0.3 (clang-1103.0.32.62)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3.1_1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'bar.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.29.100
  Duration: 00:00:01.94, start: 0.000000, bitrate: 641 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, smpte170m/unknown/smpte170m), 480x360 [SAR 1:1 DAR 4:3], 354 kb/s, 24.58 fps, 24.58 tbr, 113734695.00 tbn, 49.16 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 280 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
At least one output file must be specified
Morris

2 Answers


I used the same command but added the -r 24 flag before the input video (bar.mp4). This sets the input frame rate explicitly, so ffmpeg does not need to guess it. Now, when the 2-second image clip is concatenated with the 3-second video, I get a 5-second output.

ffmpeg -loop 1 -t 2 -framerate 1 -i foo.jpg -f lavfi -t 2 -i anullsrc -r 24 -i bar.mp4 -filter_complex "[0][1][2:v][2:a] concat=n=2:v=1:a=1 [vpre][a];[vpre]fps=24,scale=32:24[v]" -map "[v]" -map "[a]" out.mp4
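
To see what frame rate ffmpeg reads from the input before forcing one, ffprobe can report it as a fraction. The sketch below converts that fraction to a decimal; the helper name fraction_to_fps is made up for this example, and the two-decimal rounding is an arbitrary choice.

```shell
# Convert a rational frame rate such as "30000/1001" to a two-decimal number.
# Prints "unknown" when the denominator is zero (ffprobe can report 0/0).
fraction_to_fps() {
  echo "$1" | awk -F/ '{
    d = ($2 == "" ? 1 : $2)          # plain numbers have no denominator
    if (d + 0 == 0) print "unknown"
    else printf "%.2f\n", $1 / d
  }'
}

# Only query the file when ffprobe is installed and bar.mp4 exists.
if command -v ffprobe >/dev/null 2>&1 && [ -f bar.mp4 ]; then
  rate=$(ffprobe -v error -select_streams v:0 \
    -show_entries stream=avg_frame_rate \
    -of default=noprint_wrappers=1:nokey=1 bar.mp4)
  echo "average frame rate: $(fraction_to_fps "$rate") fps"
fi
```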
Morris

concat demuxer

Using the concat demuxer preserves the quality of the main video, because the streams are copied rather than re-encoded, but it takes more steps.
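
Before building the image clip, it helps to read the main video's stream attributes so the generated clip can be made to match them. A minimal sketch, assuming ffprobe is installed alongside ffmpeg; the function name probe_attrs is hypothetical, and main.mp4 is the file name used in this answer.

```shell
# Print the stream attributes the generated image clip should match:
# resolution, pixel format, frame rate, audio sample rate and channel layout.
probe_attrs() {
  ffprobe -v error \
    -show_entries stream=codec_type,codec_name,width,height,pix_fmt,r_frame_rate,sample_rate,channel_layout \
    -of default=noprint_wrappers=1 "$1"
}

# Only query the file when ffprobe is installed and main.mp4 exists.
if command -v ffprobe >/dev/null 2>&1 && [ -f main.mp4 ]; then
  probe_attrs main.mp4
fi
```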

  1. Make video from image, matching the attributes of the main video to concatenate to:

    ffmpeg -loop 1 -framerate 24.58 -t 2 -i image.jpg -f lavfi -i anullsrc=cl=stereo:r=48000 -vf "scale=480:360:force_original_aspect_ratio=decrease,pad=480:360:-1:-1:color=black,format=yuv420p,setsar=1" -shortest image.mp4
    
  2. Make input.txt containing:

    file 'image.mp4'
    file 'main.mp4'
    
  3. Concatenate:

    ffmpeg -f concat -i input.txt -c copy -movflags +faststart output.mp4
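
Step 2 above can also be done from the shell instead of a text editor. A small sketch, assuming the two files are named image.mp4 and main.mp4 as in the steps above and that neither name contains a single quote:

```shell
# Write one "file '<name>'" line per input, in playback order.
printf "file '%s'\n" image.mp4 main.mp4 > input.txt

# Show the result so it can be checked before running the concat step.
cat input.txt
```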
    

concat filter

The concat filter lets you do everything in one command but re-encodes the main video:

ffmpeg -loop 1 -framerate 24 -t 2 -i image.jpg -f lavfi -t 2 -i anullsrc=cl=stereo:r=48000 -i main.mp4 -filter_complex "[0:v]scale=480:360:force_original_aspect_ratio=decrease,pad=480:360:-1:-1:color=black,format=yuv420p,setsar=1[img];[2:v]fps=24[main];[img][1:a][main][2:a]concat=n=2:v=1:a=1[v][a]" -map "[v]" -map "[a]" -movflags +faststart output.mp4
llogan