As part of my application, the user should be able to select a range of images and videos and merge them into a single video. Each image is turned into a 5-second video fragment. The inputs can have different resolutions, and possibly different formats as well.
I already have some parts of the process working:
Turn an image into a video:
-loop 1 -i image.png -r 1 -t 5 imageAsVideo.mp4
Concatenate videos (of possibly different formats):
-i movie1.mp4 -i movie2.mp4 -filter_complex "[0:v:0][0:a:0][1:v:0][1:a:0]concat=n=2:v=1:a=1[outv][outa]" -map "[outv]" -map "[outa]" concatenatedMovies.mp4
However, I currently run these steps one after the other, and I don't know how or where to add the scale or scale2ref filters. Keeping the audio of the video inputs is another concern: with a=1, concat expects an audio stream for every segment, so I assume I'll have to add a dummy audio stream to the "image" video streams, which makes things even more complex.
Is there a way to do all this in one command? What order should the different filters be in?
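To make the question concrete, here is a rough sketch of the kind of single command I have in mind, for one image followed by one video. It is untested guesswork on my part, not a known-good solution. The 1280x720 size, the 25 fps rate, the helper names `norm` and `merge`, and the output name `merged.mp4` are all assumptions for illustration, and it assumes `movie1.mp4` has an audio stream. The idea is to scale and pad every video stream to a common size first, feed silence from `anullsrc` as the image's audio, and only then concat.

```shell
#!/bin/sh
# Assumed target size and frame rate; every input is normalised to these.
W=1280 H=720 FPS=25

# Print a filter chain that fits a video stream inside WxH, pads the rest
# with black bars, and normalises pixel aspect ratio, frame rate and format.
# $1 = input pad (e.g. 0:v), $2 = output label (e.g. v0)
norm() {
  printf '[%s]scale=%s:%s:force_original_aspect_ratio=decrease,' "$1" "$W" "$H"
  printf 'pad=%s:%s:(ow-iw)/2:(oh-ih)/2,setsar=1,fps=%s,format=yuv420p[%s]' \
    "$W" "$H" "$FPS" "$2"
}

# Input 0: the image, input 1: generated silence, input 2: the video.
# Normalise both video streams, then concat them with their audio.
FILTER="$(norm 0:v v0);$(norm 2:v v1);[v0][1:a][v1][2:a]concat=n=2:v=1:a=1[outv][outa]"

# Hypothetical wrapper: loop the image for 5 s, generate 5 s of silence
# as its dummy audio, and append movie1.mp4 (assumed to contain audio).
merge() {
  ffmpeg -loop 1 -t 5 -i image.png \
         -f lavfi -t 5 -i anullsrc=channel_layout=stereo:sample_rate=44100 \
         -i movie1.mp4 \
         -filter_complex "$FILTER" -map "[outv]" -map "[outa]" merged.mp4
}
```

Calling `merge` would run the command; `echo "$FILTER"` shows the generated filter graph. Is scaling before concat, as above, the right order, and would this generalise to an arbitrary number of inputs?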