
I have 20k images that I want to compile into a video using Python. Doing this with a single process is unacceptably slow: a for loop loads the images one by one into RAM and finally writes them into a video.

Therefore, I'm looking for a parallel approach that gets the job done faster.

This sounded impossible the first time I thought about it, because it would mean multiple processes simultaneously writing to a single file (i.e., the output video file).

But on second thought, maybe there are tools in Python that would let me create a "placeholder video," in which each frame can be safely written by one thread. That way I could use all my cores to create the video much more quickly.

Is this possible at all?

ivan_pozdeev
Sibbs Gambling
  • Render the video on multiple threads, but keep the writing of data to one thread, fed from a Queue. Concurrent writes are hard to synchronize, don't even work on some OSes, and will slow the whole process down. – Klaus D. Jun 20 '20 at 17:53
  • It's probably easier to use something like ffmpeg -> https://stackoverflow.com/questions/5585872/python-image-frames-to-video - maybe even directly from the command line. – Patrick Artner Jun 20 '20 at 17:53
  • Consider `ffmpeg` or a Python wrapper such as MoviePy to convert lots of images into a video. – jfs Jun 20 '20 at 17:55
  • Anyway, to the question: Python allows (because the OS allows) the same file to be opened multiple times and used concurrently. The concern is that the I/O wrappers themselves are not thread-safe (a moving file index, or buffers flushed out of alignment). The issue with the "ask" is that it does not (appear to) use a file format that could actually take advantage of this in any way. A database with a strict format (see DB row pages), on the other hand, benefits massively. Also, large sequential operations are almost always faster than random writes, even on an SSD. – user2864740 Jun 20 '20 at 18:15
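The queue-fed single-writer pattern suggested in the first comment can be sketched as follows. This is a minimal illustration, not library code: `encode` and `write` are hypothetical caller-supplied callables standing in for the per-frame work and the file output.

```python
import queue
import threading

def parallel_encode_single_writer(frames, encode, write, n_workers=4):
    """Encode frames concurrently, but write from one thread, in order.

    encode(frame) -> data and write(data) -> None are placeholders for
    your actual per-frame processing and video output.
    """
    tasks = queue.Queue()
    results = queue.Queue()

    for i, frame in enumerate(frames):
        tasks.put((i, frame))

    def worker():
        # Pull tasks until the queue is drained, tagging results with
        # their original index so the writer can restore the order.
        while True:
            try:
                i, frame = tasks.get_nowait()
            except queue.Empty:
                return
            results.put((i, encode(frame)))

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()

    # Single writer: buffer out-of-order results, emit strictly in order.
    pending, next_index = {}, 0
    for _ in range(len(frames)):
        i, data = results.get()
        pending[i] = data
        while next_index in pending:
            write(pending.pop(next_index))
            next_index += 1

    for w in workers:
        w.join()
```

All file writing stays on the one thread running the final loop, which sidesteps the concurrent-write problems the comment warns about.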

3 Answers


The issue with the idea of the "placeholder video" is that this video would inherently need to be uncompressed, since video compression relies on using previous frames as a baseline to eliminate redundant information. Your resulting video would therefore be quite large. In theory it could be done, but you might have to define a special file format, so it's probably not doable in Python.

Do you have an idea of which portion of this process is taking the longest?

For instance, if opening the files is the limiting factor, you could have worker processes open images simultaneously and pass the data to a single process that writes the video.
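If file reads are the bottleneck, even a thread pool overlaps them nicely while one loop still does all the writing. A minimal sketch, assuming hypothetical `load` and `write` callables in place of your actual image loader and video writer:

```python
from concurrent.futures import ThreadPoolExecutor

def load_and_write(paths, load, write, n_threads=8):
    """Overlap image loading on threads; keep all writing in this thread.

    ThreadPoolExecutor.map yields results in submission order, so frames
    arrive already sequenced for the writer.
    """
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for frame in pool.map(load, paths):
            write(frame)
```

Threads work here because loading images is I/O-bound; for CPU-bound decoding you would swap in `ProcessPoolExecutor` with picklable functions.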

If the encoding is the limiting factor and happens on the CPU, you may be able to divide the frames sequentially, write pieces of the video, and combine them at the end. However, if you're using special hardware (e.g. a GPU) for encoding, you probably wouldn't be able to speed up the process much this way.
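The divide-and-merge idea can be sketched as: split the frame list into contiguous chunks, encode each chunk to its own segment file, then merge the segments (for example with ffmpeg's concat demuxer, `-f concat -i list.txt -c copy`). The segment file names below are illustrative:

```python
def chunk_frames(paths, n_chunks):
    """Split paths into n_chunks contiguous chunks of near-equal size."""
    base, extra = divmod(len(paths), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # spread the remainder
        chunks.append(paths[start:start + size])
        start += size
    return chunks

def concat_list(segment_files):
    """Build the list-file text for ffmpeg's concat demuxer."""
    return "".join(f"file '{name}'\n" for name in segment_files)
```

Each chunk can then be encoded by a separate process, and the final merge is a cheap stream copy since no re-encoding is needed.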

All this being said, I will echo the sentiment that ffmpeg is a good tool for this sort of work, and you can easily call it with Python's subprocess module if you want to drive it from Python.
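For example, an image sequence named `frame_00000.png`, `frame_00001.png`, … can be handed to ffmpeg in a single call (the file pattern, frame rate, and output name here are illustrative):

```python
import subprocess

def build_ffmpeg_cmd(pattern="frame_%05d.png", fps=30, out="out.mp4"):
    """Assemble an ffmpeg command that turns an image sequence into a video."""
    return [
        "ffmpeg",
        "-framerate", str(fps),   # input frame rate
        "-i", pattern,            # printf-style image sequence pattern
        "-c:v", "libx264",        # encode with H.264
        "-pix_fmt", "yuv420p",    # widely compatible pixel format
        out,
    ]

# Requires ffmpeg on PATH:
# subprocess.run(build_ffmpeg_cmd(), check=True)
```

ffmpeg does its own multithreading internally, so this often beats a hand-rolled parallel Python pipeline outright.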

DerekG
  • I-frames do not require preceding frames. These could be written independently at lower compression (let's assume the original images still have lots of information). Taking another step then: if 120k photos were split into 120 encoded sets of 1k frames each... however, such frame sets still don't address the original task, as that doesn't allow writing to the final location in the file on the first pass. – user2864740 Jun 20 '20 at 18:10

This is impossible.

A video file is produced by a video codec which typically uses difference compression: only the data that differs from up to several preceding, and possibly following, frames is recorded. Because the resulting frame data are so heavily intertwined, it's impossible for several processes to compete to write frames at the same time.

If Python is too slow for your task, you may want to use other tools like FFmpeg.

ivan_pozdeev
  • Not entirely correct. See I-frames, which can be encoded independently (or used at the start of "placeholder" sections before other frames). IIRC, the larger issue is that standard movie formats still require dense sequential file data - and writing the entire file out sequentially at the end. – user2864740 Jun 20 '20 at 18:05
  • That is, multiple processes _can_ indeed encode different frames (or sets of frames) at the same time: the larger gotcha is writing this data to the final file immediately. These I-frames exist in normal encoding as well; observable when broken digital garble “fixes itself” in 20-30 seconds. – user2864740 Jun 20 '20 at 18:25

You don't necessarily need Python for this: the FFmpeg command-line tool can do the same thing, as detailed in this article: https://deparkes.co.uk/2018/01/05/create-video-images-ffmpeg/