
I've been using Google Colab for a few weeks now and I've been wondering what the difference is between the two following commands (for example):

  1. !ffmpeg ...
  2. subprocess.Popen(['ffmpeg', ...

I ask because I ran into issues when I started either of the commands above and then tried to stop execution midway. Both of them cancel on KeyboardInterrupt, but afterwards the runtime got stuck and needed a factory reset. Checking ps aux in the Linux console listed a process [ffmpeg] <defunct>, which apparently was still running, or at least still blocking some resources.

I then did some research and came across similar posts asking how to terminate a subprocess correctly (1, 2, 3). Based on those posts, I came to the conclusion that the subprocess.Popen(..) variant provides more flexibility when it comes to handling the subprocess: redirecting stdout in different ways, reacting to different values of returncode, and so on. But I'm still unsure what exactly the first command, with the ! prefix, does under the hood.

The first command is much easier to use and requires far less code to start the process. Assuming I don't need much logic around the process flow, it would be a nice way to run something like ffmpeg, if only I were able to terminate it as expected. Even after following the answers from the other posts, the second command never got me to a point where I could fully terminate the process once it had started (even when using shell=False, process.kill(), process.wait() etc.). This is frustrating, because restarting and re-initializing the Colab instance can take several minutes every time.

So, finally, I'd like to understand in more general terms what the difference is and was hoping that someone could enlighten me. Thanks!

shadowtalker
товіаѕ
  • Note: I've run into issues with `ffmpeg` before when trying to build larger videos from `png` files, reading from and writing to Google Drive. I'm not sure whether this is actually an issue with the process handling, with `ffmpeg`, or with Google Drive. The main problem is that copying large numbers of files from Drive to local storage beforehand every time is just not practical. Since the runtime may be terminated and I don't want to lose my progress on the generated `png` files, I have to store the files on Google Drive anyway and then process them from there later on – товіаѕ Apr 24 '22 at 07:49
  • `!` is IPython magic. I suspect it just uses `subprocess` underneath the hood. In any case, it isn't really a "Python" thing: IPython REPLs provide a bunch of functionality that makes interactive sessions easy to work with, e.g. the `%%timeit` magic, but that isn't Python – juanpa.arrivillaga Apr 24 '22 at 10:39

1 Answer


! commands are executed by the notebook (or more specifically by the IPython interpreter), and are not valid Python commands. If the code you are writing needs to work outside of the notebook environment, you cannot use ! commands.
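A quick way to see that the ! syntax belongs to IPython rather than to Python is to hand such a line to Python's own compiler. This is only a small sketch; the helper name is_valid_python is made up for illustration.

```python
def is_valid_python(source: str) -> bool:
    """Return True if the plain Python compiler accepts `source`."""
    try:
        compile(source, "<cell>", "exec")
        return True
    except SyntaxError:
        return False


# IPython rewrites "!..." lines before Python ever sees them;
# without that rewriting step the line is just a syntax error.
print(is_valid_python("!ffmpeg -version"))   # False
print(is_valid_python("import subprocess"))  # True
```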

As you correctly note, you are unable to interact with the subprocess you launch via !, so it's also less flexible than an explicit subprocess call, though similar in this regard to subprocess.call.
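For comparison, here is a rough sketch of the kind of interaction an explicit Popen object allows: ask the child to stop, escalate if it ignores the request, and then wait() for it so that its exit status is collected. A <defunct> entry in ps is a process that has exited but whose parent has not yet collected that status. The helper names stop_process and run_interruptibly are invented here, and a sleeping Python child stands in for ffmpeg; I am not claiming this cures the Colab hang from the question.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace: float = 5.0) -> int:
    """Ask `proc` to exit, escalate to kill() if needed, and reap it."""
    if proc.poll() is None:              # child is still running
        proc.terminate()                 # polite request (SIGTERM on POSIX)
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()                  # forceful stop (SIGKILL on POSIX)
    return proc.wait()                   # collect the exit status


def run_interruptibly(cmd: list[str]) -> int:
    """Run `cmd` to completion, cleaning up the child on Ctrl-C / cell interrupt."""
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait()
    except KeyboardInterrupt:
        return stop_process(proc)


# Stand-in for a long-running ffmpeg invocation (assumption for the demo).
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_process(child)
print("child exited with", exit_code)
```

In a notebook you would call run_interruptibly(['ffmpeg', ...]) and interrupt the cell as usual; the except branch then takes care of stopping and reaping the child.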

As the documentation mentions, you should generally avoid the bare subprocess.Popen unless you specifically need the detailed flexibility it offers, at the price of having to duplicate the higher-level functionality which subprocess.run et al. already implement. The code to run a command and wait for it to finish is simply

subprocess.check_call(['ffmpeg', ... ])

with variations for capturing its output (check_output) and the more modern run which can easily replace all three of the legacy high-level calls, albeit with some added verbosity.
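To make the correspondence concrete, here is roughly how run covers the three legacy calls. The command is a stand-in (a Python one-liner that prints a word) so the snippet runs anywhere; substitute your own ['ffmpeg', ...] list.

```python
import subprocess
import sys

# Stand-in command for the demo; replace with your real argument list.
cmd = [sys.executable, "-c", "print('done')"]

# Like check_call: wait for completion, raise CalledProcessError on failure.
subprocess.run(cmd, check=True)

# Like check_output: additionally capture stdout (decoded to str via text=True).
result = subprocess.run(cmd, check=True, capture_output=True, text=True)
output = result.stdout

# Like call: wait for completion and just hand back the exit status.
status = subprocess.run(cmd).returncode
```

The added verbosity is mostly the check=True and capture_output=True keywords, which the legacy functions bake into their names.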

tripleee