1

I'm trying to write some code that reads a video from stdin and then trims it with seeking. That's what I have so far:

import io
import shlex
import subprocess as sp

def trim():
    in_use = io.BytesIO()  # filled elsewhere with a valid video
    process = sp.Popen(shlex.split('ffmpeg -i pipe: -ss 00:00:01.0 -t 00:00:01.4 -c:v libx264 -strict -2 output.mp4'),
                       stdin=sp.PIPE, bufsize=10**8)
    pipewriter(in_use, process)  # stream the buffer into ffmpeg's stdin
    process.wait()

The pipewriter function looks like this:

from functools import partial

def pipewriter(video, process):
    video.seek(0)
    for chunk in iter(partial(video.read, 1024), b''):
        process.stdin.write(chunk)
    process.stdin.flush()
    process.stdin.close()

The file inside the in_use io.BytesIO object is a valid video, so that's not the problem. The output file also gets generated and trimmed correctly, so the function does work. My problem is that, because of the seeking and trimming, the pipewriter function writes the whole video into the pipe, but the ffmpeg process stops after -t 00:00:01.4 seconds, so the rest of the video written to stdin leads to a pipe error.

Does somebody have a clean solution for this without try/except? I also have to trim the video as accurately as possible. The current solution works well for me.

Error:

   process.stdin.flush()
BrokenPipeError: [Errno 32] Broken pipe
scuba14
    Pipes are inherently unseekable. It's only possible to seek from a real file, not a pipe. You *cannot* restart reading a pipeline from the beginning. Under no circumstances does `file.seek(0)` actually "work" in the sense of causing content already read from `file` to be available to read a second time when `file` is a pipe. – Charles Duffy Jan 28 '22 at 18:30
  • ...so, if it "works" when you catch and ignore the exception, that means your logic doesn't need the `seek()` at all (because, when your file is a FIFO, `seek()` has no effect whatsoever). – Charles Duffy Jan 28 '22 at 18:32
  • seeking (forward) before starting a stream at ffmpeg's level doesn't need to mean using the `seek()` syscall; it can just mean reading and throwing away content up to a given timestamp, only using content at/after that point to generate output. (By contrast, seeking _backwards_ isn't possible when input is from a FIFO rather than a regular file, unless one knew there would be a need to seek backwards up-front and copied content aside during the first pass). – Charles Duffy Jan 28 '22 at 18:33
  • BTW, this is part of why `cat file | tail -n 1` is so much slower (for a large file) than `tail -n 1 file`. – Charles Duffy Jan 28 '22 at 18:36
  • ...anyhow, think of ffmpeg's idea of "seeking forward" past things the user doesn't want to see when input is from a pipe the same way `tail` seeks forwards in that same circumstance, by `read()`ing the content and then ignoring it until it gets to something that _is_ wanted as output; whereas if it's pointing to a real file in a format that's amenable to random access it can take advantage of that format's amenities to find the binary offset corresponding with the desired timestamp. – Charles Duffy Jan 28 '22 at 18:41
  • BTW, this is going well off-topic, but I strongly advise against using `shlex.split()` the way you are here. Much better to just hardcode a list -- `['ffmpeg', '-i', 'pipe:', ...]` -- so you don't need to escape filenames in a `split()`-compatible way, but can instead just substitute them into the argument list directly. – Charles Duffy Jan 28 '22 at 18:44
  • @CharlesDuffy Ok yeah, it is throwing away content up to a given timestamp. But this part is not the problem. It's the -t timestamp flag. After that flag the pipewriter still writes to stdin although the ffmpeg process has already terminated, or at least closed the stdin pipe, so content written into the pipe doesn't get processed --> pipe error – scuba14 Jan 28 '22 at 18:49
  • @CharlesDuffy I can also just use Python f-strings in shlex. I mean, I do not see a difference. But yes, that's kind of off topic right now – scuba14 Jan 28 '22 at 18:52
  • If you use f-strings + `shlex.split()` you get security bugs: Someone who passes a filename with spaces can inject extra, arbitrary arguments into the ffmpeg command line, unless you use `shlex.quote()` _inside your f-string_ to prevent it; and at that point you're adding extra complexity to solve a problem that it was unnecessary to create in the first place. – Charles Duffy Jan 28 '22 at 19:47
  • It's not _as bad_ as the security bugs you get with `shell=True`, but it's security bugs nonetheless. – Charles Duffy Jan 28 '22 at 20:03
  • In case there is only a video stream, you may solve it using 2 FFmpeg processes (as [here](https://stackoverflow.com/a/70779111/4926757)) - one for decoding and one for encoding. Assuming constant framerate, and all you want is just seeking and trimming, you can manually count the decoded frames for seeking and then for trimming (or just for trimming). Using `try` and `except` looks like a better solution... – Rotem Jan 28 '22 at 21:39

3 Answers

3

Does somebody have a clean solution for this without try/except?

No one has a fundamentally better solution because this is how the backwards propagation of a pipe closure is designed to work in Unix.

  • Forward propagation happens by a program reading from the closed input pipe, seeing EOF, wrapping up, and closing its output pipe (if any).

  • Backwards propagation happens by a program writing to the closed pipe and (by default) receiving a SIGPIPE that kills it, causing any open input pipes to be closed. A program can instead ignore SIGPIPE and handle the resulting EPIPE error itself, which is what Python does, raising a BrokenPipeError in its place (see the sketch below).
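
To make the backwards propagation concrete, here is a minimal, self-contained sketch (not from this thread; `head` stands in for ffmpeg as a reader that quits early) that reproduces the same BrokenPipeError on a Unix-like system:

import subprocess as sp

# 'head -c 10' reads ten bytes and exits, closing its end of the pipe.
p = sp.Popen(['head', '-c', '10'], stdin=sp.PIPE)
try:
    for _ in range(100_000):
        p.stdin.write(b'x' * 1024)  # far more than head will ever read
    p.stdin.close()
except BrokenPipeError:
    # head already exited; the kernel answered the write with EPIPE,
    # which Python raises as BrokenPipeError instead of dying on SIGPIPE.
    print('reader exited; pipe closed')
p.wait()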

All APIs layered on top, like subprocess.communicate, simply work with this fact under the hood. The best practice is to stop fighting Python and Unix, and just go with the flow using a try/except (optionally tidied away in a helper function).
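
For example, the OP's pipewriter with the try/except tidied away could look like this sketch (assuming the same `video`/`process` arguments as in the question):

import contextlib
from functools import partial

def pipewriter(video, process):
    video.seek(0)
    try:
        for chunk in iter(partial(video.read, 1024), b''):
            process.stdin.write(chunk)
        process.stdin.flush()
    except BrokenPipeError:
        pass  # ffmpeg read everything it needed for -t and exited early
    finally:
        # close() flushes buffered data too, so it can raise the same error
        with contextlib.suppress(BrokenPipeError):
            process.stdin.close()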

However, if you really want a cosmetically cleaner version without try/except, there are several bad practices you can invoke, such as restoring the operating system's default SIGPIPE handling, which Python disables at startup:

import signal

# Python ignores SIGPIPE by default; restore the OS default so a broken
# pipe kills the process silently instead of raising BrokenPipeError.
signal.signal(signal.SIGPIPE, signal.SIG_DFL)

This will cause the Python process to be killed immediately and silently instead, which is how most programs in pipelines are stopped, such as `find` in `find / | head -n 1`.
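
A sketch of how that combines with the code from the question (assuming the trim() shown there):

import signal

# Restore kernel-default SIGPIPE handling before spawning ffmpeg: a broken
# pipe now kills the Python process silently, so no BrokenPipeError ever
# surfaces from pipewriter().
signal.signal(signal.SIGPIPE, signal.SIG_DFL)

trim()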

that other guy
1

Can't you just do this?

def pipewriter(video, process):
    video.seek(0)
    for chunk in iter(partial(video.read, 1024), b''):
        if process.poll() is not None:
            break
        process.stdin.write(chunk)
    if process.poll() is None:
        process.stdin.flush()
    process.stdin.close()

Based on the OP's addendum in the comment below:

Now I want to create n videos with variable length x and variable start point k and variable endpoint p

Maybe this one does the job:

def trim(ss, t, outfile):
    # input= sends the whole buffer and closes stdin for us
    sp.run(['ffmpeg', '-i', 'pipe:', '-ss', ss, '-t', t,
            '-c:v', 'libx264', '-strict', '-2', outfile],
           input=in_use.getbuffer())

for mp4file, ss, t in [('out1.mp4', ss0, t0), ('out2.mp4', ss1, t1), ...]:
    trim(ss, t, mp4file)
kesh
  • One clarification needed. You are actively writing to the `in_use` `io.BytesIO` object while doing this trimming job, right? – kesh Jan 28 '22 at 18:55
  • No, it is a buffer which holds multiple ts files. In this case it is just one element, but later there is a list of ts files which are getting concatenated while writing them into the pipe one after another. – scuba14 Jan 28 '22 at 18:58
  • and btw your answer does not work. I still got the same error. – scuba14 Jan 28 '22 at 19:01
  • Oops, I missed that you're getting the error on the `flush` call. Edited – kesh Jan 28 '22 at 19:03
  • I'm not familiar with the ts file format, but I suppose you mean that you read the content of the files and just put them together in the `in_use` buffer? – kesh Jan 28 '22 at 19:04
  • Yes, it's just a normal video inside an io.BytesIO object. The code works perfectly fine if I do not use the -ss and -t flags to trim the video. So I think the content outside the boundaries of these timestamps is responsible for the error – scuba14 Jan 28 '22 at 19:10
  • Why `BytesIO`? Can't you just read one file to `bytes` and write the whole thing to the FFmpeg process whenever you get a new ts file? If you keep checking if the FFmpeg process is still alive, this approach should be faster than sending 1kB chunks in a loop. `stdin` is buffered IIRC – kesh Jan 28 '22 at 19:11
  • "The code works perfectly fine if I do not use the -ss and -t flag to trim the video." Yep, makes perfect sense because FFmpeg won't exit on its own w/out the -t option – kesh Jan 28 '22 at 19:12
  • What's the problem with BytesIO? I just use it as a buffer. I can also store byte objects and then write them completely to the pipe. But I think that would be the same as if I increased the 1024 to 10240. That's not really the question; the problem is that writing to ffmpeg causes an error because of trimming – scuba14 Jan 28 '22 at 19:20
  • Not using `BytesIO` was merely my answer to your "Does somebody have a clean solution[...]?" question. If you aren't fond of it, that's fine, no offense taken. I think the `bytes` approach is faster (could be wrong). Meanwhile, any luck with my edited solution? (the second polling?) – kesh Jan 28 '22 at 19:24
  • There are some fundamental problems with this approach: 1. it's inherently a race, 2. a process being alive does not mean its pipe is open (it might have closed its input and is finalizing the output), 3. a process being dead does not mean its pipe is closed (unlikely with ffmpeg specifically, but other programs may fork off worker processes of their own). – that other guy Jan 28 '22 at 19:28
  • @thatotherguy Ok so my problem is the following. I am getting videos asynchronously in .ts format. The videos are not in the correct order. So I am collecting the videos inside a list. Now I want to order them, concat them and retrim them. An example: There are 15 videos with different lengths held inside of the list. Now I want to create n videos with variable length x and variable start point k and variable endpoint p. My approach was to sort them and then write them into a pipe and create the trimmed videos. I see no problem with writing them one after another to a pipe of one ffmpeg process. – scuba14 Jan 28 '22 at 19:36
  • @thatotherguy - Doesn't your argument prohibit using Python subprocess as a whole to pipe any data from a 3rd party process? I've been studying the cpython `subprocess.py` source for a while, and fundamentally (as in how pipes are constructed and employed) its `communicate()` is not much different than what @scuba14 is trying to accomplish. (genuinely curious, to be educated) – kesh Jan 28 '22 at 19:42
  • @kesh It only prohibits using `process.poll()` to decide whether or not to write. You can still easily and robustly keep writing until either you're finished, or until the process has had enough. – that other guy Jan 28 '22 at 19:44
  • @scuba14 That's a totally valid way of doing it. Not necessarily the most efficient, but definitely works. – that other guy Jan 28 '22 at 19:46
  • @thatotherguy - Just read your answer, and it makes sense. So, the try-except really is the best (dare I say Pythonic?) option. Thanks for the insight. – kesh Jan 28 '22 at 20:00
  • @scuba14 - your comment above seems to suggest that you can indeed do your task a little simpler. See if my edited answer goes along with your goal. – kesh Jan 28 '22 at 21:00
0

If video is guaranteed to be a pipe or FIFO, then video.seek(0) will never do anything useful and can simply be removed. If you're doing something that requires a multi-pass algorithm, you'll want to copy the content out to a regular file.

FIFOs are inherently unseekable; once content has been read from one, it isn't available to read again, so one cannot seek back to the beginning in any meaningful way.
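
For the multi-pass case, a minimal sketch (names are illustrative, not from this answer) of copying piped content out to a regular, seekable file first:

import shutil
import sys
import tempfile

# Drain the incoming pipe into a regular file once; that file can then be
# read (and seeked) as many times as a multi-pass algorithm needs.
with tempfile.NamedTemporaryFile(suffix='.ts', delete=False) as spool:
    shutil.copyfileobj(sys.stdin.buffer, spool)
    path = spool.name

# ffmpeg can now seek efficiently because its input is a regular file:
#   ffmpeg -ss 00:00:01.0 -t 00:00:01.4 -i <path> -c:v libx264 output.mp4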

Charles Duffy
  • in_use is holding the video and it's a buffer (io.BytesIO object). I now know that seeking in a pipe doesn't really make sense, but it kinda does the job – scuba14 Jan 28 '22 at 19:03
  • nit: a pipe is a "real file" (everything is a file!), but it's not a "regular file". – William Pursell Jan 28 '22 at 19:15