Cut an .mp4 into pieces with Python

Question

I'm working on some .mp4 files with Python. I'm using the wave, math, contextlib, speech_recognition and moviepy (AudioFileClip) libraries. I have very long files (video + audio). I would like Python to cut each file into new 5-minute files (still .mp4) and then transcribe each of them. So far, I have been able to write the following code to transcribe the original (long) file:

import wave, math, contextlib
import speech_recognition as sr
from moviepy.editor import AudioFileClip
import os

os.chdir(" ... my path ...")  # e.g. C:/Users/User/Desktop

FILE = "file_name"  # e.g. video1  (without extension)

transcribed_audio_file_name = FILE + "_transcribed_speech.wav"
mp4_video_file_name = FILE + ".mp4"
audioclip = AudioFileClip(mp4_video_file_name)
audioclip.write_audiofile(transcribed_audio_file_name)
with contextlib.closing(wave.open(transcribed_audio_file_name,'r')) as f:
    frames = f.getnframes()
    rate = f.getframerate()
    duration = frames / float(rate)
total_duration = math.ceil(duration / 60)
r = sr.Recognizer()
# Open the output file once; "with" closes it even if recognition fails.
with open(FILE+"_transcription.py", "a") as f:
    for i in range(0, total_duration):
        with sr.AudioFile(transcribed_audio_file_name) as source:
            audio = r.record(source, offset=i*60, duration=60)
        # Call the recognizer once per minute and reuse the result.
        text = r.recognize_google(audio, language="en-US")
        f.write(text)
        f.write(" ")
        print(text)

print("Transcription DONE.")

How can I add a step that takes the video file, cuts it into 5-minute pieces, saves them as .mp4 in my folder, and then processes (and transcribes) each piece one by one? Thank you in advance!

score 5 · Accepted Answer · answered Apr 30 '21 at 13:26

I would recommend using a library called moviepy.

Step 1:

Install moviepy with

pip3 install moviepy

Step 2:

Decide on the clip boundaries:

Let’s say that the original video you are trying to clip is 20 minutes long, and you want to cut 3 smaller videos from it (5 minutes each, covering the first 15 minutes).

Create a times.txt file and put the start and end of each clip, in seconds, on its own line:

0-300 
300-600
600-900
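If you would rather not type the ranges by hand, they can be generated. This is only a minimal sketch, assuming you already know the length of the video in seconds; `make_ranges` and its parameters are names made up for this illustration and are not part of moviepy.

```python
def make_ranges(total_seconds, chunk_seconds=300):
    """Return "start-end" strings covering total_seconds in chunk_seconds steps."""
    ranges = []
    start = 0
    while start < total_seconds:
        # The last chunk is shortened so it never runs past the end of the video.
        end = min(start + chunk_seconds, total_seconds)
        ranges.append(f"{start}-{end}")
        start = end
    return ranges


# 900 seconds is the first 15 minutes, i.e. the three ranges listed above.
print("\n".join(make_ranges(900)))
```

Redirecting the printed lines into times.txt gives the same file as the hand-written one.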

Step 3:

Write Python Script:

Now the fun part, writing the code!

from moviepy.video.io.ffmpeg_tools import ffmpeg_extract_subclip

# Replace the filename below.
required_video_file = "filename.mp4"

with open("times.txt") as f:
  times = f.readlines()

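# Strip whitespace and skip blank lines.
times = [x.strip() for x in times if x.strip()]

# Number the clips with enumerate; times.index() would reuse a name if a range appeared twice.
for number, time in enumerate(times, start=1):
  starttime, endtime = (int(t) for t in time.split("-"))
  # targetname is the keyword in moviepy 1.x; newer releases may name it differently.
  ffmpeg_extract_subclip(required_video_file, starttime, endtime, targetname=str(number)+".mp4")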

Code Explanation

The import line: Importing ffmpeg_extract_subclip from moviepy
required_video_file: Naming your long video clip
The with block: Reading times.txt to identify the cutting times
The strip step: Removing whitespace and newlines so Python can parse the times
The for loop: Cutting the video to the requested ranges
targetname: Saving the cut videos under different names (1.mp4, 2.mp4, ...)

Step 4:

Run the program:

Save the script as split.py and run it with

python split.py

Hope that helped!
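For the transcription half of the question, here is a rough sketch that reuses the asker's approach (extract the audio to WAV, then send 60-second windows to `recognize_google`) on one chunk at a time. The function names `window_offsets` and `transcribe_chunk` are invented for this example, the `moviepy.editor` import path is the one from the question (moviepy 1.x), and the output names assume the chunks are called 1.mp4, 2.mp4 and so on.

```python
import math
import os


def window_offsets(duration_seconds, window_seconds=60):
    """Start offsets (in seconds) of the windows sent to the recognizer."""
    count = math.ceil(duration_seconds / window_seconds)
    return [i * window_seconds for i in range(count)]


def transcribe_chunk(mp4_path, language="en-US"):
    """Extract the audio of one chunk to WAV and transcribe it window by window."""
    # Third-party imports live inside the function so that window_offsets()
    # stays usable without them. Both packages come from the question.
    import speech_recognition as sr
    from moviepy.editor import AudioFileClip  # moviepy 1.x import path

    wav_path = os.path.splitext(mp4_path)[0] + ".wav"
    clip = AudioFileClip(mp4_path)
    duration = clip.duration
    clip.write_audiofile(wav_path)
    clip.close()

    recognizer = sr.Recognizer()
    pieces = []
    for offset in window_offsets(duration):
        with sr.AudioFile(wav_path) as source:
            audio = recognizer.record(source, offset=offset, duration=60)
        try:
            pieces.append(recognizer.recognize_google(audio, language=language))
        except sr.UnknownValueError:
            # Raised when no speech could be recognized in this window.
            continue
    return " ".join(pieces)
```

Calling `transcribe_chunk("1.mp4")` after the split step would return the text of the first chunk, which you can then write to a file of your choice. Network errors from the recognizer (`sr.RequestError`) are deliberately left to propagate.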

If you look at the source code of `ffmpeg_extract_subclip` it is essentially just calling `ffmpeg` executable in a subprocess. There is no need for the whole moviepy library for this task. — Niko Föhr, Apr 24 '22 at 20:49

score 4 · Answer 2 · answered Apr 30 '21 at 13:24

You could have Python use the FFmpeg command-line tool to manipulate the videos; the os module can execute command-line commands. For example, these commands split a video into 10-minute chunks:

ffmpeg -i source-file.foo -ss 0 -t 600 first-10-min.m4v
ffmpeg -i source-file.foo -ss 600 -t 600 second-10-min.m4v
ffmpeg -i source-file.foo -ss 1200 -t 600 third-10-min.m4v

Source: Unix Stack Exchange. From Python, you could use os.system() like so:

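import math
import os

videoLength = 20  # total video length in minutes (example value, set it for your file)
splitLength = 5   # chunk length in minutes
# math.ceil keeps the last, shorter chunk instead of dropping it.
for i in range(math.ceil(videoLength / splitLength)):
    start = i * splitLength * 60  # chunk start in seconds
    length = splitLength * 60     # chunk duration in seconds
    os.system("ffmpeg -i source-file.foo -ss " + str(start) + " -t " + str(length) + " clip"+str(i)+".m4v")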
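Because os.system() builds a single shell string, file names containing spaces or quotes can break the command. One possible variation on the loop above, offered as a sketch rather than a drop-in replacement, passes the arguments as a list to subprocess.run. The helper names `build_split_command` and `split_video` are made up for this example, and the output uses .mp4 (as the question asks) instead of .m4v.

```python
import math
import subprocess


def build_split_command(source, index, split_minutes=5):
    """Return the ffmpeg argument list for chunk number `index` (0-based)."""
    start = index * split_minutes * 60  # chunk start in seconds
    length = split_minutes * 60         # chunk duration in seconds
    return [
        "ffmpeg", "-i", source,
        "-ss", str(start),
        "-t", str(length),
        f"clip{index}.mp4",
    ]


def split_video(source, video_minutes, split_minutes=5):
    """Run ffmpeg once per chunk; requires ffmpeg on the PATH."""
    for i in range(math.ceil(video_minutes / split_minutes)):
        # check=True raises CalledProcessError if ffmpeg exits with an error.
        subprocess.run(build_split_command(source, i, split_minutes), check=True)
```

For a 20-minute file, `split_video("source-file.mp4", 20)` would run ffmpeg four times and produce clip0.mp4 through clip3.mp4.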

While moviepy is a handy tool, it is probably an overkill for this, as it is also just calling `ffmpeg` to split the mp4 files. I think this is a cleaner solution. Note to the future readers: The ffmpeg also accepts time stamps in formats like `hh:mm:ss`, `mm:ss` or `mm:ss.milliseconds` to the splits. — Niko Föhr, Apr 24 '22 at 20:44
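Following up on the time stamp note: a small sketch of a helper that turns a number of seconds into the `hh:mm:ss` form, so the result can be passed to `-ss` or `-t`. `to_timestamp` is a hypothetical name used only for illustration.

```python
def to_timestamp(seconds):
    """Format a whole number of seconds as hh:mm:ss."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


print(to_timestamp(600))   # 00:10:00
print(to_timestamp(3725))  # 01:02:05
```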