I'm working on some .mp4 files with Python. I'm using the wave, math, contextlib, and speech_recognition libraries, plus AudioFileClip from moviepy. I have very long files (video + audio). I would like Python to cut each file into new 5-minute files (still .mp4) and then transcribe each of them. So far, I have been able to write the following code to transcribe the initial (long) file:
import wave, math, contextlib
import speech_recognition as sr
from moviepy.editor import AudioFileClip
import os
os.chdir(" ... my path ...") # e.g. C:/Users/User/Desktop
FILE = "file_name" # e.g. video1 (without extension)
transcribed_audio_file_name = FILE + "_transcribed_speech.wav"
mp4_video_file_name = FILE + ".mp4"
audioclip = AudioFileClip(mp4_video_file_name)
audioclip.write_audiofile(transcribed_audio_file_name)
with contextlib.closing(wave.open(transcribed_audio_file_name,'r')) as f:
    frames = f.getnframes()
    rate = f.getframerate()
    duration = frames / float(rate)
total_duration = math.ceil(duration / 60)
r = sr.Recognizer()
# Open the output file once and append the transcript of each 60-second chunk
with open(FILE+"_transcription.py", "a") as out:
    for i in range(0, total_duration):
        with sr.AudioFile(transcribed_audio_file_name) as source:
            audio = r.record(source, offset=i*60, duration=60)
        # Call the recognizer once per chunk and reuse the result
        text = r.recognize_google(audio, language="en-US")
        out.write(text)
        out.write(" ")
        print(text)
print("Transcription DONE.")
How can I add a step that takes the long video file, cuts it into pieces of 5 minutes each, saves them as .mp4 in my folder, and then processes (and transcribes) each piece one by one? Thank you in advance!
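For reference, the cutting step I have in mind could look roughly like the sketch below. It is untested on my real files and rests on a few assumptions: the helper names `segment_bounds` and `split_video` and the `_partNNN.mp4` naming are made up for illustration, and it assumes moviepy 1.x (the same `moviepy.editor` module as above), using `VideoFileClip`, `subclip`, and `write_videofile`.

```python
import math


def segment_bounds(total_seconds, segment_seconds=300):
    """Return (start, end) pairs in seconds that cover the whole duration."""
    count = math.ceil(total_seconds / segment_seconds)
    return [
        (i * segment_seconds, min((i + 1) * segment_seconds, total_seconds))
        for i in range(count)
    ]


def split_video(base_name, segment_seconds=300):
    """Cut base_name.mp4 into numbered .mp4 pieces and return their file names."""
    # Imported lazily so segment_bounds can be used without moviepy installed.
    # Assumption: moviepy 1.x, matching the `moviepy.editor` import above.
    from moviepy.editor import VideoFileClip

    piece_names = []
    clip = VideoFileClip(base_name + ".mp4")
    try:
        bounds = segment_bounds(clip.duration, segment_seconds)
        for index, (start, end) in enumerate(bounds, start=1):
            # Hypothetical naming scheme: video1_part001.mp4, video1_part002.mp4, ...
            piece_name = f"{base_name}_part{index:03d}.mp4"
            clip.subclip(start, end).write_videofile(piece_name)
            piece_names.append(piece_name)
    finally:
        clip.close()
    return piece_names
```

Each name returned by `split_video(FILE)` could then go through the same steps as the long file (extract the audio to .wav, then transcribe it in 60-second chunks), but I'm not sure whether this is a sensible way to structure it.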