
I'm looking for a way to extract video frames and the corresponding audio segments from a video file using Python. I know OpenCV well, but it only extracts video frames and provides no audio. I need both: video frames and audio segments that correspond exactly to each other.

I'd be very thankful for any hints :)

Alexey Antonenko
  • The main goal of the task is to: 1) find time segments of interest by analyzing the video (a person has appeared in or disappeared from the frame); 2) analyze the audio within those segments (what the person said). I have already implemented person detection and speech recognition, so the question is only about how to extract video and audio from the video file. – Alexey Antonenko Aug 11 '17 at 09:04

2 Answers

5

Finally, I found moviepy (https://pypi.python.org/pypi/moviepy), a light wrapper around ffmpeg that provides an interface for quickly obtaining video and audio frames at the same time positions. Here is an example:

# MoviePy 1.x import; in MoviePy 2.x use `from moviepy import VideoFileClip`
from moviepy.editor import VideoFileClip

video = VideoFileClip('your video filename')
audio = video.audio  # None if the file has no audio track
duration = video.duration  # in seconds (float); == audio.duration
# Note: video.fps (frames per second) != audio.fps (samples per second)
step = 0.1  # sampling interval in seconds (100 ms)
for i in range(int(duration / step)):  # walk through the clip by timestamp
    t = i * step
    if t > audio.duration or t > video.duration:
        break
    audio_frame = audio.get_frame(t)  # numpy array: one sample per channel (mono/stereo) at time t
    video_frame = video.get_frame(t)  # numpy array: the video frame at time t

Besides extracting audio/video frames, moviepy provides a wide range of functions for modifying audio and video clips.
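One caveat: as far as I can tell, `audio.get_frame(t)` with a single timestamp returns the sample at that instant, not a segment. To get the audio that belongs to one video frame, you need the range of samples covering that frame's time span. Below is a minimal sketch of the arithmetic, assuming a constant video frame rate. `samples_for_frame` is a hypothetical helper of mine, not part of moviepy.

```python
def samples_for_frame(frame_index, video_fps, audio_fps):
    """Return (start, end) audio sample indices covering one video frame.

    Assumes a constant video frame rate; `end` is exclusive.
    """
    # Frame k spans [k / video_fps, (k + 1) / video_fps) seconds;
    # multiplying by audio_fps converts those times to sample indices.
    start = int(round(frame_index * audio_fps / video_fps))
    end = int(round((frame_index + 1) * audio_fps / video_fps))
    return start, end


# Example: 25 fps video with 44100 Hz audio gives 1764 samples per frame.
print(samples_for_frame(0, 25, 44100))   # (0, 1764)
print(samples_for_frame(10, 25, 44100))  # (17640, 19404)
```

In MoviePy 1.x, something like `audio.subclip(t0, t1).to_soundarray()` should return the samples for a time window as a numpy array, but I have not tested that here.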

Alexey Antonenko
1

You are correct that you cannot get audio via OpenCV. Your best bet might be to extract the video frames and the audio separately and then work with them from there. Some tools which might help include:

ffmpy

ffmpeg (via subprocess)

You can learn more about calling ffmpeg through subprocess in this related Stack Overflow answer: https://stackoverflow.com/a/26741357/7604321

From there you can load the audio file and process it alongside your video frames.
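A rough sketch of that approach, using only the standard library: one function builds an ffmpeg command that writes the audio track to a WAV file, and another reads the samples for a given time window with the `wave` module. The function names and file paths are made up for illustration. The ffmpeg flags used are `-i` (input), `-vn` (drop video), `-acodec pcm_s16le` (16-bit PCM) and `-y` (overwrite the output). Running the extraction needs ffmpeg installed and on your PATH.

```python
import subprocess
import wave


def build_extract_audio_cmd(video_path, wav_path):
    """Build an ffmpeg command that writes the audio track as 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", video_path, "-vn",
            "-acodec", "pcm_s16le", wav_path]


def extract_audio(video_path, wav_path):
    """Run ffmpeg to extract the audio; raises CalledProcessError on failure."""
    subprocess.run(build_extract_audio_cmd(video_path, wav_path), check=True)


def read_audio_segment(wav_path, start_sec, end_sec):
    """Return raw PCM bytes for the window [start_sec, end_sec) of a WAV file."""
    with wave.open(wav_path, "rb") as wav:
        rate = wav.getframerate()          # samples per second
        start = int(start_sec * rate)      # first sample of the window
        count = int(end_sec * rate) - start
        wav.setpos(start)
        return wav.readframes(count)
```

The time window would come from the video side, for example the frame index divided by the frame rate that OpenCV reports via `cap.get(cv2.CAP_PROP_FPS)`.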

Without more information about your use case, I can't suggest much more.

JCooke
  • I could, but I really don't want to use ffmpeg directly or through its command-line wrapper (ffmpy). That solution looks too complicated, while Python usually provides simple ways of solving a task. That's why I'm looking for one. – Alexey Antonenko Aug 11 '17 at 08:55
  • As an alternative, I could use OpenCV to extract frames and an additional module to extract the audio separately, then match them up using timestamps. But to begin with I'm looking for a single ready-to-use solution (if one exists). – Alexey Antonenko Aug 11 '17 at 09:01
  • Maybe PyMedia? I've never used it though. – JCooke Aug 11 '17 at 12:27
  • PyMedia's last update for *nix was in 2006. It seems to be dead. I cannot build it on Ubuntu 16. – Alexey Antonenko Aug 11 '17 at 13:21