I'm trying to extract the audio and visual information from a video. As we know, the two streams must be paired, so they should cover the same duration. To check this, I measured the duration of the visual part with OpenCV and of the audio part with librosa. However, the two durations are not the same.
import cv2
import librosa
print(cv2.__version__)  # 3.4.1
vid_path = '001167.mp4'
# Audio: decode the audio track, resampled to 16 kHz, keeping all channels
audio, audio_rate = librosa.load(vid_path, sr=16000, mono=False)
# Video: seek to the end of the file, then read the position in milliseconds
vidcap = cv2.VideoCapture(vid_path)
vidcap.set(cv2.CAP_PROP_POS_AVI_RATIO, 1)
video_length = vidcap.get(cv2.CAP_PROP_POS_MSEC)
audio_length = librosa.get_duration(y=audio, sr=audio_rate)
print(audio_length, video_length / 1000)
Result: Audio: 10.005 sec, Video: 9.0924 sec
The audio duration is longer than the video duration by about 0.9 sec.
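As a cross-check on the video side, a second estimate can be computed from the frame count and frame rate instead of seeking to the end of the file. The sketch below is only an illustration under assumptions: the helper names `duration_from_frames`, `video_duration_seconds`, and `duration_gap` are hypothetical, and `cv2.CAP_PROP_FRAME_COUNT` and `cv2.CAP_PROP_FPS` are reported by the capture backend, so they may themselves be inaccurate for some files. If this estimate agrees with the audio duration, the seek-based measurement is the likely source of the mismatch; if it agrees with the seek-based value, the streams may really differ in length.

```python
def duration_from_frames(frame_count, fps):
    """Estimate duration in seconds as frame_count / fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_count / fps


def video_duration_seconds(path):
    """Estimate video duration from the frame count and FPS reported by OpenCV.

    Assumption: the backend reports usable values for this file; for some
    containers these properties are approximate or zero.
    """
    import cv2  # imported here so the pure helpers work without OpenCV

    cap = cv2.VideoCapture(path)
    try:
        if not cap.isOpened():
            raise IOError("could not open video: %s" % path)
        fps = cap.get(cv2.CAP_PROP_FPS)
        frame_count = cap.get(cv2.CAP_PROP_FRAME_COUNT)
    finally:
        cap.release()
    return duration_from_frames(frame_count, fps)


def duration_gap(audio_seconds, video_seconds):
    """Return audio duration minus video duration, in seconds."""
    return audio_seconds - video_seconds
```

With the file above, this would be called as `video_duration_seconds('001167.mp4')` and compared against `audio_length` using `duration_gap`.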