I'm looking for Python libraries that can extract metadata from video files in formats such as MP4, MKV, AVI, WebM, and MPG.

The fields I mainly want to extract are the title, description, comment, and captions/subtitles.

I've tried ffmpeg-python, following this guide: https://www.thepythoncode.com/article/extract-media-metadata-in-python

and TinyTag, following this one: https://www.geeksforgeeks.org/access-metadata-of-various-audio-and-video-file-formats-using-python-tinytag-library/

From what I can tell, ffmpeg-python's probe() function returns the most data, but its output does not contain the title, description, or comment, and closed_captions is simply '0', which I assume is the source track.

Thank you for any help provided.

SingularitySG

1 Answer

You can use ffprobe to get the metadata:

import subprocess as sp
import json
from pprint import pprint

videofile = 'video.mp4'  # placeholder: path to your video file

# ask ffprobe for all format and stream entries as JSON
out = sp.run(['ffprobe', '-of', 'json', '-show_entries', 'format:stream', videofile],
             stdout=sp.PIPE, stderr=sp.PIPE, universal_newlines=True)
results = json.loads(out.stdout)
# 'tags' is absent when no metadata is set, so fall back to an empty dict
metadata_format = results['format'].get('tags', {})
metadata_streams = [res.get('tags', {}) for res in results['streams']]

pprint(metadata_format)   # "main" metadata: Title & Description found here
pprint(metadata_streams)  # per-stream metadata
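Tag keys are not always spelled the same way across containers (for example `title` in one file and `TITLE` in another), so a case-insensitive lookup can be handy. The following is only a sketch of that idea: `probe` and `find_tags` are hypothetical helper names, and `-v error` is added just to keep ffprobe's banner out of stderr.

```python
import json
import subprocess as sp


def probe(videofile):
    """Run ffprobe on videofile and return its JSON output as a dict."""
    out = sp.run(
        ["ffprobe", "-v", "error", "-of", "json",
         "-show_entries", "format:stream", videofile],
        stdout=sp.PIPE, stderr=sp.PIPE, universal_newlines=True,
    )
    if out.returncode != 0:
        # ffprobe reports the reason (missing file, unknown format) on stderr
        raise RuntimeError(out.stderr.strip())
    return json.loads(out.stdout)


def find_tags(results, wanted=("title", "description", "comment")):
    """Look up the wanted format-level tags, ignoring the case of the keys."""
    tags = results.get("format", {}).get("tags", {})
    lowered = {key.lower(): value for key, value in tags.items()}
    # A tag that is not present in the file comes back as None
    return {name: lowered.get(name) for name in wanted}
```

With a real file this would be used as `find_tags(probe(videofile))`.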

For the subtitles/closed captions, you need to read the subtitle streams with ffmpeg:

# get the first subtitle stream in webvtt format
# ('-' as the output name makes ffmpeg write to stdout)
out = sp.run(['ffmpeg', '-i', videofile, '-map', 's:0', '-f', 'webvtt', '-'],
             stdout=sp.PIPE, stderr=sp.PIPE, universal_newlines=True)
subtitle = out.stdout
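If the file has no subtitle stream, the call above leaves `subtitle` empty and the reason only shows up in `out.stderr`. A minimal sketch that surfaces the failure instead, assuming ffmpeg exits with a nonzero status in that case (which is my understanding); `subtitle_command` and `extract_subtitle` are hypothetical helper names:

```python
import subprocess as sp


def subtitle_command(videofile, index=0):
    """Build the ffmpeg command that writes subtitle stream `index` to stdout as WebVTT."""
    return ["ffmpeg", "-v", "error", "-i", videofile,
            "-map", f"s:{index}", "-f", "webvtt", "-"]


def extract_subtitle(videofile, index=0):
    """Return the subtitle text, or raise with ffmpeg's own error message."""
    out = sp.run(subtitle_command(videofile, index),
                 stdout=sp.PIPE, stderr=sp.PIPE, universal_newlines=True)
    if out.returncode != 0:
        # e.g. the -map selector matched no stream in this file
        raise RuntimeError(out.stderr.strip())
    return out.stdout
```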

Then you can use a library like webvtt-py to parse the subtitle data. (I don't have firsthand experience with it, so try it yourself.)

One caveat, though: if your video is a DVD rip, its subtitle streams (dvd_subtitle) are bitmaps rather than text, and FFmpeg cannot convert them to text.
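If you would rather not add a dependency, simple WebVTT output can also be split by hand. This is not webvtt-py's API, just a rough standard-library sketch with a hypothetical `parse_webvtt` function; it assumes cues are separated by blank lines and ignores styling, notes, and cue settings.

```python
def parse_webvtt(data):
    """Return a list of (start, end, text) tuples from WebVTT text."""
    cues = []
    for block in data.replace("\r\n", "\n").split("\n\n"):
        lines = [line for line in block.split("\n") if line.strip()]
        for i, line in enumerate(lines):
            if "-->" not in line:
                continue  # header line or optional cue identifier
            start, _, rest = line.partition("-->")
            parts = rest.split()
            if parts:
                # parts[0] is the end time; anything after it is cue settings
                cues.append((start.strip(), parts[0], "\n".join(lines[i + 1:])))
            break
    return cues
```

Calling `parse_webvtt(subtitle)` on the string captured from ffmpeg gives one tuple per cue, with multi-line cue text joined by newlines.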
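You can check for this case up front from the ffprobe output, since each stream entry carries `codec_type` and `codec_name`. A sketch, with a hypothetical `classify_subtitle_streams` helper; the set of bitmap codec names is my assumption and is not exhaustive:

```python
# Subtitle codecs that store images rather than text (assumed, non-exhaustive list)
BITMAP_SUBTITLE_CODECS = {"dvd_subtitle", "hdmv_pgs_subtitle", "dvb_subtitle"}


def classify_subtitle_streams(results):
    """Return (stream index, codec name, is_text) for every subtitle stream."""
    report = []
    for stream in results.get("streams", []):
        if stream.get("codec_type") != "subtitle":
            continue  # skip video, audio, and data streams
        codec = stream.get("codec_name")
        report.append((stream.get("index"), codec, codec not in BITMAP_SUBTITLE_CODECS))
    return report
```

Here `results` is the dict parsed from ffprobe's JSON output; only streams flagged as text are worth passing to the WebVTT conversion.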

kesh
  • Thank you for the fast reply. I'll try out ffprobe and see how it goes! – SingularitySG Mar 17 '22 at 09:21
  • Hi, I've tried it out and it works; I got what I was looking for. I'm just a bit confused about how the subprocess.run() call works. From what I understand, it calls ffprobe from FFmpeg with the options [-of json -> set the output printing format; -show_entries format:stream -> set the list of entries to show], while the subtitle sp.run call runs ffmpeg with the options [-i input_url -> read input_url; -f format -> force the format to use; -map -> choose which streams from the input(s) should be included in the output; 's:0' -> select subtitle stream index #0]. – SingularitySG Mar 17 '22 at 13:33
  • You're on the money with your interpretation. What's the confusion? – kesh Mar 17 '22 at 13:43
  • Nothing else then, Thanks for the help! – SingularitySG Mar 19 '22 at 10:29
  • Hi, I tried manually adding some text to the subtitles tag under the MP4 video's properties. I tried pprint(subtitle), but I see blanks instead. Was it supposed to launch an FFmpeg program or write out a subtitle file? Thank you. – SingularitySG Mar 21 '22 at 05:46