I have an audio file, Sample.flac. Its title and length can be read with ffprobe, which sends its output to STDERR.
I want to run ffprobe through subprocess, and I have done so successfully. I then retrieve the output (piped to subprocess.PIPE) with *.communicate()[1].decode(), as the Python docs indicate I should.
communicate() returns a tuple, (stdout, stderr), holding the output of the Popen() object. The stderr element (index 1) is then accessed and decoded from a byte string into a Python 3 str, using UTF-8.
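A minimal sketch of the communicate()/decode() step, so the types involved are visible. To run without ffprobe installed, it swaps ffprobe for a hypothetical child process (the current Python interpreter, via sys.executable, writing one line to stderr). As in the question, only stderr is piped.

```python
import subprocess
import sys

# Hypothetical stand-in for ffprobe: a child Python process that writes
# one "KEY : value" line to stderr.
child = [sys.executable, "-c", "import sys; sys.stderr.write('TITLE : demo\\n')"]

proc = subprocess.Popen(child, stderr=subprocess.PIPE)
out, err = proc.communicate()  # (stdout, stderr); stdout is None because it was not piped

print(type(err))     # <class 'bytes'>
text = err.decode()  # bytes -> str, decoded as UTF-8 (the default)
print(repr(text))
```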
The decoded output is then parsed with a multiline regex pattern matching the format of the ffprobe metadata output. The match groups are placed into a dictionary: the first group of each match is converted to lowercase and used as the key for the second group (the value).
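The parsing step can be sketched on its own with the same pattern and flags. The input string here is a hypothetical sample in the "KEY : value" shape the pattern expects, not captured ffprobe output.

```python
from re import findall, MULTILINE

# Hypothetical sample text; not real ffprobe output.
sample = (
    "    TITLE           : Believer (Kaskade Remix)\n"
    "    LENGTH          : 190\n"
)

metadata = {}
# With two groups, findall returns a list of (group1, group2) tuples.
for key, value in findall(r"(\w+)\s+:\s(.+)$", sample, MULTILINE):
    metadata[key.lower()] = value  # lowercased first group -> second group

print(metadata)  # {'title': 'Believer (Kaskade Remix)', 'length': '190'}
```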
Here is an example of the output and the working regex.
The data can be accessed through the dictionary keys as expected. However, when I concatenate the values (all of which are strings), the output appears mangled.
This is the output I would expect:
Believer (Kaskade Remix) 190
Instead, this is what I get:
190ever (Kaskade Remix)
I don't understand why the strings appear to "overlap" one another and come out mangled. Can anyone explain this and what I have done wrong?
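When console output looks garbled like this, one quick check is to print the repr() of each value, which makes control characters visible. The sketch below rests on an assumption: that the decoded text uses Windows-style "\r\n" line endings (the sample string is hypothetical). Under that assumption, `.` in the pattern also matches the carriage return, because it excludes only "\n", and a "\r" printed to a console moves the cursor back to the start of the line, so whatever follows overwrites the earlier text.

```python
from re import findall, MULTILINE

# Assumption: "\r\n" line endings. Hypothetical sample, not captured output.
sample = "    TITLE           : Believer (Kaskade Remix)\r\n    LENGTH          : 190\r\n"

meta = {k.lower(): v for k, v in findall(r"(\w+)\s+:\s(.+)$", sample, MULTILINE)}

# repr() shows the trailing carriage return that print() hides.
print(repr(meta["title"]))   # 'Believer (Kaskade Remix)\r'
print(repr(meta["length"]))  # '190\r'
```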
Below is the complete code that was run to produce the results above. It is a reduced section of my full project.
#! /usr/bin/env python3
# -*- coding: utf-8 -*-
import os
from re import findall, MULTILINE
from subprocess import Popen, PIPE


def media_metadata(file_path):
    """Use FFPROBE to get information about a media file."""
    # ffprobe writes its report to stderr, so only stderr is piped and read.
    stderr = Popen(("ffprobe", file_path), shell=True, stderr=PIPE).communicate()[1].decode()
    metadata = {}
    # Each "KEY : value" line becomes metadata["key"] = "value".
    for match in findall(r"(\w+)\s+:\s(.+)$", stderr, MULTILINE):
        metadata[match[0].lower()] = match[1]
    return metadata


if __name__ == "__main__":
    meta = media_metadata("C:/Users/spike/Music/Sample.flac")
    print(meta["title"], meta["length"])
    # The above and below have the same result in the console
    # print(meta["title"] + " " + meta["length"])
    # print("{title} {length}".format(**meta))
Can anyone explain this unpredictable output?
I asked this question here earlier, but I don't think it was very clear. In the raw output from running this on multiple files, you can see that toward the end the strings become so unpredictable that part of the title value is not printed at all.
Thanks.