
I have an audio file, Sample.flac. Its title and length can be read with ffprobe, which writes its output to STDERR.

I want to run ffprobe through subprocess, and have done so successfully. I then retrieve the output (piped to subprocess.PIPE) with *.communicate()[1].decode(), as the Python docs suggest.

communicate() returns a tuple, (stdout, stderr), with the output from the Popen() object. I take the stderr element and decode it from a byte string into a Python 3 string (UTF-8 is the default encoding for decode()).
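A minimal sketch of that step. The child process here is a stand-in that writes one line to stderr (an assumption, so the snippet runs without ffprobe installed); the variable names are only illustrative.

```python
import sys
from subprocess import Popen, PIPE

# Stand-in child process: writes one line to stderr and nothing to stdout.
child = [sys.executable, "-c", "import sys; sys.stderr.write('title : Example\\n')"]

proc = Popen(child, stdout=PIPE, stderr=PIPE)
stdout_bytes, stderr_bytes = proc.communicate()  # (stdout, stderr), both bytes

stderr_text = stderr_bytes.decode()  # bytes -> str, UTF-8 by default
print(repr(stdout_bytes), repr(stderr_text))
```

Printing with repr() shows the exact characters received, including any line-ending characters.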

This decoded output is then parsed with a multiline regex pattern matching the format of the ffprobe metadata output. The match groups are then placed appropriately into a dictionary, with each first group converted to lowercase, and used as the key for the second group (value).

Here is an example of the output and the working regex.

The data can be accessed through the dictionary keys as expected. But upon concatenating the values together (all are strings), the output appears mangled.

This is the output I would expect:

Believer (Kaskade Remix) 190

Instead, this is what I get:

 190ever (Kaskade Remix)

I don't understand why the strings appear to "overlap" each other and result in a mangled form. Can anyone explain this and what I have done wrong?

Below is the complete code that was run to produce the results above. It is a reduced section of my full project.

#! /usr/bin/env python3
# -*- coding: utf-8 -*-

import os

from re import findall, MULTILINE
from subprocess import Popen, PIPE


def media_metadata(file_path):
    """Use FFPROBE to get information about a media file."""
    stderr = Popen(("ffprobe", file_path), stderr=PIPE).communicate()[1].decode()

    metadata = {}

    for match in findall(r"(\w+)\s+:\s(.+)$", stderr, MULTILINE):
        metadata[match[0].lower()] = match[1]

    return metadata


if __name__ == "__main__":
    meta = media_metadata("C:/Users/spike/Music/Sample.flac")
    print(meta["title"], meta["length"])
    # The above and below have the same result in the console
    # print(meta["title"] + " " + meta["length"])
    # print("{title} {length}".format(**meta))

Can anyone explain this unpredictable output?

I asked this question here earlier, but I don't think it was very clear. In the raw output from running this on multiple files, you can see that towards the end the strings become so unpredictable that part of the title value is not printed at all.

Thanks.

Jacob Birkett

2 Answers


You are capturing the "\r" character. When it is printed, the cursor returns to the beginning of the line, so whatever is printed next overwrites the first part. Stripping whitespace (which also removes the trailing "\r") should solve the problem:

metadata[match[0].lower()] = match[1].strip()
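A small sketch of the effect, using hand-written values shaped like what the regex would capture from "\r\n"-terminated output (the values are assumptions, not real ffprobe output). It also shows that concatenation and strip() behave as described:

```python
# Hypothetical captured values, each still carrying a trailing carriage return.
title = "Believer (Kaskade Remix)\r"
length = "190\r"

print(repr(title))  # repr() makes the trailing \r visible

joined = title + " " + length                  # the \r stays in the middle
clean = title.strip() + " " + length.strip()   # strip() removes it

print(repr(joined))
print(clean)
```

The concatenated string itself is intact; only the way a terminal renders "\r" makes it look overwritten.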
Marat
  • But why would this affect concatenation as shown in the second commented print statement? – Jacob Birkett Feb 08 '18 at 04:04
  • both look the same to me (`print(meta["title"], meta["length"])`). What is the difference? – Marat Feb 08 '18 at 13:54
  • One uses explicit concatenation, and the other I assume internally uses `" ".join(args)`? – Jacob Birkett Feb 09 '18 at 00:16
  • I see two exactly identical expressions. I might be looking at a wrong expression(s), so can you give a specific example? – Marat Feb 09 '18 at 04:25
  • Ahah, I didn't catch that. I fixed it and added another example, even though this has been solved. Thanks. – Jacob Birkett Feb 09 '18 at 04:45
  • In both cases the string passed to `print()` will be the same: `"<title>\r <length>\r"`. I do not see why these statements should generate different results. – Marat Feb 09 '18 at 04:52

Reproduce:

print('Believer (Kaskade Remix)\r 190')

Output:

 190ever (Kaskade Remix)

Issue:

The end of line is \r\n. In a re pattern with MULTILINE, $ matches just before \n, and . matches \r, so the \r remains in the matched group.
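This can be checked on a hand-written sample (an assumption about the shape of the ffprobe lines, not captured output) with the pattern from the question:

```python
import re

# Assumed sample: two metadata lines with Windows-style \r\n endings.
sample = (
    "    TITLE           : Believer (Kaskade Remix)\r\n"
    "    LENGTH          : 190\r\n"
)

matches = re.findall(r"(\w+)\s+:\s(.+)$", sample, re.MULTILINE)
print(matches)  # each value still ends with '\r'
```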

Fix:

Insert \r before $ in your re pattern. i.e. (\w+)\s+:\s(.+)\r$

Or use universal_newlines=True as a Popen argument and remove .decode() as the output will be text with \n instead of \r\n.

Or stderr = stderr.replace('\r', '') before re processing.
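A sketch of the two string-level fixes on an assumed sample, plus one variation that is not part of the fixes above: a lazy group followed by an optional \r, which should cope with both \n and \r\n line endings.

```python
import re

# Assumed sample with \r\n line endings.
sample = "TITLE : Believer (Kaskade Remix)\r\nLENGTH : 190\r\n"

# Fix 1: require \r before $ (only matches \r\n-terminated lines).
strict = re.findall(r"(\w+)\s+:\s(.+)\r$", sample, re.MULTILINE)

# Fix 2: drop every \r first, then use the original pattern.
stripped = re.findall(
    r"(\w+)\s+:\s(.+)$", sample.replace("\r", ""), re.MULTILINE
)

# Variation (my own assumption): lazy group plus optional \r.
tolerant = re.findall(r"(\w+)\s+:\s(.+?)\r?$", sample, re.MULTILINE)

print(strict)
print(stripped)
print(tolerant)
```

All three produce the same pairs here; the variation only differs when the input has plain \n endings, where the first pattern would match nothing.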

Alternative:

ffprobe can output a JSON string. The json module can load that string and return a dictionary.

For example, with the command:

['ffprobe', '-show_format', '-of', 'json', file_path]

The JSON string is written to the stdout stream.
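A rough sketch of how this could look. The function names are hypothetical, ffprobe is assumed to be on PATH, and the sample JSON is trimmed by hand for illustration; real output has more fields, and tag names and their case depend on the file.

```python
import json
from subprocess import run, PIPE


def parse_format(json_text):
    """Hypothetical helper: return the "format" object from ffprobe's JSON text."""
    return json.loads(json_text).get("format", {})


def probe_format(file_path):
    """Run ffprobe (assumed to be on PATH) and return its "format" section."""
    command = ["ffprobe", "-show_format", "-of", "json", file_path]
    result = run(command, stdout=PIPE, stderr=PIPE, check=True)
    return parse_format(result.stdout.decode())


# Illustrative JSON only, not captured from a real run.
sample = (
    '{"format": {"duration": "190.000000", '
    '"tags": {"TITLE": "Believer (Kaskade Remix)"}}}'
)
info = parse_format(sample)
print(info["tags"]["TITLE"], info["duration"])
```

Because json.loads handles the parsing, line endings in the output no longer matter.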

michael_heath