Trying to convert an mp3 file to a Numpy Array, and ffmpeg just hangs

Question

I'm working on a music classification methodology with Scikit-learn, and the first step in that process is converting a music file to a numpy array.

After unsuccessfully trying to call ffmpeg from a python script, I decided to simply pipe the file in directly:

FFMPEG_BIN = "ffmpeg"
cwd = (os.getcwd())
dcwd = (cwd + "/temp")
if not os.path.exists(dcwd): os.makedirs(dcwd)

folder_path = sys.argv[1]
f = open("test.txt","a")

for f in glob.glob(os.path.join(folder_path, "*.mp3")):
    ff = f.replace("./", "/")
    print("Name: " + ff)
    aa = (cwd + ff)

    command = [ FFMPEG_BIN,
        '-i',  aa,
        '-f', 's16le',
        '-acodec', 'pcm_s16le',
        '-ar', '22000', # output will have a 22000 Hz sample rate
        '-ac', '1', # mono (set to '2' for stereo)
        '-']

    pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)
    raw_audio = pipe.proc.stdout.read(88200*4)
    audio_array = numpy.fromstring(raw_audio, dtype="int16")
    print (str(audio_array))
    f.write(audio_array + "\n")

The problem is, when I run the file, it starts ffmpeg and then does nothing:

[mp3 @ 0x1446540] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/home/don/Code/Projects/MC/Music/Spaz.mp3':
  Metadata:
    title           : Spaz
    album           : Seeing souns
    artist          : N*E*R*D
    genre           : Hip-Hop
    encoder         : Audiograbber 1.83.01, LAME dll 3.96, 320 Kbit/s, Joint Stereo, Normal quality
    track           : 5/12
    date            : 2008
  Duration: 00:03:50.58, start: 0.000000, bitrate: 320 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 320 kb/s
Output #0, s16le, to 'pipe:':
  Metadata:
    title           : Spaz
    album           : Seeing souns
    artist          : N*E*R*D
    genre           : Hip-Hop
    date            : 2008
    track           : 5/12
    encoder         : Lavf56.4.101
    Stream #0:0: Audio: pcm_s16le, 22000 Hz, mono, s16, 352 kb/s
    Metadata:
      encoder         : Lavc56.1.100 pcm_s16le
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help

It just sits there, hanging, for far longer than the song lasts. What am I doing wrong here?

Here: https://zulko.github.io/blog/2013/10/04/read-and-write-audio-files-in-python-using-ffmpeg/ — Rich, Jul 04 '16 at 22:26

Dalen · Accepted Answer · 2016-07-04T23:18:50.613

3

I recommend pymedia, audioread, or decoder.py. There are also pyffmpeg and similar modules that do just what you want. Take a look at pypi.python.org.

Of course, these will not help you turn the data into a numpy array.

Anyway, here is a crude way to do it by piping the output of ffmpeg:

from subprocess import Popen, PIPE
import numpy as np

def decode(fname):
    # On Windows, use the full path to ffmpeg.exe
    cmd = ["./ffmpeg.exe", "-i", fname, "-f", "wav", "-"]
    # On Windows, add creationflags=0x8000000 to keep another console window from popping up
    p = Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    # communicate() closes stdin and drains stdout and stderr until ffmpeg exits
    data = p.communicate()[0]
    # Skip the "data" chunk tag (4 bytes) and its 4-byte size field to reach the samples
    start = data.find(b"data") + 8
    return np.frombuffer(data[start:], dtype=np.int16)

This is how it should work for basic use.

It works because ffmpeg's output is 16-bit audio by default. But if you mess around with other sample formats, you should know that numpy doesn't have an int24 type, so you will be forced to do some bit manipulation and represent 24-bit audio as 32-bit audio. Just don't use 24-bit, and the world is happy. :D
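For anyone who does end up with 24-bit audio, the bit manipulation mentioned above could look roughly like this. It is only a sketch: the helper name `pcm24_to_int32` is made up for illustration, and it assumes the input is headerless, little-endian, signed 24-bit PCM packed as 3 bytes per sample.

```python
import numpy as np

def pcm24_to_int32(raw):
    """Convert packed little-endian signed 24-bit PCM bytes to an int32 array.

    Hypothetical helper; assumes 3 bytes per sample and no header.
    """
    usable = len(raw) - len(raw) % 3  # drop any trailing partial sample
    b = np.frombuffer(raw[:usable], dtype=np.uint8).reshape(-1, 3).astype(np.int32)
    # Assemble the three bytes, least significant byte first
    value = b[:, 0] | (b[:, 1] << 8) | (b[:, 2] << 16)
    # Sign-extend: values with bit 23 set are negative
    return np.where(value >= (1 << 23), value - (1 << 24), value).astype(np.int32)
```

The result keeps the original 24-bit value range inside an int32, so scaling to [-1, 1] would divide by 2**23 rather than 2**15.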

We may discuss refining the code in comments, if you need something more sophisticated.

edited Jul 04 '16 at 23:18

answered Jul 04 '16 at 22:35

Dalen


That seems to be what I need, but it gave me an `EOFError` in python 2.7, and an `ImportError` in python 3. – Rich Jul 04 '16 at 22:54
Am I missing a requirement? – Rich Jul 04 '16 at 22:54
No, except numpy, all is standard lib. I'll make a check now. – Dalen Jul 04 '16 at 22:57
OK, this code is checked and it works. I jumped over all that should be done properly to have nice code etc. because memory usage can really grow when flitting data around. I have no idea why ImportError should occur, perhaps some naming change in Python3. This works on Win, and if you want I can check it on Linux later. – Dalen Jul 04 '16 at 23:23
And I now have a 40.7 MB txt file. Thanks, you really helped me. – Rich Jul 04 '16 at 23:47
Wait, the txt file is full of gibberish. I think I broke it somehow. – Rich Jul 05 '16 at 00:03
What *should* I have gotten out? – Rich Jul 05 '16 at 00:03
Out of the decode() function, you get a numpy.ndarray() instance with dtype numpy.int16. How did you go about saving it to txt? – Dalen Jul 05 '16 at 00:35
I use `f = open("test.wav","a")` and `f.write(a)`, but when I open it in gedit it's just "/FF/FF/FF/FF/00/00/00/00", and in nano it's "��^@^@^@^@^@^@^@^@^@^@^@" – Rich Jul 05 '16 at 00:55
If you want to output the array into a file in a human-readable way use: np.savetxt(".txt", decode(".mp3")). You'll get comma-separated ints. Your file will be way over 40 MB. :D – Dalen Jul 05 '16 at 00:58
What you are doing now is just saving binary raw data back to a file. You said you need an array, why do you want to save it? – Dalen Jul 05 '16 at 01:07
Ok, I used `np.savetxt("array.txt", a)`. It's at 500 MB and still growing. :( – Rich Jul 05 '16 at 01:21
Ok, so: It stopped at 518.6 MB, but when I tried to open it my computer froze, I had to reboot. – Rich Jul 05 '16 at 01:33
The second time, it worked. It's full of this: `4.832000000000000000e+03, 1.736800000000000000e+04, 3.851000000000000000e+03, 1.755400000000000000e+04, 3.134000000000000000e+03`. I guess that's what I wanted? – Rich Jul 05 '16 at 01:35
OK, yes, but change the output format of np.savetxt so that it outputs clearly what you want. Read help(np.savetxt) to learn how to use formatting in the fmt argument. The default is float in scientific notation (the e+03 style you are seeing). Sorry, I forgot to mention. I think that fmt="%i" should be enough, but read the help anyway. Nano will not freeze, but you'll have to wait a bit. – Dalen Jul 05 '16 at 01:50
Update: now it's only 116 MB, and full of: "`-373, 12658, 939, 16178, -797, 14072, -1943, 12372`" Thanks, you really helped me out, a lot. – Rich Jul 05 '16 at 03:40
That is OK. If you want to scale the signal to some interval for easier classification, normalize the array before saving it. Keep in mind that when your audio is stereo, all integers at even indices in the array represent the left channel, and all at odd indices the right. You may have to separate them, depending on what classification you would like to employ. You can easily reshape the array to be 2D with one column representing the left channel and the other the right. You may even have to turn all your signals to mono. – Dalen Jul 05 '16 at 17:08
Actually, I edited the command so it is mono: `cmd = ["ffmpeg", "-i", fname, "-ss", "0", "-t", "120", "-ac", "1", "-ar", "22000", "-f", "wav", "-"]` – Rich Jul 05 '16 at 17:10
Bravo! Although channel separation and turning to mono in numpy is a piece of cake, it is better that way, because ffmpeg will also deal with mono compatibility by correctly applying M and S components. – Dalen Jul 05 '16 at 17:21
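
The points raised in the comments above (integer formatting with np.savetxt, separating interleaved stereo channels, and normalization) can be sketched as follows. The function names are made up for illustration, the sample values are the ones Rich posted, and the code assumes the array holds interleaved int16 samples as returned by decode().

```python
import io
import numpy as np

def split_channels(samples, n_channels=2):
    """Reshape interleaved samples to shape (n_frames, n_channels).

    Column 0 is the left channel and column 1 the right for stereo input.
    """
    usable = len(samples) - len(samples) % n_channels  # drop a partial frame
    return samples[:usable].reshape(-1, n_channels)

def normalize(samples):
    """Scale int16 samples to float32 values in [-1, 1)."""
    return samples.astype(np.float32) / 32768.0

def to_text(samples):
    """Format samples as integers, one frame per line, comma separated."""
    buf = io.StringIO()
    np.savetxt(buf, samples, fmt="%i", delimiter=", ")
    return buf.getvalue()

interleaved = np.array([-373, 12658, 939, 16178], dtype=np.int16)
stereo = split_channels(interleaved)
left, right = stereo[:, 0], stereo[:, 1]
```

To write straight to disk instead of a string, np.savetxt also accepts a filename as its first argument. For classification work, np.save (binary .npy) is usually far smaller and faster than a text dump.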

score 2 · Answer 2 · edited Jun 26 '18 at 07:00

Here's what I'm using. It relies on pydub (which uses ffmpeg) and scipy.

Full setup (on Mac, may differ on other systems):

pip install scipy
pip install pydub
brew install ffmpeg  # Or probably "sudo apt-get install ffmpeg" on Linux

Then to read the mp3:

import tempfile
import os
import pydub
import scipy
import scipy.io.wavfile


def read_mp3(file_path, as_float = False):
    """
    Read an MP3 File into numpy data.
    :param file_path: String path to a file
    :param as_float: Cast data to float and normalize to [-1, 1]
    :return: Tuple(rate, data), where
        rate is an integer indicating samples/s
        data is an ndarray(n_samples, 2)[int16] if as_float = False
            otherwise ndarray(n_samples, 2)[float] in range [-1, 1]
    """

    path, ext = os.path.splitext(file_path)
    assert ext=='.mp3'
    mp3 = pydub.AudioSegment.from_mp3(file_path)
    _, path = tempfile.mkstemp()
    mp3.export(path, format="wav")
    rate, data = scipy.io.wavfile.read(path)
    os.remove(path)
    if as_float:
        data = data/(2**15)
    return rate, data

Credit to James Thompson's blog

You need `os.close(_)` (and probably rename `_` to `fd`) to close the temp file descriptor. Otherwise, when run in a for loop you will eventually get `[Errno 24] Too many open files`. — Matthew D. Scholefield, Aug 07 '18 at 21:51
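
A minimal sketch of the fix suggested in that comment, assuming the rest of read_mp3 stays as posted. The helper name `make_temp_wav_path` is hypothetical; the point is only that the descriptor returned by tempfile.mkstemp gets closed before the path is reused.

```python
import os
import tempfile

def make_temp_wav_path():
    """Create an empty temporary file and return its path.

    mkstemp returns an open OS-level file descriptor along with the path.
    Closing it right away avoids leaking one descriptor per call.
    """
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    return path
```

In read_mp3, the line `_, path = tempfile.mkstemp()` would then become `path = make_temp_wav_path()`, and the existing `os.remove(path)` still cleans up the file afterwards.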
