Load FLAC file in python same as scipy or librosa

Question

I would like to feed some FLAC sound files into a Keras model. With WAV files I can do the following (a contrived example with one audio file used twice):

import scipy.io.wavfile
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD

path = 'path/to/file.wav'
_, audio = scipy.io.wavfile.read(path)
dataset = [audio, audio]
x_train = np.array(dataset)
y_train = keras.utils.to_categorical([0, 1], num_classes=2)

model = Sequential()
model.add(Dense(32, activation='relu', input_shape=x_train[0].shape))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=32)

How do I do this with flac files instead?

Just decode to wav (either using python; or externally; e.g. ffmpeg or the official decoder). — sascha, Jun 11 '18 at 18:46
I'd prefer a programmatic solution, an explanation of what the librosa and scipy functions actually load and how to get a flac file to match that format. It's some numpy array? — Harry Moreno, Jun 11 '18 at 19:11
You will get a match by decoding to wav, followed by reading through scipy (where the docs will give you constraints on the kind of wav-files). To be honest: except for toy-tasks, you eventually will need a more evolved pipeline doing this decoding once for later learning. As this data will be huge, you probably want some hdf5-based storage (or at least numpy's mmap). Additionally: from a ML-perspective, a raw wav-file based input will probably not help in your ML-tasks. That's what librosa is for (feature extraction) — sascha, Jun 11 '18 at 19:15
How does wavenet fit into your last statement, afaik wavenet does not use mfcc? — Harry Moreno, Jun 11 '18 at 19:34

score 12 · Accepted Answer · answered Jun 13 '18 at 21:29

The soundfile package can load FLAC files directly into a NumPy array, which can then be used the same way as the array returned by scipy:

import numpy as np
import soundfile as sf
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD

path = 'path/to/file.flac'
# sf.read returns (data, samplerate); scipy.io.wavfile.read returns (rate, data)
data, samplerate = sf.read(path)
dataset = [data, data]
x_train = np.array(dataset)
y_train = keras.utils.to_categorical([0, 1], num_classes=2)

model = Sequential()
model.add(Dense(32, activation='relu', input_shape=x_train[0].shape))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=32)

A forkable SSCCE is available at https://www.kaggle.com/morenoh149/flac-keras-hello-world
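On the "same as scipy or librosa" part of the question: all three libraries return plain NumPy arrays, and the difference is mainly dtype and scaling. For 16-bit PCM, scipy.io.wavfile.read gives int16 samples, sf.read defaults to float64 scaled to roughly [-1.0, 1.0), and librosa.load gives float32 in a similar range. The sketch below shows one way to move between those conventions. The helper names (float_to_int16, int16_to_float32, read_flac_as_int16) are hypothetical, and the 32768 scale factor is an assumption about 16-bit audio, so treat this as a starting point rather than a definitive recipe.

```python
import numpy as np


def float_to_int16(samples):
    """Convert float samples in [-1.0, 1.0) to int16 (scipy-style 16-bit PCM).

    Assumes a full-scale factor of 32768; out-of-range values are clipped.
    """
    scaled = np.round(np.asarray(samples, dtype=np.float64) * 32768.0)
    return np.clip(scaled, -32768, 32767).astype(np.int16)


def int16_to_float32(samples):
    """Convert int16 PCM samples to float32 in [-1.0, 1.0) (librosa-style)."""
    return np.asarray(samples).astype(np.float32) / 32768.0


def read_flac_as_int16(path):
    """Hypothetical helper: read a FLAC file and return (rate, data) like scipy.

    soundfile is imported lazily so the helpers above work without it.
    """
    import soundfile as sf

    # dtype='int16' asks soundfile for integer samples instead of float64.
    data, samplerate = sf.read(path, dtype='int16')
    return samplerate, data
```

With these in place, read_flac_as_int16('path/to/file.flac') can stand in for scipy.io.wavfile.read(path) in the question's code, and int16_to_float32 gives normalized input, which is usually a more comfortable range for a neural network.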

soundfile has many bugs and doesn't work for me any more. The same goes for librosa, because it is based on soundfile. Is there another way? — Walid Bousseta, Mar 25 '20 at 09:32
I don't know of a simpler way. If someone more knowledgeable about numpy arrays could chime in, that would be great. — Harry Moreno, Nov 02 '20 at 17:56
