I would like to feed some FLAC sound files into a Keras model. With WAV files I can do the following (a contrived example with one audio file used twice):

import scipy.io.wavfile
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD

path = 'path/to/file.wav'
_, audio = scipy.io.wavfile.read(path)
dataset = [audio, audio]
x_train = np.array(dataset)
y_train = keras.utils.to_categorical([0, 1], num_classes=2)

model = Sequential()
model.add(Dense(32, activation='relu', input_shape=x_train[0].shape))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=32)

How do I do this with flac files instead?

Harry Moreno
  • Just decode to wav (either using python; or externally; e.g. ffmpeg or the official decoder). – sascha Jun 11 '18 at 18:46
  • I'd prefer a programmatic solution, an explanation of what the librosa and scipy functions actually load and how to get a flac file to match that format. It's some numpy array? – Harry Moreno Jun 11 '18 at 19:11
  • You will get a match by decoding to wav, followed by reading through scipy (where the docs will give you constraints on the kind of wav-files). To be honest: except for toy-tasks, you eventually will need a more evolved pipeline doing this decoding once for later learning. As this data will be huge, you probably want some hdf5-based storage (or at least numpy's mmap). Additionally: from a ML-perspective, a raw wav-file based input will probably not help in your ML-tasks. That's what librosa is for (feature extraction) – sascha Jun 11 '18 at 19:15
  • How does wavenet fit into your last statement, afaik wavenet does not use mfcc? – Harry Moreno Jun 11 '18 at 19:34

1 Answer

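The soundfile package reads FLAC files directly into NumPy arrays. Note that sf.read returns (data, samplerate), the reverse of the (rate, data) order returned by scipy.io.wavfile.read: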

import numpy as np
import soundfile as sf
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD

path = 'path/to/file.flac'
# sf.read decodes the FLAC file into a NumPy array plus its sample rate
data, samplerate = sf.read(path)
# Contrived dataset: the same clip used twice, once per class
dataset = [data, data]
x_train = np.array(dataset)
y_train = keras.utils.to_categorical([0, 1], num_classes=2)

model = Sequential()
model.add(Dense(32, activation='relu', input_shape=x_train[0].shape))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=32)

A forkable SSCCE is available at https://www.kaggle.com/morenoh149/flac-keras-hello-world
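One difference worth checking when switching from the WAV version: for a 16-bit file, scipy.io.wavfile.read returns int16 samples, while sf.read by default returns float64 samples scaled to roughly [-1.0, 1.0). As far as I know you can pass dtype='int16' to sf.read to get integers instead, or convert between the two yourself. The sketch below shows the conversion with NumPy only; pcm16_to_float and float_to_pcm16 are hypothetical helper names, and the 32768 scale factor assumes 16-bit PCM.

```python
import numpy as np

def pcm16_to_float(samples):
    """Hypothetical helper: scale int16 PCM to float64 in [-1.0, 1.0)."""
    return samples.astype(np.float64) / 32768.0

def float_to_pcm16(samples):
    """Hypothetical helper: scale floats in [-1.0, 1.0) back to int16 PCM."""
    scaled = np.round(samples * 32768.0)
    # Clip so that a value of exactly 1.0 does not overflow int16
    return np.clip(scaled, -32768, 32767).astype(np.int16)

# Stand-in for the int16 array a 16-bit WAV reader would return
pcm = np.array([0, 16384, -32768, 32767], dtype=np.int16)

as_float = pcm16_to_float(pcm)        # comparable to a float64 decode
round_trip = float_to_pcm16(as_float) # back to the integer representation
```

Whichever representation you pick, use the same one for every file so the model sees inputs on a consistent scale.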

Harry Moreno