I know there are already some good posts on this topic (this one is excellent and very detailed), but after two hours of struggling with it I still have some issues.
For some context: I'm computing the spectrogram of some WAV files (16 kHz, 3 seconds long, split into 20 ms windows) and trying to feed them into a neural network to detect whether they contain a specific word (the output being a certainty between 0 and 1).
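import numpy as np
from scipy import signal
from scipy.io import wavfile
from keras.models import Sequential
from keras.layers import Dense, Activation

def obtain_sample(wav):
    # Read the WAV file and compute its spectrogram with 320-sample (20 ms) windows
    sample_rate, samples = wavfile.read(wav)
    frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate, nperseg=320, noverlap=16)
    dBS = 10 * np.log10(spectrogram)  # convert to dB
    return dBS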

def create_model():
    print("Creating Model...")
    model = Sequential()
    model.add(Dense(10, input_shape=(161, 157)))
    model.add(Activation('sigmoid'))
    model.compile(optimizer='rmsprop',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

model = create_model()

# Two clips that contain the command word and one that does not
com1 = obtain_sample("comando.wav")
com2 = obtain_sample("comando2.wav")
nocom = obtain_sample("nocomando.wav")
inputs = np.array([com1, com2, nocom])    # shape (3, 161, 157)
results = np.array([[1.], [1.], [0.]])    # shape (3, 1)
model.fit(inputs, results, epochs=10)
#model.fit(com1,[1.],epochs=10)
#model.fit(com2,[1.],epochs=10)
#model.fit(nocom,[0.],epochs=10)
model.save("modelo_comando")
print("Model saved")
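
For reference, the (161, 157) input shape used above can be checked by hand. This is only a sketch of the arithmetic, assuming each clip is exactly 3 seconds at 16 kHz and that `signal.spectrogram` is left at its defaults (one-sided spectrum, FFT length equal to `nperseg`, no padding at the edges):

```python
# Shape bookkeeping for the spectrogram parameters used above.
sample_rate = 16_000   # Hz
duration_s = 3         # seconds per clip (assumed to be exact)
nperseg = 320          # 20 ms window at 16 kHz
noverlap = 16          # samples shared by consecutive windows

n_samples = sample_rate * duration_s
hop = nperseg - noverlap   # step between the starts of consecutive windows

# A one-sided FFT of a real signal gives nperseg // 2 + 1 frequency bins.
n_freqs = nperseg // 2 + 1
# Only windows that fit entirely inside the signal are counted.
n_frames = (n_samples - nperseg) // hop + 1

print((n_freqs, n_frames))  # (161, 157)
```

So each sample is a 161 x 157 matrix (frequency bins by time frames), and stacking three of them gives an array of shape (3, 161, 157).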
I'm currently getting the following error:
ValueError('Error when checking target: expected activation_1 to have 3 dimensions, but got array with shape (3, 1)',)
After almost an hour of trying to explain the problem better while inspecting the local variable values, I think I'd rather just ask: am I giving the correct input shape, and how can I use a Flatten/Reshape layer to obtain a single output value per sample?
Sorry for not being able to be more specific.
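
To make the shape mismatch concrete, here is a minimal NumPy sketch of the bookkeeping. It is not Keras itself: it emulates what a dense layer does to the shapes, relying on the fact that `Dense` only acts on the last axis of its input. The random arrays and the weight names (`w_dense10`, `w_dense1`, `b_dense1`) are placeholders made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the three spectrograms: (samples, freq bins, time frames).
inputs = rng.normal(size=(3, 161, 157))

# Dense(10) on a 3-D input multiplies along the LAST axis only,
# so the 161 axis survives and the output stays 3-D.
w_dense10 = rng.normal(size=(157, 10))
out_current = inputs @ w_dense10
print(out_current.shape)  # (3, 161, 10)

# Flattening each sample first, then applying a single-unit dense
# layer with a sigmoid, gives one value per sample.
flat = inputs.reshape(len(inputs), -1)                  # (3, 161 * 157)
w_dense1 = rng.normal(size=(flat.shape[1], 1)) * 0.01   # placeholder weights
b_dense1 = np.zeros(1)
out_flat = 1.0 / (1.0 + np.exp(-(flat @ w_dense1 + b_dense1)))
print(out_flat.shape)  # (3, 1)
```

The first output is 3-D, which is consistent with the error saying that 3 dimensions were expected while the targets have shape (3, 1). In Keras terms, the second half would presumably correspond to a `Flatten` layer with `input_shape=(161, 157)` followed by `Dense(1)` and the sigmoid activation, though I have not verified that this is the best architecture for the task.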