I know there are already some good posts discussing this topic (this one is excellent and very detailed), but after two hours of struggling with it I still have some issues.

For context: I'm computing the spectrograms of some WAV files (16 kHz, 3 seconds long, split into 20 ms windows) and trying to feed them into a neural network to detect whether they contain a specific word (with an output between 0 and 1 expressing the certainty).

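import numpy as np
from scipy import signal
from scipy.io import wavfile
from keras.models import Sequential
from keras.layers import Dense, Activation

def obtain_sample(wav):
    sample_rate, samples = wavfile.read(wav)
    frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate, nperseg=320, noverlap=16)
    dBS = 10 * np.log10(spectrogram)  # convert to dB

    return dBS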

def create_model():
    print("Creating Model...")
    model= Sequential()
    model.add(Dense(10,input_shape=(161,157)))
    model.add(Activation('sigmoid'))

    model.compile(optimizer='rmsprop',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

    com1=obtain_sample("comando.wav")
    com2=obtain_sample("comando2.wav")
    nocom=obtain_sample("nocomando.wav")
    inputs=np.array([com1,com2,nocom])
    results=np.array([[1.],[1.],[0.]])
    model.fit(inputs,results,epochs=10)
    #model.fit(com1,[1.],epochs=10)
    #model.fit(com2,[1.],epochs=10)
    #model.fit(nocom,[0.],epochs=10)

    model.save("modelo_comando")
    print("Model saved")
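
For reference, the (161, 157) input shape follows from the spectrogram settings. A quick sketch of the arithmetic, assuming each file is mono and exactly 3 seconds long, and that scipy.signal.spectrogram keeps its defaults (one-sided spectrum, nfft equal to nperseg, no padding):

```python
# Shape check for the spectrogram settings used in obtain_sample.
# Assumptions: mono clip, exactly 3 s at 16 kHz, scipy defaults
# (one-sided spectrum, nfft == nperseg, no padding of the last window).
sample_rate = 16000
n_samples = 3 * sample_rate      # 48000 samples
nperseg = 320                    # 20 ms window at 16 kHz
noverlap = 16

step = nperseg - noverlap                      # hop between windows: 304
n_freq_bins = nperseg // 2 + 1                 # one-sided FFT bins
n_segments = (n_samples - noverlap) // step    # complete windows only

input_shape = (n_freq_bins, n_segments)
print(input_shape)  # (161, 157)
```

A clip of a different length would change the second dimension, so the files would need to be trimmed or padded to a common length before stacking them into one array.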

I'm currently getting the following error:

ValueError('Error when checking target: expected activation_1 to have 3 dimensions, but got array with shape (3, 1)',)

After almost an hour of trying to explain the problem better while inspecting the local variable values, I think I'd rather just ask: am I actually giving the model a correct input shape, and how could I use a Flatten/Reshape layer to obtain a single output value per sample?

Sorry for not being able to be more specific.

Julen

1 Answer

Add a Flatten layer after the first Dense layer, and after the Flatten layer add another Dense layer whose number of units equals the size of the output you expect. In this case we expect a single value per sample, hence Dense(1).

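import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Flatten, Activation

# Random stand-in for the three spectrograms
inputs = np.random.rand(3,161,157)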
model= Sequential()
model.add(Dense(10,input_shape=(161,157)))
model.add(Flatten())
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()
results=np.array([[1.],[1.],[0.]])
model.fit(inputs,results,epochs=10)

I ran the above code without any issues. Please give it a try.
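
As for why the Flatten layer is needed: a Keras Dense layer acts only on the last axis of its input, so with input_shape=(161,157) the first Dense(10) outputs (batch, 161, 10) rather than one value per sample, which is why the (3, 1) targets were rejected. Here is a NumPy-only sketch of the shape flow. The weights are random placeholders chosen for illustration, not what Keras would learn:

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.random((3, 161, 157))          # 3 samples, same shape as the inputs

# Dense(10) on a 3D input: the kernel has shape (157, 10) and is applied
# to the last axis only, so the 161 axis survives.
w1 = rng.random((157, 10))
b1 = np.zeros(10)
dense_out = batch @ w1 + b1                # shape (3, 161, 10)

# Flatten: collapse everything except the batch axis.
flat = dense_out.reshape(len(batch), -1)   # shape (3, 1610)

# Dense(1) followed by a sigmoid: one value per sample.
w2 = rng.random((1610, 1)) * 1e-3          # placeholder weights
logits = flat @ w2
probs = 1.0 / (1.0 + np.exp(-logits))      # shape (3, 1)

print(dense_out.shape, flat.shape, probs.shape)
```

The final shape (3, 1) now matches the shape of the results array, so the target check passes.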

To predict with the model on a single sample:

# Since I don't have the original data, I am creating some random values
test = np.random.rand(161,157)
test = np.expand_dims(test,axis=0)
model.predict(test)
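
The model always expects a leading batch axis, so a single spectrogram of shape (161, 157) has to become (1, 161, 157) before prediction. A small NumPy sketch of the two common cases, using random arrays as stand-ins for real spectrograms:

```python
import numpy as np

# Random stand-in for a spectrogram; a real one would come from obtain_sample.
single = np.random.rand(161, 157)

# One sample: add a leading batch axis of size 1.
one_batch = np.expand_dims(single, axis=0)        # shape (1, 161, 157)

# Several samples: stack them along a new leading axis instead.
many_batch = np.stack([single, single, single])   # shape (3, 161, 157)

print(one_batch.shape, many_batch.shape)
```

Either array can then be passed to model.predict, which returns one row per sample.
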
Hari Krishnan
  • Indeed, there's no issue when running that code. But it seems to treat the three samples as a single one. If I try to predict on another sample (just passing the first one as the argument), it asks for a third dimension. – Julen Aug 14 '18 at 12:11
  • Use np.expand_dims on the data before using it to predict the output. I've added the code to the answer. – Hari Krishnan Aug 14 '18 at 12:21
  • The code works, and in terms of compilation and execution it's correct. After running it, though, the output does not match the desired result. So, as you commented before, there is probably a design problem here. I'll try to see which model fits the input data better (starting with LSTM, thanks). – Julen Aug 16 '18 at 07:09