
I was just getting started with some code to pre-process audio data in order to later feed it to a neural network. Before explaining my actual problem in more depth, I should mention that I took the reference for how to do the project from this site. I also used some code taken from this post, and read the signal.spectrogram docs and this post for more info.

So far, with all of the sources mentioned above, I managed to load the wav audio file as a numpy array and plot both its amplitude and its spectrogram. These represent a recording of me saying the word "command" in Spanish.

The strange thing here is that I searched on the internet and found that the human voice spectrum lies between 80 Hz and 8 kHz, so just to be sure I compared this output with the spectrogram Audacity returned. As you can see, that one seems more consistent with the information I found, as its frequency range is the one expected for human voice.

So that brings me to my final question: am I doing something wrong when reading the audio or generating the spectrogram, or is it maybe a plotting issue?

By the way, I'm new to both Python and signal processing, so thx in advance for your patience.

Here is the code I'm currently using:

import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile

def espectrograma(wav):
    # Load the wav file as a numpy array
    sample_rate, samples = wavfile.read(wav)
    frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate, nperseg=320, noverlap=16, scaling='density')
    #dBS = 10 * np.log10(spectrogram)  # convert to dB

    # Waveform of the first 3100 samples
    plt.subplot(2,1,1)
    plt.plot(samples[0:3100])

    # Spectrogram
    plt.subplot(2,1,2)
    plt.pcolormesh(times, frequencies, spectrogram)
    plt.imshow(spectrogram,aspect='auto',origin='lower',cmap='rainbow')
    plt.ylim(0,30)
    plt.ylabel('Frecuencia [kHz]')
    plt.xlabel('Fragmento [20 ms]')
    plt.colorbar()
    plt.show()
Julen

2 Answers


The computation of the spectrogram seems fine to me. If you plot the spectrogram on a log scale you should observe something more similar to the Audacity plot you referenced. So uncomment your line

#dBS = 10 * np.log10(spectrogram) # convert to dB

and then use the variable dBS for the plotting instead of spectrogram in

plt.pcolormesh(times, frequencies, spectrogram)
plt.imshow(spectrogram,aspect='auto',origin='lower',cmap='rainbow')
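Putting the two suggestions together, a minimal sketch of the dB-scale plot might look like the following. A synthetic 440 Hz tone at an assumed 16 kHz sample rate stands in for the actual recording, and the small offset added before the log is an assumption to avoid log10(0) in silent bins:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

# Synthetic stand-in for the recording: a 440 Hz tone at an assumed 16 kHz rate.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate          # 1 second of audio
samples = np.sin(2 * np.pi * 440 * t)

# Same spectrogram parameters as in the question
frequencies, times, spectrogram = signal.spectrogram(
    samples, sample_rate, nperseg=320, noverlap=16, scaling='density')

# Power in dB; the small offset avoids log10(0) in silent bins.
dBS = 10 * np.log10(spectrogram + 1e-12)

plt.pcolormesh(times, frequencies, dBS, cmap='rainbow')
plt.ylabel('Frecuencia [Hz]')
plt.xlabel('Tiempo [s]')
plt.colorbar(label='Potencia [dB]')
plt.show()
```

With pcolormesh the axes are in real units (seconds and Hz), so no ylim is needed to see the 0-8000 Hz range.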

Maja
  • That's the weird thing. Converting it to dB doesn't seem to change anything. In fact, judging by the colorbar output, the power at some frequencies seems to reach 40k dB, which is ABSURD. Something like those numbers/1000 would be more reasonable, but again, I do not understand where those numbers come from. Thx btw. – Julen Aug 07 '18 at 09:42
  • Hm.. weird, because I tried your code on a sound file that I have, and it worked well when I used the log scale... Regarding your second point about the large values: try to load your wavfile using the soundfile library instead of `scipy.io.wavfile.read` - that solved the problem for me. `import soundfile as sf` `samples, sample_rate = sf.read(pathToFile)` – Maja Aug 07 '18 at 14:13
  • Ah, and also remove the line `plt.imshow(spectrogram,aspect='auto',origin='lower',cmap='rainbow')` - the `plt.pcolormesh` already does the job. Then you should really see the frequency axis going up to 8000 Hz if your sample rate is 16k, and your plot should be good. I see no other problem with your code. – Maja Aug 07 '18 at 14:28
  • Trying what you suggested with sf.read, it seems that something changed, as the dB values now look different. Still, they have weird values, as they are all negative. The frequencies did not change... which is strange, because I reviewed the values in the local variables and the max value of the frequencies array is 8000 https://i.imgur.com/zkruQYh.png – Julen Aug 08 '18 at 09:22
  • On the other hand, I found that using pcolormesh instead of imshow I get the spectrogram with ms on the x axis instead of the window indices, which is a little less useful for my future purpose. https://i.imgur.com/jbVUvpP.png – Julen Aug 08 '18 at 09:22
  • I think I found why it is plotting those axis values (161x157). It seems to be the shape of both the spectrogram and dBS variables, so it IS a plotting problem more than an implementation problem. The only thing left now is to find the correct plotting values in order to obtain the desired plot – Julen Aug 08 '18 at 09:43
  • By the way, I found you were actually right. Plotting it with pcolormesh returns the correct frequency range with time on the x axis. What kept me from seeing this was the plt.ylim I had set when I was only using imshow. That said, I don't know why using imshow changes the scale, which is a pity, because I'd rather divide the axis by windows instead of time. – Julen Aug 08 '18 at 10:56
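The scale difference discussed in the comments above likely comes from `scipy.io.wavfile.read` returning raw integer PCM samples (e.g. int16 in [-32768, 32767]) while `soundfile.read` returns floats in [-1, 1]. A rough sketch of normalizing scipy's integer output by hand, assuming 16-bit PCM (the sample values below are hypothetical, not the original recording):

```python
import numpy as np

# Simulated int16 PCM samples, as scipy.io.wavfile.read would return
# for a 16-bit wav file (hypothetical data, not the original recording).
raw = np.array([0, 16384, -16384, 32767, -32768], dtype=np.int16)

# Dividing by the int16 full scale gives floats in [-1, 1],
# matching what soundfile.read returns.
normalized = raw.astype(np.float64) / 32768.0
```

With samples in [-1, 1] the spectrogram power values stay at or below 0 dB, which is why all the dB values turned negative after switching to soundfile.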

The spectrogram uses a Fourier transform to convert your time-series data into the frequency domain.

The maximum frequency that can be measured is (sampling frequency) / 2, so in this case it may seem like your sampling frequency is 60 kHz?

Anyway, regarding your question: it may be correct that the human voice spectrum lies within this range, but the Fourier transform is never perfect. I would simply adjust your y-axis to look specifically at these frequencies.

It seems to me that you are calculating your spectrogram correctly, at least as long as you are reading the sample_rate and samples correctly.

VegardKT
  • First of all, thx for the quick answer! (wasn't expecting it to be so fast). If you look carefully at the left of the Audacity spectrogram screenshot you can see that it says the sampling frequency is 16k. Anyway, just in case, I confirmed this with a breakpoint in the code and the sample_rate is indeed 16k, so I don't think that's the actual problem. – Julen Aug 07 '18 at 09:10
  • I see. If that is the case you should not be able to get any information about frequencies over 8 kHz due to the Nyquist frequency – VegardKT Aug 07 '18 at 10:22