FFT of data received from PyAudio gives wrong frequency

Question

My main task is to recognize human humming from a microphone in real time. As a first step toward recognizing signals in general, I made a 5-second recording of a 440 Hz signal generated by an app on my phone and tried to detect that frequency.

I used Audacity to plot and verify the spectrum from the same 440Hz wav file and I got this, which shows that 440Hz is indeed the dominant frequency : (https://i.stack.imgur.com/c3DWD.png)

To do this in Python, I use the PyAudio library, following this blog. The code I have so far, which I run on the wav file, is:

"""PyAudio Example: Play a WAVE file."""

import pyaudio
import wave
import sys
import struct
import numpy as np
import matplotlib.pyplot as plt

CHUNK = 1024

if len(sys.argv) < 2:
    print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
    sys.exit(-1)

wf = wave.open(sys.argv[1], 'rb')

p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

data = wf.readframes(CHUNK)

i = 0
while data:  # readframes() returns an empty result at end of file
    i += 1
    data_unpacked = struct.unpack('{n}h'.format(n=len(data) // 2), data)  # interleaved 16-bit samples
    data_np = np.array(data_unpacked) 
    data_fft = np.fft.fft(data_np)
    data_freq = np.abs(data_fft)/len(data_fft) # Dividing by length to normalize the amplitude as per https://www.mathworks.com/matlabcentral/answers/162846-amplitude-of-signal-after-fft-operation
    print("Chunk: {} max_freq: {}".format(i,np.argmax(data_freq)))

    fig = plt.figure()
    ax = fig.add_subplot(1,1,1)
    ax.plot(data_freq)
    ax.set_xscale('log')
    plt.show()

    stream.write(data)
    data = wf.readframes(CHUNK)

stream.stop_stream()
stream.close()

p.terminate()

In the output, I get that the max frequency is 10 for all the chunks and an example of one of the plots is : (https://i.stack.imgur.com/2e3wR.png)

I had expected this value to be 440 instead of 10 for all the chunks. I admit I know very little about the theory of FFTs, and I would appreciate any help in solving this.

EDIT: The sampling rate is 44100 Hz, the number of channels is 2, and the sample width is 2 bytes.

10 is just the index of the corresponding frequency bin. What Hz this actually refers to depends on the size (length) and the sampling rate of the data. If you provide this information we might be able to tell you whether something is wrong. Aside from that, I would recommend that you check out [librosa](https://librosa.github.io/librosa/index.html) for integrated handling of audio, signal processing and more intuitive plots. — xdurch0, Feb 08 '19 at 12:19
What is your sample frequency? Is the resolution about 44.1kHz? You should use this number to create the rightful frequency axis and make a correct reading. If you know how many seconds `dt` the sample lasts (for `N` observations), then you can assess its frequency: `len(x)/dt`. — jlandercy, Feb 08 '19 at 12:35
The sampling rate is 44100; I have added this information to the description now. @jlandercy: The sample lasts 5 seconds. Could you please tell me what you mean by "observations"? I assume "x" is the FFT result then? @xdurch0: I will check out librosa, thank you. Would I be right in assuming that the correct frequency is now `RATE/CHUNK*INDEX`, where index is the "10" I get here (i.e. argmax)?
Thank you both for your comments. — Tejas Kumar, Feb 08 '19 at 13:00

jlandercy · Accepted Answer · 2019-02-19T08:36:33.100

Foreword

As xdurch0 pointed out, you are reading a bin index instead of a frequency. If you want to do all the computation yourself, you need to build your own frequency vector before plotting in order to get a consistent result. Reading this answer may help you toward the solution.

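The frequency vector for the FFT (half plane, non-negative frequencies only) is:

 f = np.linspace(0, rate/2, N_fft//2 + 1)  # same as np.fft.rfftfreq(N_fft, 1/rate) for even N_fft

Or (full plane):

 f = np.linspace(-rate/2, rate/2, N_fft, endpoint=False)  # same as np.fft.fftshift(np.fft.fftfreq(N_fft, 1/rate)) for even N_fft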
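To make the index-to-hertz conversion concrete, here is a minimal sketch on a synthetic tone rather than on the asker's file. The test tone, the variable names, and the use of np.fft.rfft with np.fft.rfftfreq are illustrative assumptions; only the rate and the chunk size mirror the values given in the question.

```python
import numpy as np

rate = 44100      # sampling rate in Hz (value from the question)
n = 1024          # samples per chunk (CHUNK in the question)
tone_hz = 440.0   # synthetic test tone, an assumption for this sketch

# One chunk of a pure sine wave
t = np.arange(n) / rate
chunk = np.sin(2 * np.pi * tone_hz * t)

spectrum = np.abs(np.fft.rfft(chunk)) / n   # one-sided magnitude spectrum
freqs = np.fft.rfftfreq(n, d=1 / rate)      # frequency of each bin in Hz

peak_bin = int(np.argmax(spectrum))         # what the question's code prints
peak_hz = freqs[peak_bin]                   # the same peak expressed in Hz
bin_width = rate / n                        # spacing between adjacent bins

print(f"peak bin {peak_bin} -> {peak_hz:.1f} Hz (bin width {bin_width:.1f} Hz)")
```

In this sketch the bin width is rate / n, about 43 Hz, so a 1024-sample chunk can only place a 440 Hz tone at the nearest bin (index 10, roughly 431 Hz). Longer chunks give a finer frequency grid.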

On the other hand, we can delegate most of the work to the excellent scipy.signal toolbox, which is designed for this kind of problem (and many more).

MCVE

Using the scipy package, it is straightforward to get the desired result for a simple WAV file with a single frequency (source):

import numpy as np
from scipy import signal
from scipy.io import wavfile
import matplotlib.pyplot as plt

# Read the file (rate and data):
rate, data = wavfile.read('tone.wav') # See source

# Compute PSD:
f, P = signal.periodogram(data, rate) # Frequencies and PSD

# Display PSD:
fig, axe = plt.subplots()
axe.semilogy(f, P)
axe.set_xlim([0,500])
axe.set_ylim([1e-8, 1e10])
axe.set_xlabel(r'Frequency, $\nu$ $[\mathrm{Hz}]$')
axe.set_ylabel(r'PSD, $P$ $[\mathrm{AU^2Hz}^{-1}]$')
axe.set_title('Periodogram')
axe.grid(which='both')

Basically:

Read the wav file and get the sample rate (here 44.1 kHz);
Compute the power spectral density (PSD) and the frequencies;
Then display it with matplotlib.

This outputs:

Find Peak

Then we can find the frequency of the first peak whose height exceeds a threshold (P > 1e-2; this criterion is subject to tuning) using find_peaks:

idx = signal.find_peaks(P, height=1e-2)[0][0]
f[idx] # 440.0 Hz

Putting it all together, it boils down to:

def freq(filename, setup={'height': 1e-2}):
    rate, data = wavfile.read(filename)
    f, P = signal.periodogram(data, rate)
    return f[signal.find_peaks(P, **setup)[0][0]]
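As a sanity check of the same steps without any file on disk, the sketch below runs periodogram and find_peaks on a synthetic tone held in memory. The tone and its unit amplitude are assumptions for illustration. A real 16-bit recording has a very different amplitude scale, so the height threshold would likely need retuning.

```python
import numpy as np
from scipy import signal

rate = 44100                            # Hz, assumed sampling rate
duration = 5                            # seconds, like the question's recording
t = np.arange(rate * duration) / rate
data = np.sin(2 * np.pi * 440.0 * t)    # synthetic mono tone, not a real recording

f, P = signal.periodogram(data, rate)           # frequencies (Hz) and PSD
peaks, _ = signal.find_peaks(P, height=1e-2)    # indices of local maxima above the threshold
peak_hz = f[peaks[0]]                           # lowest-frequency peak above the threshold

print(f"{len(peaks)} peak(s) found, first at {peak_hz:.2f} Hz")
```

Note that find_peaks returns peak indices in increasing order, so taking element [0] selects the lowest-frequency peak above the threshold, not necessarily the strongest one.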

Handling multiple channels

The asker reported in a comment: "I tried this code with my wav file, and got the error for the line axe.semilogy(f, Pxx_den) as follows: ValueError: x and y must have same first dimension. I checked the shapes and f has (2,) while Pxx_den has (220160,2). Also, the Pxx_den array seems to have all zeros only."

A WAV file can hold multiple channels; most files are mono or stereo (the format allows at most 2**16 - 1 channels). The problem you pointed out occurs because your file has multiple channels (it is a stereo sample).

rate, data = wavfile.read('aaaah.wav') # Shape: (46447, 2), Rate: 48 kHz

It is not well documented, but signal.periodogram also accepts a 2-D array, and its default axis is not consistent with the output of wavfile.read: periodogram works along the last axis by default, while wavfile.read returns an array of shape (samples, channels). So we need to orient the dimensions carefully (using the axis argument) when computing the PSD:

f, P = signal.periodogram(data, rate, axis=0, detrend='linear')

It also works with the transposed array data.T, but then we need to transpose the result back.

Specifying the axis solves the issue: the frequency vector is correct and the PSD is no longer null everywhere. Before, the computation ran along axis=1, which has length 2: in your case it computed 220160 PSDs of 2-sample signals, whereas we wanted the converse (2 PSDs of 220160-sample signals).

The detrend argument ensures that the signal has zero mean and that its linear trend is removed.
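The effect of the axis argument can be reproduced on a small synthetic stereo array. This is only a sketch: the array below stands in for the output of wavfile.read, and both channels carry the same made-up 440 Hz tone.

```python
import numpy as np
from scipy import signal

rate = 44100                               # Hz, assumed sampling rate
n = 4096                                   # samples per channel (arbitrary for this sketch)
t = np.arange(n) / rate
mono = np.sin(2 * np.pi * 440.0 * t)
stereo = np.column_stack([mono, mono])     # shape (n, 2), laid out like wavfile.read output

# Default axis=-1: every row is treated as a separate 2-sample signal.
f_bad, P_bad = signal.periodogram(stereo, rate)
print(f_bad.shape, P_bad.shape)

# axis=0: one PSD per channel, computed along the time axis.
f, P = signal.periodogram(stereo, rate, axis=0, detrend='linear')
print(f.shape, P.shape)

peak_hz = f[np.argmax(P[:, 0])]            # strongest bin of the first channel
print(f"peak near {peak_hz:.1f} Hz")
```

With identical channels, the default call yields a two-element frequency vector and a PSD of zeros, which matches the symptom reported by the asker; with axis=0 the frequency vector and the PSD have the same first dimension and can be plotted together.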

Real application

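This approach should work for real chunked samples, provided each chunk holds enough data: an N-sample chunk resolves frequencies in steps of rate/N, and the Nyquist-Shannon sampling theorem limits the highest detectable frequency to rate/2. The data are then sub-samples of the signal (chunks), and rate is kept constant since it does not change during the process.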

Chunks of size 2**10 seem to work; we can identify specific frequencies from them:

f, P = signal.periodogram(data[:2**10,:], rate, axis=0, detrend='linear') # Shapes: (513,) (513, 2)
idx0 = signal.find_peaks(P[:,0], threshold=0.01, distance=50)[0] # Peaks: [46.875, 2625., 13312.5, 16921.875] Hz

fig, axe = plt.subplots(2, 1, sharex=True, sharey=True)
axe[0].loglog(f, P[:,0])
axe[0].loglog(f[idx0], P[idx0,0], '.')
# [...]

At this point, the trickiest part is fine-tuning find_peaks to catch the desired frequencies. You may need to pre-filter your signal or post-process the PSD to make the identification easier.
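For chunks that arrive as raw bytes, as they do from wave.readframes or a PyAudio stream, a possible per-chunk routine is sketched below. It is not part of the original answer: the helper name dominant_frequency, the chunk size of 4096 frames, the DC removal, and the Hann window are all illustrative assumptions, and the input bytes are synthesized rather than read from a device.

```python
import numpy as np

rate = 44100      # Hz, as in the question
channels = 2      # stereo, as in the question
frames = 4096     # frames per chunk (hypothetical; larger than the question's 1024)

# Fake interleaved little-endian 16-bit stereo bytes, standing in for wf.readframes(frames).
t = np.arange(frames) / rate
tone = (0.5 * 32767 * np.sin(2 * np.pi * 440.0 * t)).astype('<i2')
raw = np.column_stack([tone, tone]).tobytes()

def dominant_frequency(raw_bytes, rate, channels):
    """Return the strongest frequency (Hz) in the first channel of a raw chunk."""
    # De-interleave: one row per frame, one column per channel.
    samples = np.frombuffer(raw_bytes, dtype='<i2').reshape(-1, channels)
    x = samples[:, 0].astype(np.float64)
    x -= x.mean()                       # remove the DC offset
    x *= np.hanning(len(x))             # taper the chunk to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1 / rate)
    return freqs[np.argmax(spectrum)]

peak_hz = dominant_frequency(raw, rate, channels)
print(f"dominant frequency: {peak_hz:.1f} Hz (resolution {rate / frames:.1f} Hz)")
```

The estimate can only be as precise as the bin spacing rate / frames, so the chunk size is a trade-off between latency and frequency resolution.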

I tried this code with my wav file, and got the error for the line `axe.semilogy(f, Pxx_den)` as follows : `ValueError: x and y must have same first dimension`. I checked the shapes and `f` has `(2,)` while `Pxx_den` has `(220160,2)`. Also, the Pxx_den array seems to have all zeros only. — Tejas Kumar, Feb 08 '19 at 13:43
I tried the same code with a sample of the 440 Hz wave file (5 s, fs = 44.1 kHz, 16 bit) from the source link you provided, but got this warning: `WavFileWarning: Chunk (non-data) not understood, skipping it. WavFileWarning)` — Tejas Kumar, Feb 08 '19 at 13:45
@TejasKumar, about the warning this might be a reason why: https://stackoverflow.com/questions/14321627/scipy-io-wavfile-gives-wavfilewarning-chunk-not-understood-error. For the first error it is difficult to see what is happening without having the original file. I will record a sample and update my post soon. — jlandercy, Feb 08 '19 at 13:49
@TejasKumar, I have updated my answer to solve the issue you underlined. It happens because of multiple channels. Now it works, let me know if I have sufficiently addressed your question. — jlandercy, Feb 18 '19 at 20:18