I have a function that detects the three most dominant frequencies in an incoming microphone stream. When I play an E4 (about 330 Hz) on my piano, it reports the fundamental frequency as B5 (996 Hz). There are occasionally other errors, such as a C4 being reported as a C5, but this one is glaring. When I plot the spectrum, the E clearly looks like the strongest peak, yet the function still returns B5.
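For context, the note names above follow the usual equal-temperament mapping with A4 = 440 Hz. A minimal sketch of that mapping, assuming that tuning; `note_name` is a hypothetical helper for illustration and not part of the function below:

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(freq_hz, a4_hz=440.0):
    """Return the nearest equal-temperament note name, e.g. 'E4' (hypothetical helper)."""
    # MIDI note number: A4 is 69, and each semitone is a factor of 2**(1/12)
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    # MIDI 60 is C4, so the octave number is midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


print(note_name(329.63))      # E4
print(note_name(996.0))       # B5
print(note_name(3 * 329.63))  # B5
```

Under this mapping, three times the E4 frequency (about 989 Hz) also rounds to B5, so the reported 996 Hz sits close to the third harmonic of E4.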
import struct

import numpy as np
from scipy.signal import argrelextrema


def pitch_calculations(stream, CHUNK, RATE):
    # Read one chunk from the mic stream and unpack the raw bytes into 16-bit signed integers
    data = stream.read(CHUNK, exception_on_overflow=False)
    dataInt = np.array(struct.unpack(str(CHUNK) + 'h', data))
    # Apply a window function (Hamming) to the input data
    windowed_data = np.hamming(CHUNK) * dataInt
    # Use NumPy's fast Fourier transform to get the scaled magnitude spectrum
    fft_result = np.abs(np.fft.fft(windowed_data)) * 2 / (11000 * CHUNK)
    freqs = np.fft.fftfreq(len(windowed_data), d=1.0 / RATE)
    # Find the indices of local maxima in the frequency spectrum
    localmax_indices = argrelextrema(fft_result, np.greater)[0]
    # Get the magnitudes of the local maxima
    strong_freqs = fft_result[localmax_indices]
    # Sort the magnitudes in descending order
    sorted_indices = np.argsort(strong_freqs)[::-1]
    # Take the six highest peaks (the full FFT of real input is mirrored,
    # so each peak appears at both +f and -f)
    top_indices = sorted_indices[:6]
    # Use every other entry to skip the mirrored duplicates and look up the frequencies
    note_1_freq = abs(freqs[localmax_indices[top_indices[0]]])
    note_2_freq = abs(freqs[localmax_indices[top_indices[2]]])
    note_3_freq = abs(freqs[localmax_indices[top_indices[4]]])
    return note_1_freq, note_2_freq, note_3_freq
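The peak picking above can also be exercised offline on a synthetic signal, which separates microphone effects from the FFT logic. The sketch below is a simplified variant, not the function above: it swaps in `np.fft.rfft` (non-negative frequencies only) so each peak appears once, and `top_peaks`, the sample rate, the chunk size, and the test tone (a weak E4 fundamental with a stronger third harmonic) are all assumptions made for illustration.

```python
import numpy as np
from scipy.signal import argrelextrema


def top_peaks(samples, rate, count=3):
    """Return the `count` strongest local-maximum frequencies in Hz, strongest first."""
    windowed = np.hamming(len(samples)) * samples
    # rfft returns only the non-negative half of the spectrum for real input
    magnitudes = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    # Indices of bins that are strictly larger than both neighbours
    peaks = argrelextrema(magnitudes, np.greater)[0]
    # Order the peaks by magnitude, largest first, and keep the top `count`
    strongest = peaks[np.argsort(magnitudes[peaks])[::-1][:count]]
    return freqs[strongest]


RATE = 44100   # assumed sample rate in Hz
CHUNK = 8192   # assumed number of samples per chunk
F0 = 329.63    # E4 in equal temperament

t = np.arange(CHUNK) / RATE
# Hypothetical test tone: the third harmonic is given twice the fundamental's amplitude
tone = 0.5 * np.sin(2 * np.pi * F0 * t) + 1.0 * np.sin(2 * np.pi * 3 * F0 * t)

peaks = top_peaks(tone, RATE)
print(peaks[:2])
```

With this made-up tone the strongest peak falls near 989 Hz rather than 330 Hz, simply because the harmonic was given the larger amplitude. Whether the piano recording behaves the same way is not something this sketch shows.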
Here is an image of my graph: