Why is the highest FFT peak not the fundamental frequency of a musical tone?

Question

I am trying to extract the pitch of each note in a recording of "Twinkle Twinkle Little Star". For the most part, the frequencies we get from the variable index_max match the expected notes. However, for the note C5, the code returns C6. The frequency of C5 is around 523 Hz, while the frequency of C6 is around 1046 Hz, so the FFT is telling us that the frequency is one octave above the expected result. This happens with many other files as well, and it seems that the lower the note, the more likely there is to be an issue. Any suggestion for a better way to ask this question, or an answer, would be greatly appreciated!

```python
import scipy.io.wavfile as wave
import numpy as np
from frequencyUtil import *  # local helper module from the question (not shown)
from scipy.fft import fft

def read_data(scale):
    infile = "twinkle.wav"
    rate, data = wave.read(infile)
    # samples per frame; with this length the FFT bin spacing is `scale` Hz
    frame_len = int(rate / scale)
    if data.ndim > 1:  # multi-channel file: keep only the first channel
        data = data[:, 0]
    time_frames = [data[i:i + frame_len] for i in range(0, len(data), frame_len)]
    notes = []
    for frame in time_frames:  # for each section, get the FFT
        frequencies = fft(frame)
        # index of the largest magnitude within the music range (below 8800 Hz)
        index_max = np.argmax(np.abs(frequencies[0:8800 // scale]))
        # print(abs(frequencies[index_max]))
        # filters out the amplitudes that are lower than this value found through testing
        # should eventually understand the scale of the fft frequencies
        if abs(frequencies[index_max]) < 4000000 / scale:
            continue
        index_max = index_max * scale  # convert the bin index to Hz
        print(index_max)
        notes.append(index_max)
    return notes
```

As I understand, many instruments have more power in some harmonics than in the base tone. Finding the tone is not as trivial as finding the highest peak in the FFT. — Cris Luengo, Jun 27 '20 at 22:07
There are many questions here on SO that will help you. For example the first answer here is quite well informed: https://stackoverflow.com/questions/1457228/pitch-recognition-of-musical-notes-on-a-smart-phone — Cris Luengo, Jun 27 '20 at 22:10
Instead of using the raw FFT, you need to make a little more effort, as Cris already pointed out. An easy approach would be to use an existing implementation of the YIN or pYIN algorithm, e.g. https://github.com/xiaoch2004/librosa_py3_pYIN — Hendrik, Jun 28 '20 at 05:16

hotpaw2 · Answer 1 · 2020-07-03T17:43:26.440

Many pitched sounds (especially low ones) have overtones or harmonics in the spectrum that are stronger than the fundamental. Those overtones are what make a musical instrument or voice sound more interesting than a sine wave generator. But since pitch is a psychoacoustic phenomenon, human brains make the corrections needed to perceive what is considered the pitch.

Thus the strongest spectrum peak in an FFT magnitude vector is often not at the pitch frequency because the tone has a non-trivial spectrum.
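To see this effect in isolation, here is a minimal sketch that synthesizes a C5 tone whose second harmonic is louder than its fundamental and then picks the largest FFT bin. The harmonic amplitudes, the sample rate, and the helper name `strongest_fft_peak_hz` are made up for illustration; they are not measured from any real instrument or taken from the question's file.

```python
import numpy as np

def strongest_fft_peak_hz(signal, sample_rate):
    """Return the frequency (Hz) of the largest-magnitude bin of a real FFT."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]

sample_rate = 44100                        # assumed sample rate in Hz
f0 = 523.25                                # fundamental of C5 in Hz
t = np.arange(sample_rate) / sample_rate   # one second of sample times

# Hypothetical harmonic amplitudes: the 2nd harmonic dominates the fundamental.
tone = (0.4 * np.sin(2 * np.pi * f0 * t)
        + 1.0 * np.sin(2 * np.pi * 2 * f0 * t)
        + 0.3 * np.sin(2 * np.pi * 3 * f0 * t))

peak = strongest_fft_peak_hz(tone, sample_rate)
print(f"fundamental: {f0} Hz, strongest FFT peak: {peak} Hz")
```

The reported peak lands next to 2 × f0 (about 1046.5 Hz, i.e. C6) even though the tone's pitch is C5, which is the same octave error described in the question.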

There are tons of academic papers and articles on the problem of pitch detection and estimation. Many use cepstral (cepstrum), autocorrelation, machine-learning, and other methods.
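As one illustration of the autocorrelation family mentioned above, the following is a rough sketch, not a production pitch tracker. The function name `autocorr_pitch_hz`, the search range `fmin`/`fmax`, the frame length, and the synthetic test tone are all assumptions chosen for this example. It simply takes the lag with the highest autocorrelation inside the allowed range, which can still produce octave errors on real recordings; algorithms such as YIN add further steps to reduce them.

```python
import numpy as np

def autocorr_pitch_hz(signal, sample_rate, fmin=50.0, fmax=2000.0):
    """Estimate pitch as sample_rate / lag of the strongest autocorrelation peak."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                       # remove any DC offset
    n = len(x)
    # Linear autocorrelation via FFT; zero-padding to 2n avoids circular wrap-around.
    spectrum = np.fft.rfft(x, 2 * n)
    acf = np.fft.irfft(np.abs(spectrum) ** 2)[:n]
    min_lag = int(sample_rate / fmax)      # shortest period considered
    max_lag = min(int(sample_rate / fmin), n - 1)   # longest period considered
    lag = min_lag + np.argmax(acf[min_lag:max_lag + 1])
    return sample_rate / lag

sample_rate = 44100                        # assumed sample rate in Hz
f0 = 523.25                                # fundamental of C5 in Hz
t = np.arange(4096) / sample_rate          # one short analysis frame

# Same kind of hypothetical tone: 2nd harmonic louder than the fundamental.
tone = (0.4 * np.sin(2 * np.pi * f0 * t)
        + 1.0 * np.sin(2 * np.pi * 2 * f0 * t)
        + 0.3 * np.sin(2 * np.pi * 3 * f0 * t))

estimate = autocorr_pitch_hz(tone, sample_rate)
print(f"autocorrelation estimate: {estimate:.1f} Hz")
```

Because all harmonics repeat with the period of the fundamental, the autocorrelation peaks at that period even when a harmonic carries most of the energy. The estimate is quantized to whole-sample lags, so it is only approximately equal to f0.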
