I've been working on a Python app that displays, in order, each note played in an audio file. My approach is as follows: I split the audio samples at the detected onsets, run an FFT on each resulting segment, and take the frequency with the highest magnitude. When I run the FFT on one segment, I find a spike at the correct frequency, but I also find another spike at the same note one octave below or above. Sometimes the magnitude is greater at the frequency one octave above, and I don't understand why that happens.
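The symptom can be reproduced without a real recording. The following is only a minimal sketch with a synthetic signal: it assumes a hypothetical note made of a 220 Hz component plus a louder component one octave up at 440 Hz (both amplitudes are made up for illustration), and it uses NumPy's FFT rather than the code from my app.

```python
import numpy as np

sampling_rate = 44100
t = np.arange(sampling_rate) / sampling_rate  # one second of audio

# Hypothetical note: a 220 Hz component plus a louder one an octave above.
tone = 0.5 * np.sin(2 * np.pi * 220.0 * t) + 1.0 * np.sin(2 * np.pi * 440.0 * t)

# One-sided magnitude spectrum and the matching bin frequencies in Hz.
magnitudes = np.abs(np.fft.rfft(tone))
freqs = np.fft.rfftfreq(len(tone), d=1.0 / sampling_rate)

# Picking the largest bin selects the louder component, not the lower one.
peak_hz = freqs[np.argmax(magnitudes)]
print(peak_hz)
```

With these made-up amplitudes, the largest bin lands on the upper octave, which matches what I sometimes see on real audio.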

This is how I get the onsets:

    import librosa

    samples, sampling_rate = librosa.load(file, sr=None, mono=True, offset=0.0, duration=None)

    # Detect note onsets as frame indices, then convert them to sample indices
    onset_frames = librosa.onset.onset_detect(y=samples, sr=sampling_rate, units='frames')
    onset_samples = librosa.frames_to_samples(onset_frames)

and this is how I compute the frequency of maximum amplitude:

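    import numpy as np
    import scipy.fft

    def fft_compute(audio, sampling_rate):
        n = len(audio)
        T = 1 / sampling_rate
        yf = scipy.fft.fft(audio)
        # Bin frequencies in Hz (k * sampling_rate / n), positive half only
        xf = scipy.fft.fftfreq(n, T)[:n // 2]
        yff = 2.0 / n * np.abs(yf[:n // 2])
        # Frequency of the bin with the largest magnitude
        return xf[np.argmax(yff)]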
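For completeness, here is a rough sketch of how the two pieces fit together, i.e. how each segment between consecutive onsets is passed to the peak search. The names `peak_frequency` and `peak_frequencies` are hypothetical helpers written for this illustration, and the sketch assumes the last segment runs to the end of the signal. It depends only on NumPy so it can be run on its own.

```python
import numpy as np

def peak_frequency(segment, sampling_rate):
    """Frequency (Hz) of the largest-magnitude FFT bin in one segment."""
    magnitudes = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sampling_rate)
    return float(freqs[np.argmax(magnitudes)])

def peak_frequencies(samples, onset_samples, sampling_rate):
    """Peak frequency for each span between consecutive onsets (hypothetical helper)."""
    # Assumption: the final segment is closed at the end of the signal.
    bounds = list(onset_samples) + [len(samples)]
    result = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        if end - start < 2:
            continue  # skip segments too short to analyse
        result.append(peak_frequency(samples[start:end], sampling_rate))
    return result
```

In the app this would be called as `peak_frequencies(samples, onset_samples, sampling_rate)`, and each returned frequency is then mapped to a note name.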

Please let me know if there is a solution for this or if my approach should be changed. Thank you!
