
I am writing a piece of code that figures out what frequencies (notes) are being played at any given time in a song (for now I am testing it on only the first second of the song). To do this I break the first second of the audio file into 8 chunks, then perform an FFT on each chunk and plot it with the following code:

% Taking a second of an audio file and breaking it into n many chunks and
% figuring out what frequencies make up each of those chunks
clear all;

% Read Audio
fs = 44100;         % sample frequency (Hz)
full = audioread('song.wav');
full = full(:, 1);  % keep only the first channel in case the file is stereo

% Perform fft and get frequencies
chunks = 8;         % How many chunks to break wave into
for i = 1:chunks
    beginningChunk = round((i-1)*fs/chunks) + 1;  % first sample index of this chunk
    endChunk = round(i*fs/chunks);                % last sample index of this chunk
    x = full(beginningChunk:endChunk);
    y = fft(x);
    n = length(x);     % number of samples in chunk
    amp = abs(y)/n;    % amplitude of the DFT
    %%%amp = amp(1:fs/2/chunks); % note this is my attempt that I think is wrong
    f = (0:n-1)*(fs/n);     % frequency range
    %%%f = f(1:fs/2/chunks); % note this is my attempt that I think is wrong

    figure(i);
    plot(f,amp)
    xlabel('Frequency')
    ylabel('amplitude')
end

When I do that I get graphs that look like the two spectrum plots attached to this question.

It looks like I am plotting too many points, because the magnitudes rise again at the far right of each graph, so I think I am plotting the double-sided spectrum. I think I need to keep only the samples from 1:fs/2; the problem is my chunk doesn't have that many samples. I tried going from 1:fs/2/chunks, but I am unconvinced those are the right values, so I commented them out. How can I find the single-sided spectrum when a chunk has fewer than fs/2 samples?

As a side note, when I plot all the graphs I notice the frequencies shown are almost exactly the same in each one. This surprises me because I thought I made the chunks small enough that each one would capture only the frequencies sounding at that moment, and therefore the note currently being played. If anyone knows a better way to single out which note is being played at each time, that information would be greatly appreciated.

Tyler Hilbert

1 Answer


For a single-sided spectrum, simply take the first half of the output of the FFT algorithm. The other half (the negative frequencies) is redundant given that your input is real-valued.
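A minimal sketch of that, reusing the variables x, y, n, and fs from the loop in the question (doubling the interior bins is only the usual convention for matching single-sided peak heights; drop it if you only care about relative magnitudes):

half = floor(n/2) + 1;             % bins covering 0 .. fs/2
amp1 = abs(y(1:half)) / n;         % single-sided amplitude
amp1(2:end-1) = 2*amp1(2:end-1);   % fold the redundant negative-frequency energy back in
f1 = (0:half-1) * (fs/n);          % matching frequency axis, 0 .. fs/2 in steps of fs/n

figure(i);
plot(f1, amp1)
xlabel('Frequency (Hz)')
ylabel('Amplitude')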

1/8 second is quite long. Note that the relevant frequencies are around 160-1600 Hz, if I remember correctly (music is not my specialty). Those will be in the left-most region of your spectrum. The highest frequency you compute (after dropping the right half of the FFT output) is half your sampling frequency, 44.1/2 kHz. The lowest nonzero frequency, which is also the spacing between samples, is given by the length of your transform: 44.1 kHz / number of samples.
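As a rough worked example with the numbers from the question (fs = 44.1 kHz and 1/8-second chunks; the exact values are only illustrative):

fs = 44100;            % sampling rate (Hz)
n  = round(fs/8);      % samples in one 1/8-second chunk, about 5513
df = fs/n;             % spacing between frequency samples, about 8 Hz
fmax = fs/2;           % highest usable frequency, 22050 Hz
fprintf('bin spacing %.1f Hz, highest frequency %.0f Hz\n', df, fmax);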

Cris Luengo
  • Ok, that fixed the problem. I changed chunks to 3200 because that is double the highest frequency. I noticed that all the graphs follow a trend where the left side is high and the right side is low (like a graph of 1/x), like this: https://imgur.com/a/u2K9M. I was curious if there is a reason the graphs always look like this. It makes it hard to pull out what frequencies are being played. – Tyler Hilbert Dec 31 '17 at 01:16
  • The first bin, for f = 0, is typically the highest; it encodes the mean of the signal. In the figure you link you have frequency steps of >3000 Hz. That means you cannot read the value at the relevant frequencies. You'll need to size your chunks such that your frequency step size is at most 16 Hz if you want to distinguish middle C from the next half-note up (https://en.m.wikipedia.org/wiki/Piano_key_frequencies). Note that you cannot interpolate in the frequency spectrum; you need to sample it densely enough to read each frequency of interest. – Cris Luengo Dec 31 '17 at 01:59
  • 16 Hz will be too fast to pick out individual notes then, won't it? Will I be able to isolate individual notes just by adjusting variables like the chunk size / Hz, or will I have to use a different algorithm? I looked into onset detection and I think it may be useful, but I don't know if I'll need it or if there is a better algorithm for this. – Tyler Hilbert Jan 01 '18 at 16:55
  • I think that you need to compute the DFT at frequencies corresponding to the notes you want to recognize. If that means having a frequency interval of 16 Hz, then that is how you need to sample the DFT. What I mean by 16 Hz is that the spacing between DFT samples is 16 Hz; your highest frequency will always be 22 kHz. From there you compute how many samples you need in your chunks (see the sketch after these comments). Note the uncertainty principle: the shorter the chunks, the more precise the analysis is in time, but the less precise in distinguishing frequencies. – Cris Luengo Jan 01 '18 at 17:37
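Following up on these comments, a small sketch of that sizing calculation (the 16 Hz target comes from the comments above; using the strongest bin as the "note" is only a crude illustration, not a full pitch detector):

fs = 44100;
df_target = 16;                  % desired spacing between DFT samples (Hz)
nChunk = ceil(fs/df_target);     % samples needed per chunk, about 2757
tChunk = nChunk/fs;              % chunk duration, about 62.5 ms
fprintf('chunk of %d samples (%.1f ms) gives %.1f Hz spacing\n', nChunk, 1000*tChunk, fs/nChunk);

% Inside the loop from the question, after computing the single-sided
% spectrum amp1 and frequency axis f1 for a chunk of nChunk samples, the
% loudest bin (ignoring f = 0) gives a crude estimate of the dominant frequency:
% [~, k] = max(amp1(2:end));
% dominantFreq = f1(k+1);        % in Hz; map this to the nearest piano key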