
I have a simple pitch detector. Input (microphone) data are passed to an FFT routine, and then I look for the frequency bin with the maximum power, i.e.:

`max(pow(data[i].getRe(), 2) + pow(data[i].getIm(), 2))` for `0 <= i < SamplesSize`
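
A minimal sketch of the peak-picking detector described above, using the sampling rate and frame size given in the question. The `fft` helper is a plain recursive radix-2 FFT standing in for whatever FFT routine the detector actually uses, and `peak_frequency` is a hypothetical name. Bin k corresponds to k × fs / N Hz, so the detector can only report multiples of fs / N.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def peak_frequency(samples, fs):
    """Frequency (Hz) of the bin with the largest power re^2 + im^2."""
    n = len(samples)
    spectrum = fft(samples)
    # For real input, bins above n/2 mirror those below, so only 1..n/2 are searched.
    # Bin 0 (DC) is skipped so that a DC offset from the microphone cannot win.
    best = max(range(1, n // 2 + 1),
               key=lambda k: spectrum[k].real ** 2 + spectrum[k].imag ** 2)
    return best * fs / n


fs, n = 8000, 1024
tone = [math.sin(2 * math.pi * 195.0 * t / fs) for t in range(n)]
print(peak_frequency(tone, fs))  # 195.3125, the nearest bin (25 * 8000 / 1024)
```

For a pure tone this works because the bin nearest the tone dominates; it has no notion of which peak is the fundamental.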

I need it to detect the fundamental frequency of a guitar string. It works well from 440 Hz (and maybe higher; I didn't check) down to 250 Hz. Below that, the detected frequency is about twice as high as it should be, e.g. for 195 Hz the detected frequency is about 380 Hz. It looks like below 250 Hz it detects a higher harmonic instead. For a pure 195 Hz tone the detection is perfect, but for a guitar string something goes wrong.

Any suggestions as to what could cause this? Or should I use a more sophisticated pitch detection method?

P.S. Sampling rate: 8000 Hz, input data size: 1024 samples.
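
With these parameters the frequency resolution follows directly. A small sketch of the arithmetic; the names are illustrative, and 82.41 Hz (the standard-tuning low E string) is added only as a reference point.

```python
fs = 8000  # sampling rate from the question, Hz
n = 1024   # FFT size (samples per frame) from the question

bin_width = fs / n        # spacing between FFT bins: 7.8125 Hz
frame_ms = 1000 * n / fs  # audio covered by one frame: 128 ms
nyquist = fs / 2          # highest representable frequency: 4000 Hz


def bin_index(freq_hz):
    """Fractional FFT bin at which a sinusoid of freq_hz falls."""
    return freq_hz * n / fs


# At the low end one bin is wider than a semitone (about 4.9 Hz near 82 Hz),
# e.g. 195 Hz lands at bin ~24.96 and 82.41 Hz (low E) at bin ~10.55.
for f in (82.41, 195.0, 440.0):
    print(f, round(bin_index(f), 2))
```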

cartoon_20

1 Answer


I don't know about guitars specifically, but missing fundamentals seem to be quite common in acoustics. The Wikipedia page on pitch detection alludes to secondary processing steps after the FFT, perhaps one of these would be helpful.
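
One widely used post-FFT step of this kind is the harmonic product spectrum (HPS): the magnitude spectrum is multiplied by copies of itself compressed by factors 2, 3, ..., so a bin scores highly only if its harmonics are strong too. This is a swapped-in sketch, not something the answer specifies; it assumes a synthetic signal whose second harmonic is stronger than the fundamental (the amplitudes 0.3, 1.0 and 0.5 are made up to mimic the symptom in the question), and the helper names are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitudes(samples):
    """|X[k]| for k = 0..N/2 of a real-valued frame."""
    spectrum = fft(samples)
    return [abs(c) for c in spectrum[: len(samples) // 2 + 1]]


def naive_peak_hz(mag, fs, n):
    """Plain maximum-magnitude bin, skipping DC."""
    k = max(range(1, len(mag)), key=lambda i: mag[i])
    return k * fs / n


def hps_peak_hz(mag, fs, n, harmonics=3):
    """Harmonic product spectrum: score(k) = mag[k] * mag[2k] * ... * mag[harmonics*k]."""
    best_k, best_score = 1, -1.0
    # Only k with harmonics * k still inside the spectrum can be scored.
    for k in range(1, (len(mag) - 1) // harmonics + 1):
        score = 1.0
        for h in range(1, harmonics + 1):
            score *= mag[h * k]
        if score > best_score:
            best_k, best_score = k, score
    return best_k * fs / n


fs, n, f0 = 8000, 1024, 195.0
# Synthetic "string": weak fundamental, strong 2nd harmonic, moderate 3rd.
frame = [0.3 * math.sin(2 * math.pi * f0 * t / fs)
         + 1.0 * math.sin(2 * math.pi * 2 * f0 * t / fs)
         + 0.5 * math.sin(2 * math.pi * 3 * f0 * t / fs)
         for t in range(n)]
mag = magnitudes(frame)
print(naive_peak_hz(mag, fs, n))  # 390.625: the octave error
print(hps_peak_hz(mag, fs, n))    # 195.3125: the fundamental's bin
```

With three harmonics the search only covers pitches below fs / (2 × 3), about 1333 Hz here, which comfortably includes the open strings. Real recordings would likely also need a window (e.g. Hann) and a noise threshold, which this sketch omits.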

Also, see these two SO questions; there's lots of good information there: (1), (2).

mtrw
  • Can anyone tell me how to compute Rd from this link http://www.tedknowlton.com/resume/CCPPT.htm for an FFT size of 512 and a frequency of interest of 440 Hz? – cartoon_20 Dec 22 '10 at 21:27
  • and a sampling frequency of 8000 Hz – cartoon_20 Dec 22 '10 at 22:28
  • @cartoon_20 - I scanned the paper and didn't see `Rd` - what page are you looking at? – mtrw Dec 26 '10 at 17:32
  • @mtrw - In the "Example calculation using Middle C measurements" table, below the "Figure 5. FFT Bin Interpolation" diagram. Rd is the downsampling rate. – cartoon_20 Dec 26 '10 at 21:17
  • It's the downsampling rate, or ratio. If FSample is N, and you downsample to N/K, the downsampling ratio will be K. If your frequency of interest is 440 Hz, you need FSample > 880 Hz at a minimum, probably more like 1200 Hz to be safe. If the original FSample is 8000, you need K <= 6. – mtrw Dec 26 '10 at 23:28
  • My FSample is fixed at 8000 Hz. But I still don't know how you computed K <= 6 for a 440 Hz frequency of interest and a fixed FSample = 8000. How is the N/K ratio related to the frequency of interest? And how would this ratio change if my frequency of interest changed to 330 Hz? – cartoon_20 Dec 27 '10 at 15:41
  • If your FSample is permanent, why are you worried about downsampling? – mtrw Dec 27 '10 at 15:51
  • FSample is the sampling rate (the frequency at which samples are taken), am I wrong? OK, then if I switch to the frequency domain (using the FFT) I get frequency bins (stripes) spaced every FSample/fftSize Hz. This is where the picket-fence effect appears. The algorithm that I linked is used to approximate the frequency I'm looking for. For example, if bins are spaced every 15 Hz, bin n = 20 corresponds to 20*15 = 300 Hz and bin n+1 corresponds to 21*15 = 315 Hz. Thanks to this approximation I can find a frequency between bins, e.g. 310 Hz, based on the magnitudes of bins n and n+1. – cartoon_20 Dec 27 '10 at 16:21
  • Sure. So what's the question? – mtrw Dec 27 '10 at 16:32
  • My question is how to compute K, the downsampling rate, with a fixed sampling rate and a changing frequency of interest. Can I use a fixed K, e.g. K = 5, for every frequency of interest? I don't know which K to use for 440 Hz and which for 330 Hz. I read on the linked page that the downsampling rate is related to the frequency of interest: "The lower the frequency of interest, then, the higher the Rd value". Is there a rule for this? Something like: take the minimum K that satisfies FSample/K >= 2 * FrequencyOfInterest? – cartoon_20 Dec 27 '10 at 16:53
  • Or maybe any K I choose will be OK and it only affects the precision of the result? – cartoon_20 Dec 27 '10 at 17:07
  • I'm sorry, I don't understand your question at all. K is the downsampling rate. Downsampling means low-pass filtering the input, then discarding samples, thus changing the sampling rate. You stated earlier that your sampling rate is fixed, which I take to mean you are not downsampling. If you are not downsampling, `K`, by implication, is 1. – mtrw Dec 27 '10 at 17:09
  • OK, let's try it this way, step by step: fftSize: 512, sampling rate: 8000. Sound samples are captured and the FFT is computed. The bin with the maximum magnitude is the 20th bin; the larger of its two neighbors is the 19th. The frequency of the 20th bin is 20*8000/512 = 312.5 Hz (bin magnitude: 64), and the frequency of the 19th bin is 19*8000/512 ≈ 297 Hz (bin magnitude: 46). – cartoon_20 Dec 27 '10 at 17:41
  • (cont.) FFT size = 512; Fsamp = 8000; Rd = 5 (I had no idea how to pick this value, so I took 5); L = 46; H = 64; delta = H/(L+H) = 64/110 = 0.582; B = 19; fbin = Fsamp/Rd/fftSize = 8000/5/512 = 3.125; fpeak = (B + delta) x fbin = (19 + 0.582) x 3.125 ≈ 61.2 Hz. If the frequency of the 19th bin is 297 Hz (19x8000/512), then the frequency of a peak between the 19th and 20th bins can't equal 61.2 Hz. It should be between 297 and 312 Hz. That is what I don't understand. – cartoon_20 Dec 27 '10 at 17:48
  • YOU ARE NOT DOWNSAMPLING. See Fig. 3 in the paper. See where it says "Downsample" in the big rectangular block? It happens before the FFT. Downsampling CHANGES THE SAMPLE RATE. Rd = 1 as long as you don't downsample. You don't get to "pick" it. See also the section in the paper called "Downsampling". – mtrw Dec 27 '10 at 17:57
  • OK, let's leave this paper alone. I have one last question. Can I approximate the frequency between bins based on their magnitudes like this: L = 46, H = 64, delta = 0.582, so freq = 19.582 x (8000/512) ≈ 306 Hz? I don't want to do downsampling, I just want to approximate the frequency. So far, if the 20th bin had the highest magnitude, I assumed that the examined audio had a frequency of 20 * 8000/512 = 312.5 Hz. I would like to make this more precise without downsampling. – cartoon_20 Dec 27 '10 at 18:16
  • I do not know of a way to do that. But the paper refers to a second paper, which has such a technique. Read the section titled "Spectral analysis and FFT bin interpolation" for the details. – mtrw Dec 27 '10 at 18:35
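
The arithmetic discussed in the comments, as a small sketch. `two_bin_peak_hz` is the magnitude-weighted average of two adjacent bin indices, B + H/(L + H), as written out in the comments; `rd` is the downsampling ratio and is 1 when no downsampling is done. `max_downsample_ratio` encodes the rule of thumb from the comments that the rate after downsampling must stay at or above a chosen minimum (at least twice the highest frequency of interest; 1200 Hz was suggested for 440 Hz), which is why a lower frequency of interest allows a larger K. All names are hypothetical, and this sketch does not assess how accurate the two-bin estimate is for real signals.

```python
def bin_hz(k, fs, n, rd=1):
    """Frequency (Hz) of (possibly fractional) FFT bin k after downsampling by rd."""
    return k * (fs / rd) / n


def two_bin_peak_hz(b, low_mag, high_mag, fs, n, rd=1):
    """Peak between bins b and b + 1, estimated as b + H / (L + H)."""
    delta = high_mag / (low_mag + high_mag)
    return bin_hz(b + delta, fs, n, rd)


def max_downsample_ratio(fs, min_rate):
    """Largest integer K with fs / K >= min_rate."""
    return int(fs // min_rate)


fs, n = 8000, 512
print(bin_hz(19, fs, n), bin_hz(20, fs, n))                # 296.875 312.5
print(round(two_bin_peak_hz(19, 46, 64, fs, n), 1))        # 306.0, no downsampling (rd = 1)
print(round(two_bin_peak_hz(19, 46, 64, fs, n, rd=5), 1))  # 61.2, only valid if the FFT ran on fs / 5 data
print(max_downsample_ratio(fs, 1200))                      # 6
print(max_downsample_ratio(fs, 2 * 330))                   # 12 for 330 Hz at the bare Nyquist limit
```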