9

While I have many questions on this site dealing with the concept of pitch detection... They all deal with this magical FFT with which I am not familiar. I am trying to build an Android application that needs to implement pitch detection. I have absolutely no understanding for the algorithms that are used to do this.

It can't be that hard can it? There are around 8 billion guitar tuner apps on the android market after all.

Can someone help?

brainmurphy1
  • 1,082
  • 1
  • 19
  • 24
  • 2
    You need to have some basic understanding of the fast fourier transform to implement what you are asking. If you're looking for a java FFT library, I can't imagine it'd be difficult to find one. I'd recommend looking for a guitar tuner library instead unless you want to learn some signal processing. FFT would be tricky to implement without basic understanding. – Tucker Jul 19 '12 at 02:29

2 Answers2

16

The FFT is not really the best way to implement pitch detection or pitch tracking. One issue is that the loudest frequency is not always the fundamental frequency. Another is that the FFT, by itself, requires a pretty large amount of data and processing to obtain the resolution you need to tune an instrument, so it can appear slow to respond (i.e. latency). Yet another issue is that the result of an FFT is necessarily intuitive to work with: you get an array of complex numbers and you have to know how to interpret them.

If you really want to use an FFT, here is one approach:

  1. Low-pass your signal. This will help prevent noise and higher harmonics from creating spurious results. Conceivably, you could do skip this step and instead weight your results towards the lower values of the FFT instead. For some instruments with strong fundamental frequencies, this might not be necessary.
  2. Window your signal. Windows should be at lest 4096 in size. Larger is better to a point because it gives you better frequency resolution. If you go too large, it will end up increasing your computation time and latency. The hann function is a good choice for your window. http://en.wikipedia.org/wiki/Hann_function
  3. FFT the windowed signal as often as you can. Even overlapping windows are good.
  4. The results of the FFT are complex numbers. Find the magnitude of each complex number using sqrt( real^2 + imag^2 ). The index in the FFT array with the largest magnitude is the index with your peak frequency.
  5. You may want to average multiple FFTs for more consistent results.

How do you calculate the frequency from the index? Well, let's say you've got a window of size N. After you FFT, you will have N complex numbers. If your peak is the nth one, and your sample rate is 44100, then your peak frequency will be near (44100/2)*n/N. Why near? well you have an error of (44100/2)*1/N. For a bin size of 4096, this is about 5.3 Hz -- easily audible at A440. You can improve on that by 1. taking phase into account (I've only described how to take magnitude into account), 2. using larger windows (which will increase latency and processing requirements as the FFT is an N Log N algorithm), or 3. use a better algorithm like YIN http://www.ircam.fr/pcm/cheveign/pss/2002_JASA_YIN.pdf

You can skip the windowing step and just break the audio into discrete chunks of however many samples you want to analyze. This is equivalent to using a square window, which works, but you may get more noise in your results.

BTW: Many of those tuner apps license code form third parties, such as z-plane, and iZotope.

Update: If you want C source code and a full tutorial for the FFT method, I've written one. The code compiles and runs on Mac OS X, and should be convertible to other platforms pretty easily. It's not designed to be the best, but it is designed to be easy to understand.

Bjorn Roche
  • 11,279
  • 6
  • 36
  • 58
  • 1
    This appears to be exactly what I need, but it is useless to me because I do not know what "low-passing", "windowing", and the "Hann function" are. (Despite the link, I still do not understand how it applies.) The above suggestions may help someone who knows more, but I am asking this question because I have absolutely no knowledge of them. – brainmurphy1 Jul 20 '12 at 00:26
  • For low passing I suggested an alternative, but if you don't know what it is, you can google it and ask another question. It's not something that can be easily covered as a sub-answer to another question. – Bjorn Roche Jul 20 '12 at 01:54
  • 2
    Fact is, you've asked for information that is usually covered in a semester or two of a college level course, not every detail can be covered here. – Bjorn Roche Jul 20 '12 at 01:55
  • My partner implemented the same thing on an iPhone version from code that he found online. I am sure I could code my way through it if I understood the math, but because I would need to go through a semester course to do so, I need the code. My level of ignorance to this math is not just unfamiliarity... I have never dealt with anything on this level. Thank you for all of this... but what I really need is a method that I can pass an argument that will return a value. – brainmurphy1 Jul 20 '12 at 03:55
  • third result on google for "pitch detection java": http://tarsos.0110.be/artikels/lees/YIN_Pitch_Tracker_in_JAVA – Bjorn Roche Jul 20 '12 at 14:11
  • There is also a java implementation of an FFT-based guitar tuner (with all the caveats I mentioned) here: http://www.amazon.com/Digital-Audio-Java-Craig-Lindley/dp/0130876763 – Bjorn Roche Jul 20 '12 at 14:21
  • Of note, the java code from the book I linked does not appear to do any windowing, so apparently you can get away with skipping that step. Also, it uses a lower samplerate with an FFT-size of 4096, for an effective resolution of less than 3 Hz. – Bjorn Roche Jul 20 '12 at 15:44
5

A Fast Fourier Transform changes a function from time domain to frequency domain. So instead of f(t) where f is the signal that you are getting from the microphone and t is the time index of that signal, you get g(θ) where g is the FFT of f and θ is the frequency. Once you have g(θ), you just need to find which θ with the highest amplitude, meaning the "loudest" frequency. That will be the primary pitch of the sound that you are picking up.

As for actually implementing the FFT, if you google "fast fourier transform sample code", you'll get a bunch of examples.

Jon Lin
  • 142,182
  • 29
  • 220
  • 220
  • 2
    This [example](http://stackoverflow.com/a/2065693/230513) may be useful for testing. – trashgod Jul 19 '12 at 02:28
  • 1
    I found a bunch of samples that all need arrays of values that I have no idea how to obtain or what they mean. We are getting there though. Is [this](http://introcs.cs.princeton.edu/java/97data/FFT.java.html) in the right zip code ? – brainmurphy1 Jul 19 '12 at 02:44
  • 1
    @brainmurphy1 That link looks like the right thing. You get the array from reading input from the microphone. I've never done this before but google says you want the [AudioRecord](http://www.jarvana.com/jarvana/view/com/google/android/android/2.2.1/android-2.2.1-javadoc.jar!/android/media/AudioRecord.html) class, and here's an example: http://www.androiddevblog.net/android/android-audio-recording-part-2 – Jon Lin Jul 19 '12 at 02:52
  • 1
    @Jon Lin After I run the array from the AudioRecord class through this, I will get the "FFT" of the sound? how do I turn that into frequency? Am I making any sense? – brainmurphy1 Jul 19 '12 at 02:57
  • 2
    @brainmurphy1 Depends on the FFT function, but generally you go through the new array and look for the largest value, the index is the frequency. – Jon Lin Jul 19 '12 at 03:00