Android: fundamental frequency

Question

I want to find the fundamental frequency of the human voice in an Android application. I'm calculating it with this FFT class and this Complex class.

My code to calculate FFT is this:

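    // Assumes 16-bit little-endian mono PCM and the FFT/Complex classes linked
    // above (FFT.fft requires a power-of-two input length).
    public double calculateFFT(byte[] signal, int sampleRate)
    {
        final int mNumberOfFFTPoints = 1024;
        if (signal.length < 2 * mNumberOfFFTPoints) {
            throw new IllegalArgumentException("need at least " + 2 * mNumberOfFFTPoints + " bytes");
        }

        Complex[] complexSignal = new Complex[mNumberOfFFTPoints];
        double[] absSignal = new double[mNumberOfFFTPoints / 2];

        // Combine each byte pair (low byte first) into a signed sample in [-1, 1)
        for (int i = 0; i < mNumberOfFFTPoints; i++) {
            int sample = (signal[2 * i] & 0xFF) | (signal[2 * i + 1] << 8);
            complexSignal[i] = new Complex(sample / 32768.0, 0.0);
        }

        Complex[] y = FFT.fft(complexSignal);

        // Find the strongest bin, skipping bin 0 (the DC offset)
        double mMaxFFTSample = 0.0;
        int mPeakPos = 0;
        for (int i = 1; i < mNumberOfFFTPoints / 2; i++) {
            absSignal[i] = Math.hypot(y[i].re(), y[i].im());
            if (absSignal[i] > mMaxFFTSample) {
                mMaxFFTSample = absSignal[i];
                mPeakPos = i;
            }
        }

        // Bin index -> frequency in Hz (the strongest bin, not necessarily the fundamental)
        return (double) sampleRate * mPeakPos / mNumberOfFFTPoints;
    }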

I get the same kind of values as in the question "How do I obtain the frequencies of each value in an FFT?"

Is it possible to find the fundamental frequency from these values? Can someone help me?

Thanks in advance.

You might want to try posting this on the [Signal Processing StackExchange](http://dsp.stackexchange.com/) — Cobbles, May 26 '14 at 10:23
If you want to detect voice *pitch* then read up on [cepstral analysis](https://en.wikipedia.org/wiki/Cepstrum) - you still need the FFT but there are a few more operations required to extract pitch. — Paul R, May 26 '14 at 11:25
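Paul R's cepstrum suggestion can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not a production pitch tracker: the frame is a power-of-two array of mono samples, the caller supplies the pitch search range, and the class name `CepstrumPitch` and its parameters are hypothetical. It carries its own small radix-2 FFT so it does not depend on the FFT/Complex classes from the question. The real cepstrum is the inverse FFT of the log magnitude spectrum, and a peak at quefrency q samples corresponds to a pitch of sampleRate / q.

```java
import java.util.Arrays;

/**
 * Real-cepstrum pitch estimate for one frame (sketch, hypothetical class).
 * Assumes frame.length is a power of two and the pitch lies in [minHz, maxHz].
 */
final class CepstrumPitch {

    /** In-place iterative radix-2 FFT (forward, no scaling). */
    static void fft(double[] re, double[] im) {
        int n = re.length;
        for (int i = 1, j = 0; i < n; i++) {          // bit-reversal permutation
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        for (int len = 2; len <= n; len <<= 1) {      // butterfly stages
            double ang = -2.0 * Math.PI / len;
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(ang * k), wi = Math.sin(ang * k);
                    int a = start + k, b = a + len / 2;
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr;
                    im[b] = im[a] - xi;
                    re[a] += xr;
                    im[a] += xi;
                }
            }
        }
    }

    static double estimateHz(double[] frame, double sampleRate, double minHz, double maxHz) {
        int n = frame.length;
        double[] re = Arrays.copyOf(frame, n);
        double[] im = new double[n];
        fft(re, im);
        // Log magnitude spectrum; the small offset avoids log(0)
        for (int i = 0; i < n; i++) {
            re[i] = Math.log(Math.hypot(re[i], im[i]) + 1e-12);
            im[i] = 0.0;
        }
        // For a real frame the log spectrum is real and even (up to rounding), so a
        // forward FFT equals n times the inverse FFT; the factor does not move the peak.
        fft(re, im);
        // Quefrency q (in samples) corresponds to a pitch of sampleRate / q
        int qMin = Math.max(1, (int) Math.floor(sampleRate / maxHz));
        int qMax = Math.min(n / 2, (int) Math.ceil(sampleRate / minHz));
        int best = qMin;
        for (int q = qMin + 1; q <= qMax; q++) {
            if (re[q] > re[best]) {
                best = q;
            }
        }
        return sampleRate / best;
    }
}
```

For example, `CepstrumPitch.estimateHz(frame, 44100, 75, 400)` on a 2048-sample frame; the 75 to 400 Hz range is an assumed starting point for speech, not a value from this thread.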

Babson · Answer 1 · 2014-11-14T00:59:18.293

Fundamental frequency detection for the human voice is an active area of research, as the references below suggest. Your approach must be designed carefully and will depend on the nature of your data.

For example, if your source is a person singing a single note, with no music or other background sounds in the recording, a modified peak detector might give reasonable results.

If your source is generalized human speech, you will not get a unique fundamental frequency for anything other than the individual formants within the speech.

The graph below illustrates an easy detection problem. It shows the frequency spectrum of a female soprano holding a B-flat-3 (Bb3) note. The fundamental frequency of Bb3 is 233 Hz, but the soprano is actually singing a 236 Hz fundamental (the left-most and highest peak). A simple peak detector yields the correct fundamental frequency in this case.

Figure: Frequency spectrum of a female soprano singing a B-flat-3 note (Sooeet.com FFT calculator)

The graph below illustrates one of the challenges of fundamental frequency detection, even for individually sung notes, let alone generalized human speech. It shows the frequency spectrum of a female soprano holding an F4 note. The fundamental frequency of F4 is 349 Hz, but the soprano is actually singing a 360 Hz fundamental (the left-most peak).

Figure: Frequency spectrum of a female soprano singing an F4 note (Sooeet.com FFT calculator)

However, in this case the highest peak is not the fundamental but the second harmonic (first overtone) at 714 Hz, roughly twice the fundamental. A modified peak detector would have to handle such cases.
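One way to build such a modified peak detector, sketched here under assumptions: scan upward from a minimum bin and return the first local maximum whose magnitude is at least some fraction of the largest magnitude. The `ratio` threshold and the class name `LowestStrongPeak` are hypothetical tuning choices, not something from the answer. In the F4 example it should pick the left-most 360 Hz peak instead of the taller 714 Hz one, as long as `ratio` is below the height of the 360 Hz peak relative to the 714 Hz peak.

```java
/**
 * Sketch of a "lowest strong peak" detector over a magnitude spectrum.
 * ratio (e.g. 0.5) is an assumed tuning parameter.
 */
final class LowestStrongPeak {

    static int findBin(double[] magnitude, int minBin, double ratio) {
        // Largest magnitude at or above minBin
        double globalMax = 0.0;
        for (int i = minBin; i < magnitude.length; i++) {
            globalMax = Math.max(globalMax, magnitude[i]);
        }
        double threshold = ratio * globalMax;
        // First (lowest-frequency) local maximum that clears the threshold
        for (int i = Math.max(minBin, 1); i < magnitude.length - 1; i++) {
            boolean localMax = magnitude[i] >= magnitude[i - 1] && magnitude[i] > magnitude[i + 1];
            if (localMax && magnitude[i] >= threshold) {
                return i;
            }
        }
        return -1; // no qualifying peak
    }
}
```

Convert the returned bin to Hz with `bin * sampleRate / fftSize`, as in the question's code.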

In generalized human speech, the concept of a fundamental frequency does not really apply to any stretch of the signal longer than an individual formant, because the frequency spectrum of human speech is highly time-variant.
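Because the spectrum changes over time, pitch estimates are typically computed on short frames rather than on a whole recording. A minimal framing helper, assuming the caller chooses the frame length and hop size (the class name `Framer` is hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Splits a signal into overlapping fixed-length frames for short-time analysis (sketch). */
final class Framer {

    static List<double[]> frames(double[] signal, int frameLength, int hop) {
        if (frameLength <= 0 || hop <= 0) {
            throw new IllegalArgumentException("frameLength and hop must be positive");
        }
        List<double[]> out = new ArrayList<>();
        // Only full frames are returned; a trailing partial frame is dropped
        for (int start = 0; start + frameLength <= signal.length; start += hop) {
            out.add(Arrays.copyOfRange(signal, start, start + frameLength));
        }
        return out;
    }
}
```

For example, 1024-sample frames with a hop of 512 at 44.1 kHz give frames of about 23 ms, each of which can be passed to a per-frame estimator.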

See these references:

Speech Signal Analysis

Human Speech Formants

Fundamental frequency detection

FFT, graphs, and audio data from Sooeet.com FFT calculator

score 2 · Accepted Answer · edited May 23 '17 at 12:27

Sounds like you've already chosen a solution (FFTs) to your problem. I'm no DSP expert, but I'd venture that you're not going to get very good results this way. See the much more detailed discussion in "How do you analyse the fundamental frequency of a PCM or WAV sample?"

If you do choose to stick with this method:

Consider using more than 1024 points if you need accuracy at lower frequencies: at a 44.1 kHz sample rate, 1024 points gives bins about 43 Hz wide, and a (spoken) human voice is surprisingly low.
Choose your sampling frequency wisely, and apply a low-pass filter if you can. There's a reason telephones have a bandwidth of only ~3 kHz: the rest is not truly necessary for understanding human voices.
Then examine the first half of your output values and pick the lowest-frequency large peak. This is the hard part, because there may be several candidates. Further peaks should also appear at the harmonics (integer multiples) of this frequency, but that is hard to check because your bins are too wide to be useful here. The chosen bin is the range of frequencies within which the true fundamental hopefully lies.
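The filtering and harmonic-check steps from the list can be sketched as follows. This is illustrative only, and both class names are hypothetical. `OnePoleLowPass` is a first-order IIR low-pass, much gentler than a proper anti-aliasing filter. `HarmonicProductSpectrum` swaps in a standard technique, the harmonic product spectrum, for the manual harmonic check the answer describes: it multiplies each bin's magnitude by the magnitudes at 2x, 3x, ... that bin index, so a bin whose harmonics also carry energy can win even when it is not the tallest peak. The number of harmonics is an assumed parameter.

```java
/** Sketch: first-order IIR low-pass, y[n] = y[n-1] + alpha * (x[n] - y[n-1]). */
final class OnePoleLowPass {

    static double[] filter(double[] x, double cutoffHz, double sampleRate) {
        // alpha = 1 - exp(-2*pi*fc/fs); the response falls off at only about 6 dB per octave
        double alpha = 1.0 - Math.exp(-2.0 * Math.PI * cutoffHz / sampleRate);
        double[] y = new double[x.length];
        double state = 0.0;
        for (int i = 0; i < x.length; i++) {
            state += alpha * (x[i] - state);
            y[i] = state;
        }
        return y;
    }
}

/** Sketch: harmonic product spectrum over a magnitude spectrum (bins 0..N/2-1). */
final class HarmonicProductSpectrum {

    static int fundamentalBin(double[] magnitude, int harmonics, int minBin) {
        // Only bins whose highest harmonic (i * harmonics) is still inside the array
        int limit = magnitude.length / harmonics;
        int best = -1;
        double bestValue = -1.0;
        for (int i = Math.max(minBin, 1); i < limit; i++) {
            double product = 1.0;
            for (int h = 1; h <= harmonics; h++) {
                product *= magnitude[i * h];
            }
            if (product > bestValue) {
                bestValue = product;
                best = i;
            }
        }
        return best; // -1 if no bin qualified
    }
}
```

With coarse 1024-point bins the harmonics may straddle bin boundaries, so the harmonic product works better on longer FFTs.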

Again, though, it may be worth considering other ways of solving this, as an FFT might give you disappointing results in the real world.

I tried to find the fundamental frequency with autocorrelation using edu.emory.mathcs.jtransforms.fft.DoubleFFT_1D, but I don't know whether the returned value, frequency = sampleRate * (double) max_index / (double) mNumberOfFFTPoints, is the fundamental frequency. — user3582433, May 26 '14 at 11:48
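Regarding that comment: sampleRate * max_index / mNumberOfFFTPoints is the centre frequency of the strongest bin, so it is quantised to multiples of sampleRate / mNumberOfFFTPoints and may land on a harmonic rather than the fundamental. A small helper with hypothetical names shows how the bin width shrinks as the FFT grows:

```java
/** Bin width and bin-to-frequency conversion for an N-point FFT (sketch). */
final class BinResolution {

    // Width of one FFT bin in Hz
    static double binWidthHz(double sampleRate, int fftSize) {
        return sampleRate / fftSize;
    }

    // Centre frequency of bin k: the same formula the question's code returns
    static double binFrequencyHz(int k, double sampleRate, int fftSize) {
        return k * sampleRate / fftSize;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1024, 4096, 16384}) {
            System.out.printf("N=%d: %.2f Hz per bin%n", n, binWidthHz(44100, n));
        }
    }
}
```

Running `main` prints 43.07, 10.77 and 2.69 Hz per bin for 1024, 4096 and 16384 points at 44.1 kHz.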

score 0 · Answer 3 · answered May 26 '14 at 12:15

My code for autocorrelation is this:

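    import edu.emory.mathcs.jtransforms.fft.DoubleFFT_1D;

    // Note: despite the description, this is an FFT magnitude-peak search,
    // not an autocorrelation.
    public double calculateFFT(double[] signal)
    {
        final int mNumberOfFFTPoints = 1024;
        final float sampleRate = 44100;   // must match the rate the audio was recorded at

        if (signal.length < mNumberOfFFTPoints) {
            throw new IllegalArgumentException("need at least " + mNumberOfFFTPoints + " samples");
        }

        double[] magnitude = new double[mNumberOfFFTPoints / 2];
        DoubleFFT_1D fft = new DoubleFFT_1D(mNumberOfFFTPoints);
        // complexForward expects interleaved data: re[0], im[0], re[1], im[1], ...
        double[] fftData = new double[mNumberOfFFTPoints * 2];

        for (int i = 0; i < mNumberOfFFTPoints; i++) {
            fftData[2 * i] = signal[i];   // real part
            fftData[2 * i + 1] = 0;       // imaginary part
        }

        // Transform once, after the whole buffer is filled; calling it inside the
        // fill loop would transform a partially filled buffer on every iteration.
        fft.complexForward(fftData);

        int max_index = -1;
        double max_magnitude = -1;
        for (int i = 1; i < mNumberOfFFTPoints / 2; i++) {   // skip bin 0 (DC offset)
            magnitude[i] = Math.hypot(fftData[2 * i], fftData[2 * i + 1]);
            if (max_magnitude < magnitude[i]) {
                max_magnitude = magnitude[i];
                max_index = i;
            }
        }

        // Centre frequency of the strongest bin: not necessarily the fundamental
        return sampleRate * (double) max_index / mNumberOfFFTPoints;
    }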

Is the returned value my fundamental frequency?

Why? I noticed that fft.complexForward(fftData) doesn't work, and I don't know why this happens. — user3582433, May 26 '14 at 14:37

score 0 · Answer 4 · answered May 26 '14 at 13:46

An FFT maximum gives you the frequency of the peak bin, which may not be the fundamental frequency; it may instead be the FFT result bin nearest an overtone or harmonic of the fundamental. A longer FFT using more data gives more closely spaced result bins, and thus probably a bin nearer the peak frequency. You might also be able to interpolate the peak if it falls between bins. But if you are dealing with a signal that has strong harmonic content, such as voice or music, then you may need a pitch detection/estimation algorithm instead of an FFT peak-picking algorithm.
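Peak interpolation between bins is often done by fitting a parabola through the peak bin and its two neighbours and taking the vertex. A minimal sketch with a hypothetical class name; it is commonly applied to log (dB) magnitudes, which is an assumption about what you pass in:

```java
/** Parabolic (quadratic) interpolation of a spectral peak (sketch). */
final class PeakInterpolation {

    // Returns the fractional bin of the vertex of the parabola through bins k-1, k, k+1
    static double refineBin(double[] mag, int k) {
        if (k <= 0 || k >= mag.length - 1) {
            return k; // needs a neighbour on both sides
        }
        double a = mag[k - 1], b = mag[k], c = mag[k + 1];
        double denom = a - 2.0 * b + c;
        if (denom == 0.0) {
            return k; // flat: no unique vertex
        }
        return k + 0.5 * (a - c) / denom;
    }
}
```

Multiply the refined bin by sampleRate / fftSize to get a frequency estimate in Hz. This sharpens the location of whichever peak you picked; it does not fix picking a harmonic instead of the fundamental.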

I tried this autocorrelation approach: http://dsp.stackexchange.com/questions/8432/how-to-get-fundamental-frequency-of-a-signal-using-autocorrelation?rq=1, but it doesn't work very well; fft.complexForward(fftData) doesn't compute anything. — user3582433, May 26 '14 at 14:35
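For comparison with the FFT code above, here is a plain time-domain autocorrelation pitch estimate, sketched under assumptions: mono samples in a `double[]`, a caller-chosen minHz/maxHz search range, and the hypothetical class name `AutocorrelationPitch`. It does not use JTransforms, so it sidesteps the `complexForward` problem; it costs O(n × lags), which is acceptable for short frames.

```java
/** Time-domain autocorrelation pitch estimate for one frame (sketch). */
final class AutocorrelationPitch {

    static double estimateHz(double[] x, double sampleRate, double minHz, double maxHz) {
        int minLag = Math.max(1, (int) Math.floor(sampleRate / maxHz));
        int maxLag = Math.min(x.length - 1, (int) Math.ceil(sampleRate / minHz));
        int bestLag = -1;
        double best = Double.NEGATIVE_INFINITY;
        for (int lag = minLag; lag <= maxLag; lag++) {
            // Unnormalised (biased) sum: longer lags have fewer terms, which slightly
            // favours the true period over multiples of it
            double sum = 0.0;
            for (int i = 0; i + lag < x.length; i++) {
                sum += x[i] * x[i + lag];
            }
            if (sum > best) {
                best = sum;
                bestLag = lag;
            }
        }
        return bestLag > 0 ? sampleRate / bestLag : 0.0;
    }
}
```

The result is quantised to sampleRate / lag; the parabolic refinement sketched earlier can be applied to the autocorrelation values around the best lag as well.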
