I'm trying to figure out the average frequency, or the frequency range, of a person's voice as they speak into the microphone. It does not have to be real time. My approach so far is to use AVAudioEngine and AVAudioPCMBuffer, grab the buffer data, and run an FFT on it.

inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
    self.recognitionRequest?.append(buffer)

    let data = buffer.floatChannelData?[0]
    let arrayOfData = Array(UnsafeBufferPointer(start: data, count: Int(buffer.frameLength)))
    let fftData = self.performFFT(arrayOfData)
}




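func performFFT(_ input: [Float]) -> [Float] {
    guard input.count > 1 else { return [] }

    // The radix-2 FFT needs a power-of-two length, so use the largest one that fits.
    let log2n = vDSP_Length(floor(log2(Float(input.count))))
    let n = 1 << Int(log2n)

    var real = [Float](input.prefix(n))
    var imag = [Float](repeating: 0.0, count: n)
    var magnitudes = [Float](repeating: 0.0, count: n)

    guard let weights = vDSP_create_fftsetup(log2n, FFTRadix(kFFTRadix2)) else { return [] }
    defer { vDSP_destroy_fftsetup(weights) }

    // Keep the array storage valid while vDSP reads and writes through the pointers.
    real.withUnsafeMutableBufferPointer { realPtr in
        imag.withUnsafeMutableBufferPointer { imagPtr in
            var splitComplex = DSPSplitComplex(realp: realPtr.baseAddress!, imagp: imagPtr.baseAddress!)
            vDSP_fft_zip(weights, &splitComplex, 1, log2n, FFTDirection(FFT_FORWARD))
            // vDSP_zvmags returns squared magnitudes; the square root is taken below.
            vDSP_zvmags(&splitComplex, 1, &magnitudes, 1, vDSP_Length(n))
        }
    }

    var normalizedMagnitudes = [Float](repeating: 0.0, count: n)
    vDSP_vsmul(sqrt(magnitudes), 1, [2.0 / Float(n)], &normalizedMagnitudes, 1, vDSP_Length(n))
    return normalizedMagnitudes
}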


public func sqrt(_ x: [Float]) -> [Float] {
    var results = [Float](repeating: 0.0, count: x.count)
    vvsqrtf(&results, x, [Int32(x.count)])
    return results
}

I think I'm returning proper FFT data. The printed output looks like this:

[Image: screenshot of the printed FFT output]

However, these values can't be the correct frequencies in Hz. It was me speaking, and a typical adult male voice has a fundamental frequency of roughly 85 to 180 Hz. I'm just not sure where to go from here.

The goal is to find an average frequency, or a frequency range, for when a user speaks through the mic. Thanks so much for any help!

robinyapockets
  • Two major problems: (1) the quantity you want to measure is the *pitch* of the voice - this is (more or less) the fundamental frequency of a complex sound (complex in that it contains components at many different frequencies), and (2) the FFT does not directly give you frequency measurements - in the code above you're actually generating an estimate of the *power spectrum* - if you plot this you should see a spectrum (magnitude versus frequency). – Paul R Mar 13 '17 at 14:00
  • See [this answer](http://stackoverflow.com/a/7675171/253056) for pseudo code for determining the frequency of the largest peak in the spectrum - note that this is not necessarily the pitch, or even the fundamental, but it's a starting point... – Paul R Mar 13 '17 at 14:03
  • Thanks so much @PaulR !! I'll spend some time on your linked answer. – robinyapockets Mar 13 '17 at 14:21
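
To make the peak-finding idea from the comments concrete, here is a minimal sketch in plain Swift. It assumes `magnitudes` is the full-length output of an N-point complex FFT (such as the array returned by `performFFT` in the question) and that `sampleRate` comes from something like `recordingFormat.sampleRate`. The name `peakFrequency` is made up for illustration. As the comment warns, the largest peak is not necessarily the pitch.

```swift
import Foundation

/// Hypothetical helper: frequency in Hz of the largest spectral peak.
/// `magnitudes` holds all N bins of an N-point FFT of real-valued audio.
func peakFrequency(magnitudes: [Float], sampleRate: Double) -> Double? {
    let n = magnitudes.count
    guard n >= 4 else { return nil }

    // Bin 0 is the DC offset, and bins above N/2 mirror the lower half
    // for real input, so only bins 1..<N/2 are searched.
    var peakBin = 1
    for bin in 2..<(n / 2) where magnitudes[bin] > magnitudes[peakBin] {
        peakBin = bin
    }

    // Bin k is centered on k * sampleRate / N.
    return Double(peakBin) * sampleRate / Double(n)
}
```

Note the resolution this implies: with 1024 samples at 44.1 kHz each bin is about 43 Hz wide, so the whole 85 to 180 Hz range falls into only two or three bins. A longer analysis window gives finer frequency resolution.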

1 Answer

The FFT magnitude is a spectral frequency estimator (which doesn't work for many voice pitches), not a pitch detection/estimation algorithm. Try a pitch estimation algorithm instead, which can better detect a fundamental pitch even if the vocal harmonic/overtone series has more spectral power.
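
As one illustration of what a pitch estimator can look like, here is a rough time-domain autocorrelation sketch in plain Swift. This is only one of several common approaches and is not necessarily the one the answer has in mind. The function name `estimatePitch`, the 60 to 400 Hz search range, and the 0.3 voicing threshold are all assumptions chosen for the example.

```swift
import Foundation

/// Hypothetical helper: estimates the fundamental frequency of one frame
/// by picking the lag with the largest normalized autocorrelation.
func estimatePitch(samples: [Float], sampleRate: Double,
                   minHz: Double = 60, maxHz: Double = 400) -> Double? {
    // A pitch of f Hz repeats every sampleRate / f samples.
    let minLag = Int(sampleRate / maxHz)
    let maxLag = Int(sampleRate / minHz)
    guard minLag >= 1, samples.count > 2 * maxLag else { return nil }

    let energy = samples.reduce(Float(0)) { $0 + $1 * $1 }
    guard energy > 0 else { return nil }  // silent frame

    var bestLag = 0
    var bestScore: Float = 0
    for lag in minLag...maxLag {
        var sum: Float = 0
        for i in 0..<(samples.count - lag) {
            sum += samples[i] * samples[i + lag]
        }
        let score = sum / energy
        if score > bestScore {
            bestScore = score
            bestLag = lag
        }
    }

    // Assumed threshold: treat weakly periodic frames as unvoiced.
    guard bestLag > 0, bestScore > 0.3 else { return nil }
    return sampleRate / Double(bestLag)
}
```

Because the autocorrelation peaks at the period shared by all harmonics, this can recover the fundamental even when an overtone carries more power than the fundamental itself. To get an average or range for a speaker, one option is to run it on successive frames of a few thousand samples, discard the `nil` results, and take the mean, minimum, and maximum of what remains.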

hotpaw2