Pitch detection in Python

Question

The concept of the program I'm working on is a Python module which detects certain frequencies (human speech frequency 80-300hz) and by checking from a database shows the intonation of the sentence. I use SciPy to plot frequency of the sound files, but I cannot set any certain frequency in order to analyze pitch. How can I do this?

more info: I would like to be able to set a defined pattern in speech (e.g. Rising, Falling) and the program detects if the sound file follows the specific pattern.

What do you mean by you cannot set any frequency in order? Could you describe a bit more about what you are doing right now? — Sahil M, Sep 15 '15 at 20:56
I'd like to use Scipy (or maybe other libraries) to respond to specific frequencies (human speech) so that it can indicate pitch (see PRAAT software). By doing this I would like my program to be able to indicate intonation in a sentence (though it can't be that accurate, it is enough for linguistics researches ) @SahilM — Andrew Ravus, Sep 18 '15 at 09:37

Nikolay Shmyrev · Answer 1 · 2019-07-26T22:58:50.423

20

UPDATE in 2019, now there are very accurate pitch trackers based on neural networks. And they work in Python out-of-the-box. Check

https://pypi.org/project/crepe/

ANSWER FROM 2015. Pitch detection is a complex problem, a latest Google's package provides highly intelligent solution to this non-trivial task:

https://github.com/google/REAPER

You can wrap it in Python if you want to access it from Python.

edited Jul 26 '19 at 22:58

answered Nov 02 '15 at 01:03

Nikolay Shmyrev

24,897
5
43
87

Shimyrev How would you wrap this into python? – Dole Jun 11 '16 at 16:14
1

http://stackoverflow.com/questions/1942298/wrapping-a-c-library-in-python-c-cython-or-ctypes – Nikolay Shmyrev Jun 11 '16 at 18:52

Sahil M · Accepted Answer · 2015-09-18T09:43:12.287

You could try the following. I'm sure you know that human voice also has harmonics which go way beyond 300 Hz. Nevertheless, you can move a window across your audio file, and try to look at change in power in the max ( as shown below) or a set of frequencies in a window. The code below is for giving intuition:

import scipy.fftpack as sf
import numpy as np
def maxFrequency(X, F_sample, Low_cutoff=80, High_cutoff= 300):
        """ Searching presence of frequencies on a real signal using FFT
        Inputs
        =======
        X: 1-D numpy array, the real time domain audio signal (single channel time series)
        Low_cutoff: float, frequency components below this frequency will not pass the filter (physical frequency in unit of Hz)
        High_cutoff: float, frequency components above this frequency will not pass the filter (physical frequency in unit of Hz)
        F_sample: float, the sampling frequency of the signal (physical frequency in unit of Hz)
        """        

        M = X.size # let M be the length of the time series
        Spectrum = sf.rfft(X, n=M) 
        [Low_cutoff, High_cutoff, F_sample] = map(float, [Low_cutoff, High_cutoff, F_sample])

        #Convert cutoff frequencies into points on spectrum
        [Low_point, High_point] = map(lambda F: F/F_sample * M, [Low_cutoff, High_cutoff])

        maximumFrequency = np.where(Spectrum == np.max(Spectrum[Low_point : High_point])) # Calculating which frequency has max power.

        return maximumFrequency

voiceVector = []
for window in fullAudio: # Run a window of appropriate length across the audio file
    voiceVector.append (maxFrequency( window, samplingRate))

Now based on the intonation of the voice, the maximum power frequency may shift which you can register and map to a given intonation. This may not necessarily be true always, and you may have to monitor shifts in a lot of frequencies together, but this should get you started.

I have put in the modules to be imported, it uses scipy fftpack and numpy. — Sahil M, Sep 18 '15 at 09:44

score 6 · Answer 3 · answered Mar 26 '19 at 17:52

There are many different algorithms to estimate pitch, but a study found that Praat's algorithm is the most accurate [1]. Recently, the Parselmouth library has made it a lot easier to call Praat functions from Python [2].

[1]: Strömbergsson, Sofia. "Today's Most Frequently Used F0 Estimation Methods, and Their Accuracy in Estimating Male and Female Pitch in Clean Speech." INTERSPEECH. 2016. https://pdfs.semanticscholar.org/ff04/0316f44eab5c0497cec280bfb1fd0e7c0e85.pdf

[2]: https://github.com/YannickJadoul/Parselmouth

score 3 · Answer 4 · answered Jul 28 '18 at 20:18

There are basically two classes of f0 (pitch) estimation: time domain (with autocorrelation/cross-correlation, for example), and frequency domain (e.g. identifying the fundamental frequency by measuring distances between harmonics, or identifying the frequency in the spectrum with maximum power, as shown in the example above by Sahil M). For many years I have successfully used RAPT (Robust Algorithm for Pitch Tracking), the predecessor of REAPER, also by David Talkin. The widely used Praat software which you mention also includes a RAPT-like cross-correlation algorithm option. Description and code are readily available on the web. A DEB install archive is available here: http://www.phon.ox.ac.uk/releases Pattern detection (rises, falls, etc.) with the pitch function is a separate issue. The suggestion above by Sahil M for using a moving window across the pitch function is a good way to start.

Pitch detection in Python

4 Answers4