Manual pitch estimation of a speech signal

Question

I am new to speech processing, so please forgive my ignorance. I was given a short speech signal (10 s) and asked to manually annotate its pitch using MATLAB or Wavesurfer. How do I find the pitch of a speech signal? Is there any theoretical resource that would help? I tried plotting the pitch contour of the signal in Wavesurfer. Is that the right approach?

Edit 1: My work is to apply various pitch detection algorithms to our data and compare their accuracies, so the manually annotated pitch acts as the reference.

UPDATE 1: I obtained the GCIs (Glottal Closure Instants) by differentiating the EGG signal; the peaks in the differentiated signal (dEGG) are the GCIs. The time interval between two successive GCIs is the pitch period (s), and the inverse of the pitch period is the pitch (Hz).
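The procedure in UPDATE 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the EGG trace is a synthetic stand-in, the polarity is chosen so that closure shows up as a positive dEGG peak, and the function names, the threshold of half the maximum, and the 500 Hz upper limit are all hypothetical choices, not part of any toolkit.

```python
import math

def gci_times(egg, fs, max_f0=500.0, rel_threshold=0.5):
    """Approximate GCI times (s) as the dominant positive peaks of dEGG."""
    # First difference scaled by fs approximates the derivative of the EGG.
    degg = [(egg[i + 1] - egg[i]) * fs for i in range(len(egg) - 1)]
    threshold = rel_threshold * max(degg)   # assumed: keep only strong peaks
    min_gap = int(fs / max_f0)              # peaks closer than this are merged
    peaks = []
    for i in range(1, len(degg) - 1):
        is_peak = degg[i - 1] < degg[i] >= degg[i + 1]
        if not is_peak or degg[i] < threshold:
            continue
        if peaks and i - peaks[-1] < min_gap:
            if degg[i] > degg[peaks[-1]]:
                peaks[-1] = i               # keep the larger of two close peaks
        else:
            peaks.append(i)
    return [p / fs for p in peaks]

def f0_from_gcis(times):
    """Pitch (Hz) as the inverse of each interval between successive GCIs."""
    return [1.0 / (b - a) for a, b in zip(times, times[1:])]

# Synthetic stand-in for an EGG trace: a sharp rise followed by a slow decay,
# repeated every 160 samples (100 Hz at fs = 16 kHz).
fs = 16000
period = 160
shape = [(1.0 - k / 200.0) / (1.0 + math.exp(-(k - 44) / 1.5))
         for k in range(period)]
egg = [shape[n % period] for n in range(10 * period)]

f0 = f0_from_gcis(gci_times(egg, fs))
print([round(f, 2) for f in f0])  # every value is close to 100 Hz for this input
```

On a real recording the threshold and the polarity would need checking against the waveform, since EGG devices differ in sign convention.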

UPDATE 2: SIGMA is a well-known algorithm for automatic GCI detection.

Thanks everyone.

There are many ways to find the pitch, but the real question is what you mean by "manually annotate". The pitch of speech lies between 50 Hz and 500 Hz, so the first thing you might want to do is low-pass filter your speech to get rid of some of the harmonics. If you simply want to determine the pitch manually, I suggest using a transformation such as an STFT (spectrogram) or a cochleagram. – GameOfThrows, Sep 06 '16 at 09:46
@GameOfThrows My work is to apply various pitch detection algorithms to our data and compare their accuracies, so the manually annotated pitch acts as the reference. – gokul, Sep 06 '16 at 10:08
I cannot imagine that manually annotated pitch can be accurate at all, but what you need is a spectrogram. Look at MATLAB's spectrogram function; the pitch is the lowest significant contour in the frequency range of 50 Hz to 500 Hz. – GameOfThrows, Sep 06 '16 at 10:50
@GameOfThrows Thanks for helping. Out of curiosity, when people use pitch detection algorithms, how do they calculate accuracies and compare performance? What is the reference there? – gokul, Sep 06 '16 at 11:03
Just like when a piano is tuned, they usually start with a musical instrument of known pitch, then move on to speech files where the pitch has already been scientifically measured. Otherwise they compare the algorithms against each other. The problem with speech is that the pitch can hardly be a single frequency: unlike the solid metal or wood that makes up an instrument, human soft tissue usually creates a range of pitch, say 55-60 Hz rather than just 55 Hz. – GameOfThrows, Sep 06 '16 at 11:08
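The spectrogram suggestion in the comments above amounts to inspecting, frame by frame, the spectrum between 50 Hz and 500 Hz. A rough single-frame sketch in Python follows. It is a simplification under stated assumptions: it picks the strongest component in the band rather than the lowest significant one (the two coincide only when the fundamental dominates), the test signal is synthetic, and the function name and the 1 Hz search grid are hypothetical.

```python
import cmath
import math

def band_peak_frequency(frame, fs, f_lo=50, f_hi=500):
    """Frequency (Hz, 1 Hz grid) with the largest spectral magnitude in the band."""
    best_f, best_mag = None, -1.0
    for f in range(f_lo, f_hi + 1):
        w = -2j * math.pi * f / fs
        # Direct evaluation of the Fourier sum at frequency f.
        mag = abs(sum(x * cmath.exp(w * n) for n, x in enumerate(frame)))
        if mag > best_mag:
            best_f, best_mag = f, mag
    return best_f

# Synthetic voiced frame: 120 Hz fundamental plus two weaker harmonics.
fs = 8000
frame = [math.sin(2 * math.pi * 120 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 240 * n / fs)
         + 0.3 * math.sin(2 * math.pi * 360 * n / fs)
         for n in range(800)]  # 100 ms frame

peak = band_peak_frequency(frame, fs)
print(peak)
```

Real speech often has harmonics stronger than the fundamental, so this kind of peak picking is only a starting point for visual inspection, not a pitch tracker.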

score 1 · Accepted Answer · answered Sep 06 '16 at 11:55

Usually, ground truth is obtained from a speech signal recorded together with an EGG signal. EGG stands for electroglottograph, a device that measures vocal fold contact through electrodes placed on the neck, from which the true pitch can be derived.

Since I doubt you have access to such a device, I recommend using an existing database that was carefully prepared for evaluating pitch extraction. You can download it here. The data was collected at the University of Edinburgh by Paul Bagshaw.

I suggest reading his thesis as well.

If you want to compare with a state-of-the-art pitch extraction algorithm, check https://github.com/google/REAPER. Also note that the "true" pitch might not be the best feature for subsequent algorithms. Sometimes pitch extracted with mistakes gives better accuracy downstream, for example in speech recognition. See this publication for more information.
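Once a reference track is available, one common way to score an algorithm against it is the gross pitch error: the fraction of frames, voiced in both tracks, where the estimate is off by more than some relative tolerance. Here is a minimal Python sketch, assuming both tracks are frame-aligned, unvoiced frames are marked with 0, and the tolerance is 20% (a widely used convention, but an assumption here). The function name and the example values are made up for illustration.

```python
def gross_pitch_error(ref_f0, est_f0, tolerance=0.2):
    """Fraction of mutually voiced frames whose estimate deviates from the
    reference by more than `tolerance` (relative to the reference)."""
    # Assumed convention: a value of 0 marks an unvoiced frame.
    pairs = [(r, e) for r, e in zip(ref_f0, est_f0) if r > 0 and e > 0]
    if not pairs:
        return 0.0
    gross = sum(1 for r, e in pairs if abs(e - r) > tolerance * r)
    return gross / len(pairs)

# Made-up example tracks (Hz per frame); the third frame is an octave error.
ref = [0, 100, 102, 105, 110, 0]
est = [0, 101, 204, 104, 0, 0]

gpe = gross_pitch_error(ref, est)
print(round(gpe, 3))
```

Frames where only one of the two tracks is voiced are ignored here; they are usually reported separately as a voicing decision error.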

Nikolay Shmyrev


Actually, I have the EGG signal for the corresponding speech signal. – gokul Sep 06 '16 at 12:42
Then you can simply extract the pitch from the EGG signal with REAPER; that would be the ground truth. The methodology is described in http://tcts.fpms.ac.be/publications/papers/2013/icassp2013_obtdndatd.pdf, section 3.2 "Ground Truth". – Nikolay Shmyrev Sep 06 '16 at 12:43
Thank you. It was really helpful. – gokul Sep 06 '16 at 16:38
Do you know the procedure to manually pitch mark an EGG signal without REAPER, using only MATLAB or Wavesurfer? – gokul Sep 07 '16 at 05:42
It is no different from any other waveform: load it in Wavesurfer, extract the pitch, and then save the track. You can also use snack (getf0) on a batch of files. Wavesurfer uses snack anyway. – Nikolay Shmyrev Sep 07 '16 at 08:11
