I have just started working with audio data, and I am using librosa as my tool. My project requires me to extract features such as:
- Total duration of the audio
- Minimum Intensity of the audio signal
- Maximum Intensity of the audio signal
- Mean Intensity of the audio signal
- Jitter
- Rate of speaking
- Number of Pauses
- Maximum Duration of Pauses
- Average Duration of Pauses
- Total Duration of Pauses
I know what these terms mean, but I have no idea how to extract them from an audio file. Are any of them built into the librosa.feature
module, or do they need to be calculated manually? Can someone guide me on how to proceed?
I know that this can be done with software like Praat, but I need to do it in Python.
Praat can be used for spectral analysis (spectrograms), pitch analysis, formant analysis, intensity analysis, jitter, shimmer, and voice breaks.
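As far as I can tell, librosa has no ready-made jitter function, so jitter would have to be computed by hand. Praat's "jitter (local)" is the average absolute difference between consecutive glottal periods divided by the average period. A minimal sketch of that formula follows, assuming the period durations (in seconds) have already been obtained from some pitch-mark detector. Both the `periods` input and the `local_jitter` name are hypothetical, and extracting the periods is the hard part that this sketch skips.

```python
def local_jitter(periods):
    """Local jitter: mean absolute difference between consecutive periods,
    divided by the mean period. `periods` is a sequence of durations in seconds.
    """
    if len(periods) < 2:
        raise ValueError("need at least two periods")

    # Absolute differences between each period and the next one.
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]

    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period
```

The result is a ratio, so it is multiplied by 100 when reported as a percentage.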