I have just started working with audio data, and I am using librosa as my tool. My project requires me to extract features such as:

  • Total duration of the audio
  • Minimum Intensity of the audio signal
  • Maximum Intensity of the audio signal
  • Mean Intensity of the audio signal
  • Jitter
  • Rate of speaking
  • Number of Pauses
  • Maximum Duration of Pauses
  • Average Duration of Pauses
  • Total Duration of Pauses

Although I know what these terms mean, I have no idea how to extract them from an audio file. Are they built into librosa in some form, for example in librosa.feature? Or do we need to calculate them manually? Can someone guide me on how to proceed?

I know that this can be done with software like Praat, but I need to do it in Python.

Praat can be used for spectral analysis (spectrograms), pitch analysis, formant analysis, intensity analysis, jitter, shimmer, and voice breaks.

paradocslover
  • You can start by reading the documentation: https://librosa.github.io/librosa/ – Atirag Feb 11 '19 at 21:17
  • I have already gone through the doc and turned to SO only because the doc didn't help me. – paradocslover Feb 11 '19 at 21:19
  • `import scipy.io.wavfile` https://stackoverflow.com/a/24391521/1755108 – brokenfoot Feb 11 '19 at 21:24
  • @brokenfoot Can you please explain how to use the obtained data? I got a bit of a hint, but it would be really helpful if you could shed some more light. – paradocslover Feb 11 '19 at 21:37
  • For example, to get the duration: `scipy.io.wavfile.read(filename)` returns (1) the rate, i.e. samples per second, and (2) the samples. The duration is the number of samples divided by the rate, i.e. `len(samples) / rate`. – brokenfoot Feb 11 '19 at 21:45
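The duration calculation described in the comment above can be sketched as follows. This is only a minimal illustration: the helper name `wav_duration_seconds` and the synthetic 2-second stereo file are assumptions made up for the demo, not part of the original discussion.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds (hypothetical helper)."""
    # rate: samples per second; data: shape (n_samples,) for mono
    # or (n_samples, n_channels) for multi-channel audio.
    rate, data = wavfile.read(path)
    return data.shape[0] / rate


# Demo on a synthetic 2-second stereo tone (made up for illustration).
rate = 8000
t = np.arange(2 * rate) / rate
tone = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
stereo = np.column_stack([tone, tone])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.wav")
    wavfile.write(path, rate, stereo)
    duration = wav_duration_seconds(path)

print(duration)
```

Using `data.shape[0]` rather than `data.size` keeps the result correct for stereo files, where the array has one column per channel.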
  • Can you please explain the same for `number of pauses`? The function actually returns a 2-D NumPy array, and I am not able to understand its contents. Is it intensity, frequency, or something else? – paradocslover Feb 11 '19 at 21:59
  • It's a 2-D array because the audio is dual channel. You can access each channel as `data[:,0]` and `data[:,1]`. The data is amplitude (intensity) over time, not frequency; you need an FFT to get frequency. To get the number of pauses, you first need to define what a pause is. I don't think there is a function in scipy that looks at the audio and detects pauses in it. – brokenfoot Feb 12 '19 at 00:23
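As the comment above notes, counting pauses requires choosing a definition of a pause first. One possible definition, assumed here purely for illustration, is a run of low-energy frames lasting at least some minimum time. The function name `find_pauses`, the 25 ms frame size, the RMS threshold of 0.01, and the 0.2 s minimum are all hypothetical choices, and the threshold assumes samples scaled to the range [-1, 1] (integer data from `scipy.io.wavfile.read` would need rescaling first).

```python
import numpy as np


def find_pauses(samples, rate, frame_sec=0.025, threshold=0.01, min_pause_sec=0.2):
    """Return the duration (in seconds) of each detected pause.

    A pause is assumed to be a run of consecutive frames whose RMS
    amplitude is below `threshold`, lasting at least `min_pause_sec`.
    """
    x = np.asarray(samples, dtype=float)
    if x.ndim == 2:
        # Dual-channel audio: average the channels into one mono signal.
        x = x.mean(axis=1)

    # Split the signal into non-overlapping frames and compute RMS per frame.
    frame_len = int(round(frame_sec * rate))
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))

    # Mark quiet frames, then locate the start and end of each quiet run.
    quiet = rms < threshold
    padded = np.concatenate(([False], quiet, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)  # exclusive end frame index

    durations = (ends - starts) * frame_len / rate
    return durations[durations >= min_pause_sec]


# Synthetic test signal: tone, 0.5 s silence, tone, 0.25 s silence, tone.
rate = 8000
t = np.arange(rate) / rate
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
signal = np.concatenate(
    [tone, np.zeros(rate // 2), tone, np.zeros(rate // 4), tone]
)

pauses = find_pauses(signal, rate)
print("number of pauses:", len(pauses))
print("maximum pause (s):", pauses.max())
print("average pause (s):", pauses.mean())
print("total pause (s):", pauses.sum())
```

On real speech the threshold and minimum duration would need tuning, since background noise rarely drops to exact silence.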

0 Answers