Voice Activity Detection using webrtcvad

Question

I am trying to write code that uses the webrtcvad module to produce a binary output for a .wav audio file by dividing it into small 20 ms chunks. For each chunk, I want the output to be 1 when voice is present and 0 when it is not. First, I read the audio file with the librosa module and then convert it to 16-bit little-endian PCM.
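The conversion step can be sketched with only the standard library. This assumes the samples are floats roughly in [-1.0, 1.0], which is what `librosa.load(path, sr=16000, mono=True)` returns. The helper name `float_to_pcm16le` is hypothetical. Each sample is clipped, scaled to the signed 16-bit range and packed with the `<h` format (little-endian, signed 16-bit), so every sample takes exactly 2 bytes.

```python
import struct
from typing import Sequence


def float_to_pcm16le(samples: Sequence[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes.

    Hypothetical helper: values outside [-1.0, 1.0] are clipped first so the
    scaled result always fits in a signed 16-bit integer.
    """
    ints = []
    for x in samples:
        x = max(-1.0, min(1.0, float(x)))   # clip to the valid float range
        ints.append(int(round(x * 32767)))  # scale to the int16 range
    # "<" = little-endian, "h" = signed 16-bit integer, one per sample
    return struct.pack(f"<{len(ints)}h", *ints)


# Example with a few made-up sample values (the last one gets clipped)
pcm = float_to_pcm16le([0.0, 0.5, -1.0, 1.2])
print(len(pcm))  # 8 (4 samples x 2 bytes)
```

A vectorised NumPy alternative, which truncates rather than rounds, is `(np.clip(y, -1.0, 1.0) * 32767).astype('<i2').tobytes()`.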

After this, I segment the audio into 20 ms chunks with an overlapping window of 10 ms. I calculate the duration of the music file with librosa.get_duration, which returns a float value in seconds, and convert it to milliseconds. However, I am not sure how to get an exact integer duration in ms to use in the for loop, so I round the duration off. I have set the sample rate to 16000 Hz and the aggressiveness mode of the webrtcvad object to 2.
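To avoid a rounded float duration, the loop bounds can be derived from the integer number of samples instead. The sketch below assumes 16-bit mono PCM; the function name `frame_layout` is hypothetical. According to the py-webrtcvad README, the VAD only accepts 8000, 16000, 32000 or 48000 Hz audio in frames of 10, 20 or 30 ms, so a 20 ms frame at 16000 Hz is exactly 320 samples (640 bytes), and a 10 ms overlap leaves a 10 ms hop of 160 samples (320 bytes).

```python
from typing import Tuple

VALID_RATES = (8000, 16000, 32000, 48000)
VALID_FRAME_MS = (10, 20, 30)
BYTES_PER_SAMPLE = 2  # 16-bit mono PCM


def frame_layout(num_samples: int, sample_rate: int,
                 frame_ms: int, overlap_ms: int) -> Tuple[int, int, int]:
    """Return (frame_bytes, hop_bytes, num_frames) using integer arithmetic only."""
    if sample_rate not in VALID_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if frame_ms not in VALID_FRAME_MS:
        raise ValueError(f"frame length must be 10, 20 or 30 ms, got {frame_ms}")
    if not 0 <= overlap_ms < frame_ms:
        raise ValueError("overlap must be smaller than the frame length")
    frame_samples = sample_rate * frame_ms // 1000                # e.g. 320 for 20 ms at 16 kHz
    hop_samples = sample_rate * (frame_ms - overlap_ms) // 1000   # e.g. 160 for a 10 ms hop
    if num_samples < frame_samples:
        num_frames = 0
    else:
        # Number of full frames that fit; a trailing partial frame is dropped
        num_frames = (num_samples - frame_samples) // hop_samples + 1
    return frame_samples * BYTES_PER_SAMPLE, hop_samples * BYTES_PER_SAMPLE, num_frames


# 24320 samples = 1.52 s of audio at 16 kHz
print(frame_layout(24320, 16000, 20, 10))  # (640, 320, 151)
```

A 25 ms frame, for instance, raises `ValueError` here, since it is not one of the durations the VAD accepts.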

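```python
# pcm_data (16-bit PCM bytes), n (rounded duration in ms) and vad are set up earlier
interval = 20  # chunk length in ms
overlap = 10   # overlap between consecutive chunks in ms
start = 0
end = 0
f = 0          # set to 1 once the end of the audio is reached

for i in range(0, 1520, interval):
    # During first iteration,
    # start is 0, end is the interval
    if i == 0:
        start = 0
        end = interval
    else:
        start = end - overlap
        end = start + interval
    if end >= n:
        end = n
        f = 1
    if f == 1:
        break

    if f != 1:
        # start and end are millisecond values, used directly as indices into pcm_data
        chunk = pcm_data[start:end]
        print(vad.is_speech(chunk, 16000))
```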
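One way to see what the loop actually passes to the VAD is to convert a chunk's byte length back to a duration. Because `start` and `end` are millisecond values, `pcm_data[start:end]` returns 20 bytes rather than 20 ms of audio. The helper `chunk_duration_ms` is a hypothetical name, and the sketch assumes 16-bit mono PCM (2 bytes per sample) with a silent stand-in for `pcm_data`.

```python
def chunk_duration_ms(chunk: bytes, sample_rate: int) -> float:
    """Duration in milliseconds of a 16-bit mono PCM byte chunk."""
    num_samples = len(chunk) // 2  # 2 bytes per 16-bit sample
    return num_samples * 1000 / sample_rate


pcm_data = bytes(16000 * 2)  # stand-in: one second of silence at 16 kHz

bad_chunk = pcm_data[0:20]    # indices taken as milliseconds, as in the loop
good_chunk = pcm_data[0:640]  # 20 ms * 16 samples/ms * 2 bytes/sample

print(chunk_duration_ms(bad_chunk, 16000))   # 0.625
print(chunk_duration_ms(good_chunk, 16000))  # 20.0
```

A 0.625 ms chunk is not one of the 10, 20 or 30 ms frame lengths the VAD accepts, which is consistent with the "Error while processing frame" error.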

I want to know how to segment the audio correctly and how to use the webrtcvad module correctly. What else do I need to add to get a binary output for a .wav audio file? Also, with the current code I get the following error:

```
File "C:\Users\Angel\anaconda3\lib\site-packages\webrtcvad.py", line 27, in is_speech
    return _webrtcvad.process(self._vad, sample_rate, buf, length)
Error: Error while processing frame
```
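A sketch of the whole segmentation loop, assuming `pcm` holds 16-bit mono little-endian PCM at a sample rate webrtcvad supports. The function `binary_vad` is hypothetical; it takes the classifier as a parameter so that it can be given `webrtcvad.Vad(2).is_speech`, matching the `vad.is_speech(chunk, 16000)` call in the question. Offsets are computed in bytes, so every chunk passed to the VAD is exactly one frame long, and a trailing partial frame is skipped rather than sent.

```python
from typing import Callable, List


def binary_vad(pcm: bytes, sample_rate: int,
               is_speech: Callable[[bytes, int], bool],
               frame_ms: int = 20, overlap_ms: int = 10) -> List[int]:
    """Return one 0/1 label per (possibly overlapping) frame of 16-bit mono PCM."""
    if sample_rate not in (8000, 16000, 32000, 48000):
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if frame_ms not in (10, 20, 30):
        raise ValueError("webrtcvad frames must be 10, 20 or 30 ms long")
    bytes_per_ms = sample_rate * 2 // 1000               # 32 bytes per ms at 16 kHz
    frame_bytes = frame_ms * bytes_per_ms                # 640 bytes for 20 ms at 16 kHz
    hop_bytes = (frame_ms - overlap_ms) * bytes_per_ms   # 320 bytes for a 10 ms hop
    if hop_bytes <= 0:
        raise ValueError("overlap must be smaller than the frame length")
    labels = []
    # Stop before the last partial frame: the VAD rejects frames of any other length
    for start in range(0, len(pcm) - frame_bytes + 1, hop_bytes):
        frame = pcm[start:start + frame_bytes]
        labels.append(1 if is_speech(frame, sample_rate) else 0)
    return labels


# With the real VAD (requires `pip install webrtcvad`):
#   import webrtcvad
#   vad = webrtcvad.Vad(2)
#   labels = binary_vad(pcm, 16000, vad.is_speech)

# Stand-in classifier for demonstration only (NOT a voice detector):
# it reports "speech" when a frame contains any non-zero byte.
def nonzero_stand_in(frame: bytes, sample_rate: int) -> bool:
    assert len(frame) == sample_rate * 2 * 20 // 1000  # every frame is exactly 20 ms
    return any(frame)


silence = bytes(1280)           # 40 ms of zeros at 16 kHz (640 samples)
loud = b"\x10\x00" * 640        # 40 ms of constant non-zero samples
pcm = silence + loud + silence  # 120 ms in total
labels = binary_vad(pcm, 16000, nonzero_stand_in)
print(labels)  # [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
```

With a 10 ms overlap, each label covers 20 ms but consecutive labels start 10 ms apart, so frames that straddle a boundary between silence and sound are labelled by whatever the classifier decides for the mixed frame.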

Please provide a [Minimal Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) — Jon Nordby, Jun 02 '20 at 19:09
@jonnor sir, I have reduced my code ... Please check it, sir. — Aman Singh, Jun 13 '20 at 19:00
Hello, I got the same error here with a window of 25 ms. When I changed to 30 ms it worked. Don't know why. — Daniel Lima, Oct 30 '20 at 11:04
Take a look at this: https://github.com/wiseman/py-webrtcvad/issues/42 — Daniel Lima, Oct 30 '20 at 11:07
This answer has some details re format conversion that may be useful: https://stackoverflow.com/questions/61777371/voice-activity-detection — Jon Nordby, May 20 '23 at 09:35


0 Answers