
I am trying to get a binary output from a .wav audio file using the webrtcvad module, by dividing the audio into small chunks of 20 ms. I want 1 as the output when audio is present in a chunk and 0 when there is no audio in it. First I read the audio file with the librosa module and then convert it to 16-bit little-endian PCM.
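To make the conversion step concrete, here is a minimal sketch of turning float samples into 16-bit little-endian PCM bytes. It assumes the samples are floats in the range [-1.0, 1.0] (which is what librosa normally returns) and that the audio is mono; the helper name `float_to_pcm16` is hypothetical and not part of librosa or webrtcvad.

```python
import struct

def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit little-endian PCM bytes.

    Hypothetical helper: assumes mono audio already resampled to the
    target rate (for example 16000 Hz).
    """
    ints = []
    for x in samples:
        x = max(-1.0, min(1.0, float(x)))   # clip to avoid int16 overflow
        ints.append(int(round(x * 32767)))  # scale to the int16 range
    # "<" = little-endian, "h" = signed 16-bit integer
    return struct.pack("<%dh" % len(ints), *ints)

pcm = float_to_pcm16([0.0, 0.5, -0.5])
print(len(pcm))  # two bytes per sample
```

Because each sample occupies two bytes, any later slicing of the resulting bytes object has to be done in byte offsets, not in samples or milliseconds.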

After this, I segment the audio into chunks of 20 ms with an overlapping window of 10 ms. I calculate the duration of the audio file using librosa.get_duration, which returns a float value in seconds, and convert it to milliseconds. However, I am confused about how to get the exact duration in ms as an integer so that I can use it in the for loop, so for now I round the duration. I have set the sample rate to 16000 Hz and the aggressiveness mode of the webrtcvad object to 2.
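One way to avoid the float duration entirely is to work in integer byte offsets derived from the sample rate. The sketch below is only an illustration under stated assumptions: 16-bit mono PCM (2 bytes per sample), and a hypothetical helper name `frame_spans`. At 16000 Hz, a 20 ms frame is 320 samples (640 bytes) and a 10 ms hop is 160 samples (320 bytes).

```python
def frame_spans(n_bytes, sample_rate=16000, frame_ms=20, hop_ms=10):
    """Yield (start, end) byte offsets of overlapping frames.

    Hypothetical helper: assumes 16-bit mono PCM, so 2 bytes per sample.
    Only full-length frames are produced; a short tail is dropped.
    """
    bytes_per_sample = 2
    frame_bytes = sample_rate * frame_ms // 1000 * bytes_per_sample
    hop_bytes = sample_rate * hop_ms // 1000 * bytes_per_sample
    start = 0
    while start + frame_bytes <= n_bytes:
        yield start, start + frame_bytes
        start += hop_bytes

# 1600 bytes = 800 samples = 50 ms of audio at 16000 Hz
for start, end in frame_spans(1600):
    print(start, end)
```

If an integer duration is still needed, it can be computed from the data itself as `(len(pcm_data) // 2) * 1000 // sample_rate`, again assuming 16-bit mono PCM.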

    interval = 20   # chunk length in ms
    overlap = 10    # overlap between consecutive chunks in ms
    start = 0
    end = 0
    f = 0

    for i in range(0, 1520, interval):
        # During the first iteration,
        # start is 0 and end is the interval
        if i == 0:
            start = 0
            end = interval
        else:
            start = end - overlap
            end = start + interval
        if end >= n:
            end = n
            f = 1
        if f == 1:
            break

        chunk = pcm_data[start:end]
        print(vad.is_speech(chunk, 16000))

I want to know how to do the segmentation correctly and how to use the webrtcvad module correctly. Please tell me what else I have to add to get a binary output for a .wav audio file. With the current code, I get the error below.

    File "C:\Users\Angel\anaconda3\lib\site-packages\webrtcvad.py", line 27, in is_speech
        return _webrtcvad.process(self._vad, sample_rate, buf, length)
    Error: Error while processing frame
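A likely cause of this error is the frame size: webrtcvad only accepts 16-bit mono PCM frames that are exactly 10, 20, or 30 ms long, and slicing `pcm_data[start:end]` with millisecond values yields about 20 bytes instead of the 640 bytes that 20 ms occupies at 16000 Hz. The sketch below shows one possible way to produce the binary output; the function name `binary_vad` is hypothetical, and the detector is passed in as a callable so that the framing logic can be checked without webrtcvad installed.

```python
def binary_vad(pcm_bytes, is_speech, sample_rate=16000, frame_ms=20, hop_ms=10):
    """Return a list of 1/0 flags, one per overlapping frame.

    Hypothetical helper: assumes 16-bit mono PCM bytes. `is_speech` is any
    callable taking (frame_bytes, sample_rate), e.g. webrtcvad's
    Vad.is_speech method.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    hop_bytes = sample_rate * hop_ms // 1000 * 2
    flags = []
    # Stop before the last partial frame so every frame has the exact length
    for start in range(0, len(pcm_bytes) - frame_bytes + 1, hop_bytes):
        frame = pcm_bytes[start:start + frame_bytes]
        flags.append(1 if is_speech(frame, sample_rate) else 0)
    return flags

# Stand-in detector for illustration only: "speech" if any byte is non-zero
demo = binary_vad(b"\x00" * 640 + b"\x01" * 640, lambda frame, rate: any(frame))
print(demo)
```

With webrtcvad installed, the same function could be called as `binary_vad(pcm_data, webrtcvad.Vad(2).is_speech)`, where `pcm_data` is the 16-bit PCM bytes object at 16000 Hz.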

Aman Singh