I am trying to write code that uses the webrtcvad module to produce a binary output for a .wav audio file, split into small chunks of 20 ms. I want 1 as the output when audio is present in a chunk and 0 when there is no audio in that chunk. First I read the audio file with the librosa module and then convert it to 16-bit little-endian PCM.
After this, I segment the audio into 20 ms chunks with an overlapping window of 10 ms. I calculate the duration of the file with librosa.get_duration, which returns a float value in seconds. I convert it to milliseconds, but I am confused about how to get the exact duration in ms as an integer to use in the for loop, so I round the duration. I have set the sample rate to 16000 Hz and the aggressiveness mode of the webrtcvad object to 2.
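For reference, a minimal sketch of the conversion step described above, assuming the samples are mono floats in the range [-1.0, 1.0] (which is what `librosa.load` returns). The helper name `float_to_pcm16` is hypothetical, and the loading call in the comment is an assumption about how the file is read, not part of my actual code.

```python
import struct

def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit little-endian PCM bytes.

    `float_to_pcm16` is a hypothetical helper name used for illustration.
    """
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))   # clip to the valid range
        ints.append(int(round(s * 32767)))  # scale to the int16 range
    # "<" = little-endian, "h" = signed 16-bit integer
    return struct.pack("<%dh" % len(ints), *ints)

# Assumed usage with librosa (not executed here):
#   y, sr = librosa.load("audio.wav", sr=16000, mono=True)
#   pcm_data = float_to_pcm16(y)
```

Each sample becomes 2 bytes, so one second of audio at 16000 Hz is 32000 bytes.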
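One way to avoid the float duration altogether is to work in bytes rather than milliseconds, since the length of the PCM buffer is already an integer. The following is only a sketch under the assumptions stated in the text (16-bit mono PCM, 16000 Hz, 20 ms chunks, 10 ms overlap); the function name `frame_offsets` is hypothetical.

```python
BYTES_PER_SAMPLE = 2  # 16-bit PCM

def frame_offsets(num_bytes, sample_rate=16000, frame_ms=20, hop_ms=10):
    """Return (start, end) byte offsets of every complete frame.

    `frame_offsets` is a hypothetical helper; a trailing partial
    frame is skipped because it is shorter than `frame_ms`.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE
    hop_bytes = sample_rate * hop_ms // 1000 * BYTES_PER_SAMPLE
    return [
        (start, start + frame_bytes)
        for start in range(0, num_bytes - frame_bytes + 1, hop_bytes)
    ]

# One second of 16 kHz, 16-bit mono audio is 32000 bytes.
offsets = frame_offsets(32000)
print(offsets[0], offsets[1], len(offsets))
```

At 16000 Hz a 20 ms chunk is 320 samples, i.e. 640 bytes, and a 10 ms hop is 320 bytes, so the loop bound comes from `len(pcm_data)` and no rounding of the duration is needed.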
    interval = 20  # chunk length in ms
    overlap = 10   # overlap in ms
    start = 0
    end = 0
    f = 0          # set to 1 once the end of the audio is reached

    for i in range(0, 1520, interval):
        # During first iteration,
        # start is 0, end is the interval
        if i == 0:
            start = 0
            end = interval
        else:
            start = end - overlap
            end = start + interval
        if end >= n:
            end = n
            f = 1
        if f == 1:
            break
        if f != 1:
            chunk = pcm_data[start:end]
            print(vad.is_speech(chunk, 16000))
I want to know how to do the segmentation correctly and how to use the webrtcvad module correctly. Please tell me what else I have to add to get binary output for a .wav audio file. With the current code, I get the error below.
    File "C:\Users\Angel\anaconda3\lib\site-packages\webrtcvad.py", line 27, in is_speech
        return _webrtcvad.process(self._vad, sample_rate, buf, length)
    Error: Error while processing frame
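A possible cause of this error, as far as I can tell: webrtcvad only accepts frames of exactly 10, 20 or 30 ms of 16-bit mono PCM, and slicing `pcm_data` with millisecond values gives a 20-element chunk instead of 20 ms of audio. Below is a hedged sketch of what a frame-by-frame 0/1 labelling could look like. The function `label_frames` is a hypothetical name, and the detector is passed in as a callable so that the sketch does not depend on webrtcvad being installed.

```python
def label_frames(pcm_bytes, is_speech, sample_rate=16000,
                 frame_ms=20, hop_ms=10):
    """Return a list of 1/0 labels, one per complete frame.

    `is_speech(frame_bytes, sample_rate)` is any callable with the
    same shape as `webrtcvad.Vad.is_speech`. `label_frames` itself
    is a hypothetical helper written for illustration.
    """
    if frame_ms not in (10, 20, 30):
        raise ValueError("frame_ms must be 10, 20 or 30")
    bytes_per_sample = 2  # 16-bit PCM
    frame_bytes = sample_rate * frame_ms // 1000 * bytes_per_sample
    hop_bytes = sample_rate * hop_ms // 1000 * bytes_per_sample

    labels = []
    for start in range(0, len(pcm_bytes) - frame_bytes + 1, hop_bytes):
        frame = pcm_bytes[start:start + frame_bytes]  # exactly frame_ms long
        labels.append(1 if is_speech(frame, sample_rate) else 0)
    return labels

# Assumed usage with webrtcvad (not executed here):
#   import webrtcvad
#   vad = webrtcvad.Vad(2)
#   print(label_frames(pcm_data, vad.is_speech))
```

Note that the detector reports speech versus non-speech, so a chunk containing non-speech sound may still be labelled 0.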