I'm trying to prepare some audio files for my neural network. To remove the silent parts, I used a solution that a user posted here on SO:

def detect_leading_silence(sound, silence_threshold=-50.0, chunk_size=10):
    '''
    sound is a pydub.AudioSegment
    silence_threshold in dB
    chunk_size in ms

    iterate over chunks until you find the first one with sound
    '''
    trim_ms = 0 # ms

    assert chunk_size > 0 # to avoid infinite loop
    while sound[trim_ms:trim_ms+chunk_size].dBFS < silence_threshold and trim_ms < len(sound):
        trim_ms += chunk_size

    return trim_ms

This works well, but not perfectly. When I plot the audio signal, there is still some noise/silence at the beginning of the audio. It's also not really dynamic: the threshold is a fixed value, but other audio files need a higher or lower threshold.
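To make the "dynamic threshold" idea concrete, here is a minimal sketch, not a definitive solution. It derives the threshold from each file's own average loudness (`sound.dBFS`) instead of a fixed value, and trims both ends by running the same detector on the reversed segment. The helper name `trim_silence` and the `offset_db` parameter (with its default of 16 dB) are assumptions made for illustration and would need tuning per dataset. It only relies on `len()`, slicing, `.dBFS` and `.reverse()`, which `pydub.AudioSegment` provides.

```python
def detect_leading_silence(sound, silence_threshold=-50.0, chunk_size=10):
    """Return the length in ms of the leading silence of an AudioSegment."""
    assert chunk_size > 0  # to avoid an infinite loop
    trim_ms = 0
    # Check the length first so the loop stops at the end of the audio.
    while (trim_ms < len(sound)
           and sound[trim_ms:trim_ms + chunk_size].dBFS < silence_threshold):
        trim_ms += chunk_size
    return min(trim_ms, len(sound))


def trim_silence(sound, offset_db=16.0, chunk_size=10):
    """Trim both ends with a threshold relative to the file's loudness.

    offset_db is a hypothetical tuning knob: chunks quieter than
    (sound.dBFS - offset_db) are treated as silence.
    """
    threshold = sound.dBFS - offset_db
    start = detect_leading_silence(sound, threshold, chunk_size)
    # Trailing silence is the leading silence of the reversed audio.
    end = detect_leading_silence(sound.reverse(), threshold, chunk_size)
    # Guard against the two trims overlapping.
    return sound[start:max(start, len(sound) - end)]
```

With pydub this could be used as `trimmed = trim_silence(AudioSegment.from_file(path))`, where `path` is a placeholder for one of the audio files. Note that this still only separates loud from quiet parts by energy; it does not distinguish speech from other noise of similar loudness.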


How can I achieve that only the voice/speech is extracted?

Erik
