Looking for a way to detect pauses in audio files, and then be able to set punctuation between sentences?

Question

I have a transcription app that transcribes audio from a file to text. The problem is that the output text is one long sentence. I figured a solution could be to look for pauses in the audio file and add punctuation to the transcription.

If the audio content is this: How are you doing? --pause-- I am fine. --pause-- Ready to start? --pause--

Ideally it would transcribe to this: how are you doing. i am fine. ready to start.

My code looks like this:

```python
import speech_recognition as sr

r = sr.Recognizer()

with sr.AudioFile('Interview_143.flac') as source:
    audio = r.listen(source)
    try:
        print("Google Speech Recognition results:")
        print(r.recognize_google(audio, show_all=True))  # (pretty)-print the recognition result
    except sr.UnknownValueError:
        print('No speech recognized...')
    except sr.RequestError as e:
        print('Could not reach the Google Speech Recognition service:', e)
```

Result:

"a lot of text in one long sentence is hard to read as there is no punctuation between the sentences to fix this one would have to go through some sort of grammar service to fix it however they are not that good at setting punctuation anyway so a module/package could do the job just as good"

If there is no such module/package, then maybe something like the approach in the question "Detect silence in audio file".
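A minimal sketch of the pause-detection idea, using only the Python standard library and assuming the FLAC file has first been converted to 16-bit PCM WAV (for example with ffmpeg). The function names, the 20 ms frame size, the RMS threshold of 500 and the 300 ms minimum pause are hypothetical choices for illustration and would need tuning for real recordings. It finds spans of speech separated by pauses, and `punctuate` joins per-span transcripts with periods; pauses alone will not always give correct punctuation.

```python
import math
import struct
import wave


def read_mono_wav(path):
    """Read a 16-bit PCM WAV file; return (samples of the first channel, sample_rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return list(samples[::channels]), rate  # samples are interleaved: keep channel 0


def frame_rms(chunk):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in chunk) / len(chunk)) if chunk else 0.0


def find_speech_segments(samples, sample_rate, frame_ms=20,
                         silence_thresh=500, min_pause_ms=300):
    """Return (start_s, end_s) spans of speech separated by pauses >= min_pause_ms."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    min_pause_frames = max(1, math.ceil(min_pause_ms / frame_ms))
    # One loud/quiet flag per fixed-size frame.
    loud = [frame_rms(samples[i:i + frame_len]) >= silence_thresh
            for i in range(0, len(samples), frame_len)]
    spans, seg_start, quiet_run = [], None, 0
    for idx, is_loud in enumerate(loud):
        if is_loud:
            if seg_start is None:
                seg_start = idx                 # speech begins
            quiet_run = 0
        elif seg_start is not None:
            quiet_run += 1
            if quiet_run >= min_pause_frames:   # pause long enough: close the segment
                spans.append((seg_start, idx - quiet_run + 1))
                seg_start, quiet_run = None, 0
    if seg_start is not None:                   # speech ran to the end of the file
        spans.append((seg_start, len(loud) - quiet_run))
    return [(a * frame_len / sample_rate, b * frame_len / sample_rate) for a, b in spans]


def punctuate(chunks):
    """Join per-segment transcripts, ending each one with a period."""
    return " ".join(c.strip().rstrip(".") + "." for c in chunks if c.strip())
```

To combine this with `speech_recognition`, one option is to reopen the file for each span, call `r.record(source, offset=start, duration=end - start)` followed by `r.recognize_google(...)`, and pass the resulting strings to `punctuate`. Each span is a separate recognition request, so this is slower than transcribing the whole file at once.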

Pauses are not enough; punctuation does not always depend on pauses. You need a punctuator, something like https://stackoverflow.com/questions/40961892/speech-recognition-in-real-time-with-punctuation — Nikolay Shmyrev, Oct 23 '19 at 09:55
Possible duplicate of [Speech recognition in real-time with punctuation](https://stackoverflow.com/questions/40961892/speech-recognition-in-real-time-with-punctuation) — Nikolay Shmyrev, Oct 23 '19 at 09:55
Also, speech_recognition is not a great way to transcribe long files; you'd better try a different package. — Nikolay Shmyrev, Oct 23 '19 at 09:55
You could issue a system call with `subprocess` and use `ffmpeg` to detect silence (where you can define the threshold and duration). https://stackoverflow.com/questions/42507879/how-to-detect-the-silence-at-the-end-of-an-audio-file — havokles, Nov 05 '19 at 20:09
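A sketch of the `ffmpeg` suggestion, assuming `ffmpeg` is installed and on the PATH. It runs the `silencedetect` audio filter, whose `noise` (threshold) and `d` (minimum duration) options define what counts as a pause, and parses the `silence_start` / `silence_end` lines the filter writes to stderr. The function names and the default -30dB / 0.5 s values are hypothetical and will need tuning.

```python
import re
import subprocess

# silencedetect logs lines such as
#   [silencedetect @ 0x...] silence_start: 1.23
#   [silencedetect @ 0x...] silence_end: 2.34 | silence_duration: 1.11
_START_RE = re.compile(r"silence_start:\s*(-?[\d.]+)")
_END_RE = re.compile(r"silence_end:\s*(-?[\d.]+)")


def parse_silencedetect(log):
    """Turn silencedetect log text into a list of (start, end) pauses in seconds."""
    pauses, start = [], None
    for line in log.splitlines():
        m = _START_RE.search(line)
        if m:
            start = float(m.group(1))
            continue
        m = _END_RE.search(line)
        if m and start is not None:
            pauses.append((start, float(m.group(1))))
            start = None
    return pauses


def detect_pauses(path, noise="-30dB", min_duration=0.5):
    """Run ffmpeg's silencedetect filter on `path` and return the detected pauses."""
    cmd = ["ffmpeg", "-hide_banner", "-nostats", "-i", path,
           "-af", f"silencedetect=noise={noise}:d={min_duration}",
           "-f", "null", "-"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_silencedetect(result.stderr)  # ffmpeg writes its log to stderr
```

`detect_pauses('Interview_143.flac')` would return the pauses as (start, end) pairs in seconds; the speech lies between them. Depending on the ffmpeg version, a silence that runs to the end of the file may be logged without a matching `silence_end`, and this parser simply ignores such an unmatched start.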

score 0 · Answer 1 · edited Nov 18 '21 at 22:39


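To enable automatic punctuation, set `enable_automatic_punctuation=True` in your recognition config, like this (note that this uses the Google Cloud Speech-to-Text client library, `google-cloud-speech`, not the `speech_recognition` package):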

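```python
from google.cloud import speech

config = speech.RecognitionConfig(
    language_code='en-US',
    sample_rate_hertz=44100,   # should match the audio file
    audio_channel_count=2,     # should match the audio file
    enable_word_time_offsets=True,
    model='video',
    enable_automatic_punctuation=True,
)
```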
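If automatic punctuation is not available for your setup, the `enable_word_time_offsets=True` setting returns per-word start and end times, which could be used to approximate the pause-based idea from the question. This is a minimal sketch, not part of the Google API: `punctuate_by_gaps` and its 0.7 s threshold are hypothetical, and it simply ends a sentence wherever the gap between two words is long enough.

```python
def punctuate_by_gaps(words, min_gap=0.7):
    """End a sentence wherever the silence between consecutive words is >= min_gap seconds.

    `words` is a list of (word, start_s, end_s) tuples.
    """
    sentences, current, prev_end = [], [], None
    for word, start, end in words:
        # A long enough gap since the previous word closes the current sentence.
        if current and start - prev_end >= min_gap:
            sentences.append(" ".join(current) + ".")
            current = []
        current.append(word)
        prev_end = end
    if current:
        sentences.append(" ".join(current) + ".")
    return " ".join(sentences)
```

In recent versions of `google-cloud-speech`, each entry of `result.alternatives[0].words` carries `word`, `start_time` and `end_time` fields, with the times as `datetime.timedelta` objects (so `.total_seconds()` converts them); check this against the version you have installed before building the tuples.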

edited Nov 18 '21 at 22:39 by Reza Rahemtola · answered Nov 18 '21 at 08:11 by Nilesh Gaikwad
