I have a transcription app that transcribes audio from a file to text. The problem is that the output text is one long sentence with no punctuation. So I figured one solution could be to look for pauses in the audio file and add punctuation to the transcription at those points.
If the audio content is this: How are you doing? --pause-- I am fine. --pause-- Ready to start? --pause--
It would transcribe to this: how are you doing. i am fine. ready to start.
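To make the target concrete, here is a minimal sketch of just the joining step. It assumes the transcript has somehow already been split into one text chunk per pause; the `punctuate_chunks` name is made up for illustration and is not part of any library:

```python
def punctuate_chunks(chunks):
    """Join per-pause transcript chunks, ending each one with a period."""
    # Skip chunks that are empty or whitespace only
    # (e.g. a stretch of audio that produced no text).
    cleaned = [c.strip() for c in chunks if c and c.strip()]
    return " ".join(c + "." for c in cleaned)


print(punctuate_chunks(["how are you doing", "i am fine", "ready to start"]))
# prints: how are you doing. i am fine. ready to start.
```

The part I am missing is how to get those chunks in the first place.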
My code looks like this:
import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile('Interview_143.flac') as source:
    audio = r.listen(source)

try:
    print("Google Speech Recognition results:")
    print(r.recognize_google(audio, show_all=True))  # (pretty)-print the recognition result
except (sr.UnknownValueError, sr.RequestError):
    # Catch only the library's own errors instead of using a bare except
    print('No speech recognized...')
Result:
"a lot of text in one long sentence is hard to read as there is no punctuation between the sentences to fix this one would have to go through some sort of grammar service to fix it however they are not that good at setting punctuation anyway so a module/package could do the job just as good"
If that is not possible, then maybe something like this would help: Detect silence in audio file
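For the silence-detection route, this rough sketch is roughly what I have in mind. It is pure Python working on a list of raw PCM sample values; the function name, the 20 ms window, the RMS threshold and the 500 ms minimum pause are all assumptions I picked for illustration, not values taken from any library:

```python
import math


def find_speech_segments(samples, sample_rate, window_ms=20,
                         silence_rms=500.0, min_silence_ms=500):
    """Return (start, end) sample indices of speech separated by long pauses.

    All thresholds are illustrative assumptions, not tuned values.
    """
    window = max(1, sample_rate * window_ms // 1000)
    min_silent_windows = max(1, min_silence_ms // window_ms)

    segments = []
    seg_start = None      # start index of the current speech segment, if any
    last_loud_end = None  # end index of the most recent loud window
    silent_run = 0        # consecutive silent windows seen inside a segment

    for start in range(0, len(samples), window):
        frame = samples[start:start + window]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= silence_rms:
            if seg_start is None:
                seg_start = start
            last_loud_end = start + len(frame)
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_silent_windows:
                # The pause is long enough: close the current segment.
                segments.append((seg_start, last_loud_end))
                seg_start = None
                silent_run = 0

    # Close a segment that is still open when the audio ends.
    if seg_start is not None:
        segments.append((seg_start, last_loud_end))
    return segments


# Synthetic example at 1000 samples/s: speech, a long pause, then speech
# containing one short pause that should NOT split the segment.
demo = [1000] * 200 + [0] * 600 + [1000] * 200 + [0] * 100 + [1000] * 100
print(find_speech_segments(demo, sample_rate=1000))
# prints: [(0, 200), (800, 1200)]
```

Each segment could then be cut out of the audio and transcribed on its own, with a period added after each result.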