How to convert live real time audio from mic to text?

Question

I need to build a speech-to-text converter using Python and the Google Speech-to-Text API. I want it to work in real time, as in this example link. So far I have tried the following code:

import speech_recognition as sr
import pyaudio

r = sr.Recognizer()
print("Running")

# List the available audio devices to find the microphone's index.
p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    print(p.get_device_info_by_index(i))

with sr.Microphone(device_index=1) as source:
    r.adjust_for_ambient_noise(source, duration=1)  # Calibrate for ambient noise
    print("Say something!")
    audio = r.listen(source)  # Blocks until the speaker pauses
print("Finished listening")
try:
    print("Analyzing voice data  " + r.recognize_google(audio, language='hi-IN'))
except sr.UnknownValueError:
    print("Could not understand the audio")
except sr.RequestError as e:
    print("Something went wrong:", e)

This code first listens through the microphone and only then converts the recording to text. What I want is for the conversion to start while I am still speaking, instead of waiting for the recording to finish.

Possible duplicate of [Google Streaming Speech Recognition on an Audio Stream Python](https://stackoverflow.com/questions/44088246/google-streaming-speech-recognition-on-an-audio-stream-python) — Nikolay Shmyrev, Aug 24 '19 at 21:35

score 1 · Answer 1 · answered Aug 14 '20 at 09:24

You can use the code below to convert audio from the microphone to text. It records for five seconds and then sends the recording to the recognizer.

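import speech_recognition as sr  # sr.Microphone requires PyAudio to be installed

init_rec = sr.Recognizer()
print("Let's speak!!")
with sr.Microphone() as source:
    # Record a fixed 5-second clip from the default microphone.
    audio_data = init_rec.record(source, duration=5)
    print("Recognizing your text.............")
    # Send the whole clip to the Google Web Speech API.
    text = init_rec.recognize_google(audio_data)
    print(text)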
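
If a fixed five-second recording is too coarse, one possible variation is to let the library listen in a background thread and transcribe each short phrase as soon as it ends. The sketch below assumes the `listen_in_background` method of `speech_recognition`. The function name `transcribe_phrases`, its parameters, and the `on_text` handler are made up for illustration. This is still phrase-by-phrase recognition, not true streaming, so text appears after each pause or time limit rather than word by word.

```python
import time


def transcribe_phrases(on_text, language="hi-IN", phrase_time_limit=3, run_seconds=30):
    """Transcribe short phrases in the background and pass each one to on_text.

    All names and defaults here are illustrative assumptions.
    """
    import speech_recognition as sr  # imported lazily; needs PyAudio for the mic

    recognizer = sr.Recognizer()
    mic = sr.Microphone()
    with mic as source:
        # Calibrate once, then leave the `with` block: listen_in_background
        # opens the microphone itself.
        recognizer.adjust_for_ambient_noise(source, duration=1)

    def callback(rec, audio):
        # Runs on the listener thread each time a phrase has been captured.
        try:
            on_text(rec.recognize_google(audio, language=language))
        except sr.UnknownValueError:
            pass  # nothing intelligible in this phrase
        except sr.RequestError as exc:
            print("API request failed:", exc)

    stop = recognizer.listen_in_background(
        mic, callback, phrase_time_limit=phrase_time_limit
    )
    try:
        time.sleep(run_seconds)  # the main thread is free to do other work here
    finally:
        stop(wait_for_stop=False)
```

Calling `transcribe_phrases(print)` would print each phrase as it is recognized for about 30 seconds. A smaller `phrase_time_limit` gives more frequent updates, but phrases may be cut mid-word.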

score 0 · Answer 2 · answered Jul 06 '21 at 07:54

If you're looking for a project you can clone to get started with the Speech API, check out the realtime-transcription-playground repository. It's a React<>Python implementation of real-time transcription.

It also includes the Python code that streams the audio data to the Speech API, in case that is all you are interested in: https://github.com/saharmor/realtime-transcription-playground/blob/main/backend/google_speech_wrapper.py. Specifically, the relevant methods are start_listen, listen_print_loop, and generator.
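
For orientation, the general streaming pattern can be sketched as follows. This is not the repository's code. It is a minimal sketch modeled on Google's public streaming sample, assuming the `google-cloud-speech` client library and 16 kHz LINEAR16 audio. `ChunkBuffer` and `stream_transcripts` are hypothetical names. An audio callback pushes raw chunks into a queue, a generator drains the queue, and the generator feeds `streaming_recognize`, which returns interim and final results while audio is still arriving.

```python
import queue


class ChunkBuffer:
    """Thread-safe buffer: an audio callback pushes chunks, a generator drains them."""

    def __init__(self):
        self._chunks = queue.Queue()

    def push(self, chunk):
        self._chunks.put(chunk)

    def close(self):
        self._chunks.put(None)  # sentinel that ends the generator

    def generator(self):
        while True:
            chunk = self._chunks.get()  # block until at least one chunk arrives
            if chunk is None:
                return
            data = [chunk]
            # Also take whatever else is already buffered, without blocking.
            while True:
                try:
                    chunk = self._chunks.get(block=False)
                except queue.Empty:
                    break
                if chunk is None:
                    yield b"".join(data)
                    return
                data.append(chunk)
            yield b"".join(data)


def stream_transcripts(buffer, language_code="hi-IN", sample_rate=16000):
    """Yield (transcript, is_final) pairs while audio is still being pushed."""
    from google.cloud import speech  # imported lazily; needs credentials set up

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate,  # assumption: must match the mic stream
        language_code=language_code,
    )
    streaming_config = speech.StreamingRecognitionConfig(
        config=config, interim_results=True
    )
    requests = (
        speech.StreamingRecognizeRequest(audio_content=chunk)
        for chunk in buffer.generator()
    )
    responses = client.streaming_recognize(config=streaming_config, requests=requests)
    for response in responses:
        for result in response.results:
            if result.alternatives:
                yield result.alternatives[0].transcript, result.is_final
```

In use, a PyAudio stream callback would call `buffer.push(in_data)` for every captured frame, and a separate thread would iterate over `stream_transcripts(buffer)` and print each interim result as it arrives.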
