Python Audio Streaming & Speech/Text Recognition Project

Question

I have a fairly ambitious project in mind and want to lay out my thought process to see whether it is doable.

During a broadcast, radio hosts often give away tickets to the Xth caller within a specified time window. Since most of us are at work and cannot always listen to the radio for these opportunities, I thought, "maybe I could write a program that does this." Here is what I had in mind:

1. Listen to a radio stream URL (TuneIn Radio).
2. Analyze the incoming audio for keywords suggesting that a concert-ticket giveaway is coming up (e.g., "Call in at 3:40PM for a chance to win tickets to see The Who!").
3. Use Twilio to call the radio station's phone number and, if the call connects, forward it to your cellphone.
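The keyword-analysis step can be prototyped on transcript text alone, before any audio work. The sketch below is a naive matcher that flags a transcript when it contains giveaway phrases and pulls out an announced time. The keyword list, the time format, and the name find_giveaway are all assumptions based on the example announcement above. A real station's phrasing will vary.

```python
import re
from typing import Optional

# Hypothetical phrases that tend to precede a giveaway; tune these for the station.
KEYWORDS = ("call in", "caller", "win tickets", "chance to win")

# Matches times such as "3:40PM", "3:40 pm" or "11:05 a.m."
TIME_RE = re.compile(r"\b(1[0-2]|0?[1-9]):([0-5]\d)\s*([ap])\.?\s*m\.?", re.IGNORECASE)


def find_giveaway(transcript: str) -> Optional[dict]:
    """Return matched keywords and a normalized time, or None if no keyword matches."""
    lowered = transcript.lower()
    hits = [kw for kw in KEYWORDS if kw in lowered]
    if not hits:
        return None
    time = None
    match = TIME_RE.search(transcript)
    if match:
        hour, minute, meridiem = match.groups()
        time = f"{int(hour)}:{minute}{meridiem.upper()}M"
    return {"keywords": hits, "time": time}
```

For example, find_giveaway("Call in at 3:40PM for a chance to win tickets to see The Who!") reports the time as "3:40PM". Simple substring matching will miss paraphrases, but it is cheap enough to run on every transcribed chunk.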

I have started messing around with this and have come up with a few code snippets that I believe are in the right direction. I also have some concerns that I will mention after the code snippets.

So far I have a process that uses the requests library to read from the stream URL and write the bytes it receives to a file. From there, the SpeechRecognition library's recognize_google() transcribes the audio file and prints the text.

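import requests

stream_url = "http://18073.live.streamtheworld.com:3690/WDHAFM_SC?DIST=TuneIn&TGT=TuneIn&maxServers=2&gender=m&ua=RadioTime&ttag=RadioTime"

# The station sends MP3-encoded audio, so save it under a matching extension.
r = requests.get(stream_url, stream=True)
r.raise_for_status()

# Press Ctrl-C to stop; the with-block closes and flushes the file either way.
try:
    with open("audio.mp3", "wb") as f:
        for block in r.iter_content(1024):
            f.write(block)
except KeyboardInterrupt:
    pass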

After running this script and stopping it with Ctrl-C, the audio file is saved and can be played back. Next, I use ffmpeg to convert it to a true WAV file so that the speech recognizer can load it. The conversion is needed because f.write stores the raw bytes exactly as the server sends them, and this stream is MP3-encoded. The file extension alone does not change the encoding.
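The ffmpeg step can be scripted rather than run by hand. This is a minimal sketch that assumes ffmpeg is on the PATH and that the raw capture is saved as audio.mp3. The helper names are hypothetical. The choice of mono 16 kHz output is also an assumption: it suits speech recognition, but the library does not require it.

```python
import subprocess
from typing import List


def ffmpeg_wav_command(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that decodes src into a mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input: the raw MP3 bytes captured from the stream
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the target rate
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM
        dst,
    ]


def convert_to_wav(src: str, dst: str) -> None:
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(ffmpeg_wav_command(src, dst), check=True)
```

Usage: convert_to_wav("audio.mp3", "audio.wav").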

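import speech_recognition as sr

audio_file = "audio.wav"  # the PCM WAV produced by ffmpeg

r = sr.Recognizer()
with sr.AudioFile(audio_file) as source:
    # Read only the first 4 seconds of the file
    audio = r.record(source, duration=4)

try:
    text = r.recognize_google(audio)
    print(text)
except sr.UnknownValueError:
    # Raised when the speech is unintelligible or absent, e.g. music without words
    print("No speech recognized")
except sr.RequestError as e:
    # Network problem or rejected API request
    print("Request failed:", e)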
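The snippet above only transcribes the first four seconds (duration=4). To cover a whole recording, one option is to split the WAV into consecutive windows with the standard-library wave module and pass each window to the recognizer. The window length comes from the snippet; wav_chunks is a hypothetical helper. The SpeechRecognition call is shown only in comments so that the sketch runs without that library. It assumes a mono file, which matches the ffmpeg conversion above.

```python
import wave
from typing import BinaryIO, Iterator, Tuple, Union


def wav_chunks(source: Union[str, BinaryIO], seconds: float = 4.0) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (frames, sample_rate, sample_width) for consecutive windows of a WAV file.

    `source` may be a filename or a binary file-like object. The last window may be shorter.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames_per_chunk = int(rate * seconds)
        while True:
            frames = wav.readframes(frames_per_chunk)
            if not frames:  # end of file
                break
            yield frames, rate, width


# Possible use with SpeechRecognition (assumes a mono WAV):
# for frames, rate, width in wav_chunks("audio.wav"):
#     text = r.recognize_google(sr.AudioData(frames, rate, width))
```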

A few concerns:

Sometimes text = r.recognize_google(audio) locks up. I'm not sure whether this happens because the clip sometimes contains only music.
Is there a way to filter out non-speech audio (i.e., music without words)?
Is it possible to transcribe the audio in real time as it arrives, without writing it to a file? That way I wouldn't have to split the recording into chunks, copy the file for reading, and then analyze it.
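For the lock-ups, SpeechRecognition's Recognizer has an operation_timeout attribute (in seconds, None by default) that recognize_google applies to its HTTP request. Setting something like r.operation_timeout = 10 before the call is probably the first thing to try. If a call still blocks, a generic fallback is to run it in a worker thread and stop waiting after a deadline. The helper name call_with_timeout and the 10-second default below are hypothetical choices and are not part of any library.

```python
import concurrent.futures
from typing import Any, Callable

# One shared worker pool; a timed-out call keeps its worker busy until it returns.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=2)


def call_with_timeout(func: Callable[..., Any], *args: Any, timeout: float = 10.0) -> Any:
    """Run func(*args) in a worker thread; return None if it takes longer than timeout seconds."""
    future = _POOL.submit(func, *args)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # The thread cannot be killed; we simply stop waiting for its result.
        return None
```

Usage: text = call_with_timeout(r.recognize_google, audio, timeout=10). Because None signals a timeout, this helper only suits functions such as recognize_google, which return a string on success.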

Is there a better approach I can take to achieve this project?
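One way to avoid the intermediate file is to pipe the HTTP stream straight into ffmpeg and read decoded PCM from its stdout. The sketch below assumes ffmpeg is installed. The 16 kHz mono output, the 4-second windows, and the name pcm_chunks are arbitrary choices. Each chunk could then be wrapped as sr.AudioData(chunk, 16000, 2) and passed to the recognizer. That step is left out so the sketch runs without SpeechRecognition.

```python
import subprocess
import threading
from typing import Iterable, Iterator

SAMPLE_RATE = 16000  # assumed target rate for recognition
SAMPLE_WIDTH = 2     # bytes per sample for s16le output


def pcm_chunks(blocks: Iterable[bytes], seconds: float = 4.0) -> Iterator[bytes]:
    """Decode compressed audio blocks with ffmpeg and yield mono 16-bit PCM windows."""
    proc = subprocess.Popen(
        ["ffmpeg", "-loglevel", "quiet", "-i", "pipe:0",
         "-f", "s16le", "-ac", "1", "-ar", str(SAMPLE_RATE), "pipe:1"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )

    def feed() -> None:
        # Writing from a separate thread keeps the stdout reads below from deadlocking.
        try:
            for block in blocks:
                proc.stdin.write(block)
            proc.stdin.close()  # signals end of input to ffmpeg
        except OSError:
            pass  # ffmpeg exited, e.g. because the consumer stopped early

    threading.Thread(target=feed, daemon=True).start()
    chunk_bytes = int(SAMPLE_RATE * seconds) * SAMPLE_WIDTH
    try:
        while True:
            chunk = proc.stdout.read(chunk_bytes)  # blocks until a full window or EOF
            if not chunk:
                break
            yield chunk
    finally:
        proc.kill()
        proc.wait()
        proc.stdout.close()
```

Usage: for chunk in pcm_chunks(requests.get(stream_url, stream=True).iter_content(1024)): ... runs until the stream ends or the loop is broken. Each window is then available for recognition about as soon as it has been received.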

Possible duplicate of [Google Streaming Speech Recognition on an Audio Stream Python](https://stackoverflow.com/questions/44088246/google-streaming-speech-recognition-on-an-audio-stream-python) — Nikolay Shmyrev, Oct 27 '18 at 19:00
You have to use Google speech api directly, not through the wrapper — Nikolay Shmyrev, Oct 27 '18 at 19:00
