I have a pretty ambitious project in mind and wanted to lay out my thought process to see if this project is doable.
During a radio broadcast, the radio host usually gives out tickets if you're the #X caller during the time frame specified. Since most of us are at work and cannot always listen to the radio for these opportunities, I thought, "maybe I could write up a program that can do this". This is what I was thinking:
- Listen to a radio stream URL (TuneIn Radio)
- Analyze the incoming data by extracting keywords that suggest a concert prize opportunity is coming up (e.g. "Call in at 3:40PM for a chance to win tickets to see The Who!").
- Make a Twilio call to the radio station's phone number and forward the call to your cellphone if it connects.
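As a rough sketch of the keyword step above, the transcript could be scanned for trigger phrases and a spoken time. The phrase list, the `find_contest_cue` name, and the time format are assumptions made for illustration, not something a speech recognizer provides; a real phrase list would need tuning against actual broadcasts.

```python
import re

# Hypothetical trigger phrases; a real list would need tuning per station.
TRIGGER_PHRASES = ("call in", "caller", "chance to win", "win tickets")

# Matches times such as "3:40PM", "3:40 pm" or "3:40 p.m.".
TIME_PATTERN = re.compile(r"\b(\d{1,2}):(\d{2})\s*([ap])\.?m\.?", re.IGNORECASE)


def find_contest_cue(transcript):
    """Return (matched_phrases, call_time) for one transcript.

    call_time is a string like "3:40 PM", or None if no time was found.
    """
    lowered = transcript.lower()
    phrases = [p for p in TRIGGER_PHRASES if p in lowered]

    call_time = None
    match = TIME_PATTERN.search(transcript)
    if match:
        hour, minute, meridiem = match.groups()
        call_time = f"{int(hour)}:{minute} {meridiem.upper()}M"
    return phrases, call_time


if __name__ == "__main__":
    line = "Call in at 3:40PM for a chance to win tickets to see The Who!"
    print(find_contest_cue(line))
```

Plain substring matching like this will miss paraphrases and misrecognized words, so it is probably only a starting point before trying anything fuzzier.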
I have started messing around with this and have come up with a few code snippets that I believe are in the right direction. I also have some concerns that I will mention after the code snippets.
So far I have come up with a process that uses the requests library to listen on a stream URL and write the content it receives to a .wav file. From there, the Google Speech Recognizer will analyze the audio file and print out the text.
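import requests

stream_url = "http://18073.live.streamtheworld.com:3690/WDHAFM_SC?DIST=TuneIn&TGT=TuneIn&maxServers=2&gender=m&ua=RadioTime&ttag=RadioTime"

# stream=True keeps requests from trying to download the endless stream up front
r = requests.get(stream_url, stream=True)

# Write the raw bytes to disk as they arrive, 1024 bytes at a time
with open("audio.wav", "wb") as f:
    for block in r.iter_content(1024):
        f.write(block)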
After running this script and stopping it with Ctrl-C, the audio file is saved and can be played back. Next, I have to use ffmpeg to convert the file to a true .wav. For some reason the f.write saves it with an mp3 codec. The conversion is needed so Google Speech Recognition can properly load the file.
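The ffmpeg step could be scripted rather than run by hand. This is a minimal sketch that assumes `ffmpeg` is on the PATH; the helper names `build_ffmpeg_command` and `convert_to_wav` and the 16 kHz mono target are illustrative choices, not requirements of the recognizer.

```python
import subprocess


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command that decodes src into mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file; ffmpeg detects the actual codec
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the requested rate
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM samples
        dst,
    ]


def convert_to_wav(src, dst):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

Since ffmpeg cannot convert a file in place, `dst` has to differ from `src`, for example `convert_to_wav("audio.wav", "audio_pcm.wav")`.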
import speech_recognition as sr

audio_file = "audio.wav"
r = sr.Recognizer()
af = sr.AudioFile(audio_file)

with af as source:
    # Read only the first 4 seconds of the file
    audio = r.record(source, duration=4)

# Send the recorded audio to the Google Web Speech API
text = r.recognize_google(audio)
print(text)
A few concerns:
- Sometimes the text = r.recognize_google(audio) call locks up. I'm not sure if this is because the file sometimes contains only music.
- Is there a way to filter out any non-speech audio (i.e. music without words)?
- Is it possible to transcribe the audio in real time as it comes in, without writing to a file? That way I wouldn't have to break it into chunks, copy the file for reading, and then analyze what is coming in.
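To make the last concern concrete, one possible shape for file-free processing is sketched below. It assumes the stream has already been decoded to raw mono 16-bit PCM (for example by piping it through ffmpeg), which the snippets above do not do. The names `pcm_chunks` and `transcribe_chunk` are hypothetical; `sr.AudioData` and `sr.UnknownValueError` come from the speech_recognition package.

```python
def pcm_chunks(blocks, seconds=4, sample_rate=16000, sample_width=2):
    """Regroup an iterable of raw PCM byte blocks into fixed-length chunks.

    Each yielded chunk holds `seconds` of mono audio; a shorter final
    chunk is yielded when the input ends.
    """
    chunk_size = seconds * sample_rate * sample_width
    buffer = bytearray()
    for block in blocks:
        buffer.extend(block)
        while len(buffer) >= chunk_size:
            yield bytes(buffer[:chunk_size])
            del buffer[:chunk_size]
    if buffer:
        yield bytes(buffer)


def transcribe_chunk(chunk, sample_rate=16000, sample_width=2):
    """Transcribe one in-memory PCM chunk, or return None if nothing was understood."""
    import speech_recognition as sr  # third-party; imported only when needed

    recognizer = sr.Recognizer()
    audio = sr.AudioData(chunk, sample_rate, sample_width)
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # Raised when the service returns no usable transcript for the audio
        return None
```

This still works in chunks, just in memory instead of on disk, and it sends one request per chunk, so it is near-real-time at best. Returning None on `sr.UnknownValueError` skips chunks that produce no transcript, but it does not detect music up front.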
Is there a better approach I can take to achieve this project?