I'm doing some research on how to compare sound files (WAV). Basically, I want to compare stored sound files with sound from a microphone. In the end I would like to pre-store some voice commands of my own, and then, while my app is running, compare the pre-stored files with input from the microphone.
My thought was to allow some margin when comparing, because I guess saying something twice in a row in exactly the same way would be difficult.
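To make the margin idea concrete, here is a rough sketch of what I have in mind. It assumes both recordings have already been turned into equal-length lists of integer samples. The names mean_abs_difference and roughly_equal and the margin value are made up for illustration, and I don't know yet whether such a simple measure is good enough for speech.

```python
def mean_abs_difference(a, b):
    """Average absolute difference between two equal-length sample lists."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def roughly_equal(a, b, margin):
    """True when the average per-sample difference stays within the margin."""
    return mean_abs_difference(a, b) <= margin


# Made-up sample values, standing in for a stored command and a new recording.
stored = [0, 100, 200, 100, 0]
spoken = [0, 110, 190, 105, 0]

print(mean_abs_difference(stored, spoken))       # 5.0
print(roughly_equal(stored, spoken, margin=10))  # True
```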
So after some googling I see that Python has a module named wave and the Wave_read object. That object has a function named readframes(n):
"Reads and returns at most n frames of audio, as a string of bytes."
What do these bytes contain? I'm thinking of looping through the wave files one frame at a time, comparing them frame by frame.
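For reference, this is a minimal sketch of how I currently understand the bytes, assuming a 16-bit mono PCM file: each frame is then one signed little-endian 16-bit sample, so struct can unpack it. The helper names write_demo_wav and read_samples and the demo file are my own, only there so the example runs on its own. Other sample widths or channel counts would need different unpacking.

```python
import os
import struct
import tempfile
import wave


def write_demo_wav(path, samples, framerate=8000):
    """Write a small 16-bit mono WAV file so the example is self-contained."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 2 bytes per sample = 16-bit
        w.setframerate(framerate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))


def read_samples(path):
    """Read every frame and decode it into a list of signed integers."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono files")
        raw = w.readframes(w.getnframes())
    count = len(raw) // 2  # two bytes per frame for 16-bit mono
    return list(struct.unpack("<%dh" % count, raw))


demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
write_demo_wav(demo_path, [0, 1000, -1000, 32767, -32768])
decoded = read_samples(demo_path)
print(decoded)  # [0, 1000, -1000, 32767, -32768]
```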