Python 3.6, Ubuntu 18.04
Using the pyaudio module, I've successfully recorded the audio coming out of my speakers, and as a test I've been able to save it to a WAV file correctly. This is part of a much larger multi-threaded application, so I'm happy to see the coordination working as planned. Now, though, I want to perform some audio analysis on the data to find the dominant frequencies present in the audio. So I have a couple of questions, one more general because I'm curious, the other more specific to my problem:
1.) Here are the snippets of code I use to capture the audio frames:
    self.audio_stream = self.audio_stream_parent.open(
        format=AUDIO_FRAME_FORMAT,
        channels=AUDIO_FRAME_CHANNELS,
        rate=AUDIO_FRAME_RATE,
        input=True,
        frames_per_buffer=AUDIO_FRAME_SIZE_BYTES
    )
    ...
    while self.keep_audio_collection_thread_alive:
        audio_frame = self.audio_stream.read(AUDIO_FRAME_SIZE_BYTES)
        if self.collect_audio and audio_frame:
            self.audio_collected.put(audio_frame)
My first question would be: what kind of data is represented in the audio_frame variable? Each read operation returns a bytes object 4096 bytes long (even though AUDIO_FRAME_SIZE_BYTES is set to 1024). What does that data actually describe? Is it purely audio data, so that details such as the number of channels and the sample format need to be supplied later to reinterpret it? Or is information like that included in the 4096 bytes?
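For context, here's how I've tentatively been interpreting the buffer so far; everything in it (16-bit signed samples, channels interleaved sample-by-sample) is an assumption on my part rather than something I've confirmed:

```python
import numpy as np

AUDIO_FRAME_CHANNELS = 2  # matches the constant in my open() call above

def decode_frame(audio_frame: bytes) -> np.ndarray:
    # Assumption: the stream format is paInt16, so every 2 bytes are one
    # signed 16-bit sample, with the channels interleaved.
    samples = np.frombuffer(audio_frame, dtype=np.int16)
    # One row per frame, one column per channel.
    return samples.reshape(-1, AUDIO_FRAME_CHANNELS)

# One read() gives me 4096 bytes, which under this assumption would be
# 1024 frames of 2 channels each:
print(decode_frame(bytes(4096)).shape)  # (1024, 2)
```

If that interpretation is wrong, I'd love to know what the bytes actually contain.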
2.) What's the best way to perform frequency analysis on this data? I see a lot of information about the best way to run an FFT on the contents of a WAV file, but I want to do this in real time, or close to it. I don't see a way to open a WAV file for reading and writing simultaneously, so I wouldn't be able to just pass the data through a file. Can I perform this analysis on the raw data in audio_frame instead? I suppose that's partly why I asked my first question: to see whether I could follow the general logic in this SO answer without having to actually write to a WAV file.
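In case it clarifies what I'm after, here's a minimal sketch of the kind of analysis I have in mind, assuming the chunk can be treated as mono 16-bit PCM at my stream's sample rate (for my actual interleaved stereo data I'd presumably need to deinterleave or mix down first, which is exactly the uncertainty behind question 1):

```python
import numpy as np

AUDIO_FRAME_RATE = 44100  # matches the rate in my open() call

def dominant_frequency(audio_frame: bytes, rate: int = AUDIO_FRAME_RATE) -> float:
    # Assumption: audio_frame holds mono 16-bit signed PCM samples.
    samples = np.frombuffer(audio_frame, dtype=np.int16).astype(np.float64)
    # Real-input FFT of one chunk; rfftfreq maps each bin to a frequency in Hz.
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    return float(freqs[np.argmax(spectrum)])

# Quick sanity check with a synthetic 440 Hz tone:
t = np.arange(4096) / AUDIO_FRAME_RATE
tone = (10000 * np.sin(2 * np.pi * 440 * t)).astype(np.int16).tobytes()
print(dominant_frequency(tone))  # close to 440
```

Is calling something like this on each chunk as it comes off the queue a reasonable approach, or is there a better pattern for near-real-time analysis?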
Thank you in advance!