How to handle in_data in Pyaudio callback mode?

Question

I'm doing a project on signal processing in Python. So far I've had a little success with the non-blocking mode, but it introduced a considerable amount of delay and clipping in the output.

I want to implement a simple real-time audio filter using PyAudio and scipy.signal, but in the callback function from the PyAudio example I can't process in_data when I read it. I've tried converting it in various ways, with no success.

Here's the code I'm trying to get working (read data from the mic, filter it, and output it as soon as possible):

import pyaudio
import time
import numpy as np
import scipy.signal as signal
WIDTH = 2
CHANNELS = 2
RATE = 44100

p = pyaudio.PyAudio()
b,a=signal.iirdesign(0.03,0.07,5,40)
fulldata = np.array([])

def callback(in_data, frame_count, time_info, status):
    data=signal.lfilter(b,a,in_data)
    return (data, pyaudio.paContinue)

stream = p.open(format=pyaudio.paFloat32,
                channels=CHANNELS,
                rate=RATE,
                output=True,
                input=True,
                stream_callback=callback)

stream.start_stream()

while stream.is_active():
    time.sleep(5)
    stream.stop_stream()
stream.close()

p.terminate()

What is the right way to do this?

score 10 · Accepted Answer · answered Mar 12 '14 at 14:26


I found the answer to my question in the meantime; the callback looks like this:

def callback(in_data, frame_count, time_info, flag):
    global fulldata  # b and a are only read, so only fulldata needs a global declaration
    # in_data is a bytes object; reinterpret it as float32 samples (the stream uses paFloat32)
    audio_data = np.frombuffer(in_data, dtype=np.float32)
    # do whatever with the data; in my case I want to hear it filtered in real time
    filtered = signal.filtfilt(b, a, audio_data, padlen=200).astype(np.float32)
    fulldata = np.append(fulldata, filtered)  # save the filtered samples (as floats, not bytes)
    return (filtered.tobytes(), pyaudio.paContinue)  # PyAudio expects bytes back
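One caveat with the callback above: filtfilt runs over each block independently, so the filter state is reset at every block boundary, which can produce audible clicks. Below is a minimal sketch of a stateful alternative, assuming mono paFloat32 data. It swaps filtfilt for scipy.signal.lfilter with its zi argument so the filter state carries over between callbacks. This is a different technique from the answer's: it is causal, so it adds phase shift instead of being zero-phase. The make_callback helper and the PA_CONTINUE constant are hypothetical names used only for illustration.

```python
import numpy as np
import scipy.signal as signal

PA_CONTINUE = 0  # same value as pyaudio.paContinue; hard-coded so the sketch runs without PyAudio


def make_callback(b, a):
    """Build a PyAudio-style callback that filters mono float32 audio, keeping filter state between blocks."""
    # Filter state (delay line); starts at zero and is carried from one callback to the next
    zi = np.zeros(max(len(a), len(b)) - 1)

    def callback(in_data, frame_count, time_info, status):
        nonlocal zi
        # in_data is raw bytes; reinterpret them as float32 samples (matches paFloat32)
        samples = np.frombuffer(in_data, dtype=np.float32)
        # Filter this block, continuing from where the previous block left off
        filtered, zi = signal.lfilter(b, a, samples, zi=zi)
        # PyAudio expects the output as bytes in the stream's sample format
        return (filtered.astype(np.float32).tobytes(), PA_CONTINUE)

    return callback


# Offline check without an audio device: feed a signal through the callback in blocks
b, a = signal.butter(4, 0.1)  # example low-pass design; any (b, a) pair works
cb = make_callback(b, a)
x = np.random.default_rng(0).standard_normal(4096).astype(np.float32)
out = b"".join(cb(block.tobytes(), block.size, None, 0)[0] for block in np.split(x, 4))
y = np.frombuffer(out, dtype=np.float32)
reference = signal.lfilter(b, a, x).astype(np.float32)  # filtering the whole signal at once
```

Because the state is carried over, filtering block by block gives the same result as filtering the whole signal in one call, which is what removes the boundary discontinuities.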

function_store

Your script says `CHANNELS = 2`. How does this deal with the stereo input? – endolith Mar 25 '14 at 13:35

It reads the data interleaved, meaning every second element starting from index 0 belongs to one channel (let's say left, though it could be right), and every second element starting from index 1 belongs to the other channel. This version of my code doesn't really deal with that, so here is a snippet in case you need it:

def callback(in_data, frame_count, time_info, flag):
    global data, recording, ch1, ch2
    data = np.frombuffer(in_data, dtype=np.float32)
    ch1 = data[0::2]
    ch2 = data[1::2]
    return (in_data, recording)

These arrays will be half as long, so if you want to play them back on the 2-channel stream you need to bring them back to full length (for example by re-interleaving them). – function_store Mar 25 '14 at 19:41
So you are processing the captured audio data inside the callback function. How were you dealing with the delay caused by filtering and data acquisition? – Laveena Feb 06 '18 at 15:33
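The interleaving described in the comments can also be handled with a reshape instead of two slices. Below is a minimal sketch, assuming a 2-channel paFloat32 stream where each frame is stored as [channel 0, channel 1]. split_channels, join_channels and process_stereo are hypothetical helpers, and the per-channel gains stand in for real processing. Each channel column has half as many samples as the interleaved buffer, and re-interleaving restores the full length.

```python
import numpy as np

CHANNELS = 2


def split_channels(in_data, channels=CHANNELS):
    """Turn interleaved float32 bytes into an array of shape (frames, channels)."""
    return np.frombuffer(in_data, dtype=np.float32).reshape(-1, channels)


def join_channels(frames):
    """Re-interleave a (frames, channels) array into bytes for PyAudio."""
    # Flattening a C-ordered (frames, channels) array yields ch0, ch1, ch0, ch1, ...
    return np.ascontiguousarray(frames, dtype=np.float32).tobytes()


def process_stereo(in_data):
    frames = split_channels(in_data)
    ch1 = frames[:, 0] * 0.5  # hypothetical processing on the first channel
    ch2 = frames[:, 1] * 2.0  # hypothetical processing on the second channel
    return join_channels(np.column_stack((ch1, ch2)))


# Example: 3 interleaved frames stored as [c0, c1, c0, c1, c0, c1]
interleaved = np.array([1, 10, 2, 20, 3, 30], dtype=np.float32)
frames = split_channels(interleaved.tobytes())
out = np.frombuffer(process_stereo(interleaved.tobytes()), dtype=np.float32)
```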

score 1 · Answer 2 · edited May 04 '20 at 06:41

I had a similar issue trying to work with PyAudio's callback mode, but my requirements were:

Working with stereo output (2 channels).
Processing in real time.
Processing the input signal using an arbitrary impulse response, which could change in the middle of the process.

I succeeded after a few tries, and here are fragments of my code (based on the PyAudio example):

import pyaudio
import time
import scipy.signal as ss
import numpy as np
import librosa



track1_data, track1_rate = librosa.load('path/to/wav/track1', sr=44.1e3, dtype=np.float64)
track2_data, track2_rate = librosa.load('path/to/wav/track2', sr=44.1e3, dtype=np.float64)
track3_data, track3_rate = librosa.load('path/to/wav/track3', sr=44.1e3, dtype=np.float64)

# instantiate PyAudio (1)
p = pyaudio.PyAudio()
count = 0
IR_left = first_IR_left  # Replace with actual IR (placeholder, not defined in this fragment)
IR_right = first_IR_right  # Replace with actual IR (placeholder, not defined in this fragment)

# define callback (2)
def callback(in_data, frame_count, time_info, status):
    global count

    track1_frame = track1_data[frame_count*count : frame_count*(count+1)]
    track2_frame = track2_data[frame_count*count : frame_count*(count+1)]
    track3_frame = track3_data[frame_count*count : frame_count*(count+1)]

    track1_left = ss.fftconvolve(track1_frame, IR_left)
    track1_right = ss.fftconvolve(track1_frame, IR_right)
    track2_left = ss.fftconvolve(track2_frame, IR_left)
    track2_right = ss.fftconvolve(track2_frame, IR_right)
    track3_left = ss.fftconvolve(track3_frame, IR_left)
    track3_right = ss.fftconvolve(track3_frame, IR_right)

    track_left = 1/3 * track1_left + 1/3 * track2_left + 1/3 * track3_left
    track_right = 1/3 * track1_right + 1/3 * track2_right + 1/3 * track3_right

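    # Note: ret_data is longer than frame_count frames because of the convolution tail (see the EDIT below)
    ret_data = np.empty((track_left.size + track_right.size), dtype=track1_left.dtype)
    ret_data[1::2] = track_left   # odd indices: left channel (as observed in my setup)
    ret_data[0::2] = track_right  # even indices: right channel
    ret_data = ret_data.astype(np.float32).tobytes()  # PyAudio expects float32 bytes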
    count += 1
    return (ret_data, pyaudio.paContinue)

# open stream using callback (3)
stream = p.open(format=pyaudio.paFloat32,
                channels=2,
                rate=int(track1_rate),
                output=True,
                stream_callback=callback,
                frames_per_buffer=2**16)

# start the stream (4)
stream.start_stream()

# wait for stream to finish (5)
while_count = 0
while stream.is_active():
    while_count += 1
    if while_count % 3 == 0:
        IR_left = first_IR_left  # Replace with actual IR
        IR_right = first_IR_right  # Replace with actual IR
    elif while_count % 3 == 1:
        IR_left = second_IR_left  # Replace with actual IR
        IR_right = second_IR_right  # Replace with actual IR
    elif while_count % 3 == 2:
        IR_left = third_IR_left  # Replace with actual IR
        IR_right = third_IR_right  # Replace with actual IR

    time.sleep(10)

# stop stream (6)
stream.stop_stream()
stream.close()

# close PyAudio (7)
p.terminate()

Here are some important reflections about the code above:

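Working with librosa instead of wave lets me use NumPy arrays for processing, which is much more convenient than the raw byte chunks returned by wave.readframes.
The sample format you set in p.open(format=...) must match the format of the ret_data bytes, and PyAudio supports float32 at most (there is no 64-bit float format), so float64 data has to be converted to float32 first.
In my setup, even-index samples in ret_data went to the right headphone and odd-index samples to the left one. The usual interleaved convention puts the first channel (typically left) at index 0, so check this on your own system.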
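To make the format point concrete: a paFloat32 sample takes 4 bytes, so each callback should return frame_count × channels × 4 bytes. The PyAudio documentation notes that if the returned data holds fewer than frame_count frames, paComplete is assumed and the stream ends. Below is a minimal sketch; to_output_bytes, expected_nbytes and buffer_seconds are hypothetical helpers, and ch0/ch1 only mean "goes to even indices" and "goes to odd indices". It also shows that frames_per_buffer=2**16 at 44100 Hz makes each buffer last roughly 1.49 s, so a change such as a new IR can take about that long to be heard.

```python
import numpy as np

BYTES_PER_SAMPLE = 4  # paFloat32 stores each sample in 4 bytes


def to_output_bytes(ch0, ch1):
    """Interleave two channels (ch0 -> even indices, ch1 -> odd indices) into float32 bytes."""
    out = np.empty(ch0.size + ch1.size, dtype=np.float32)  # float64 input is cast down here
    out[0::2] = ch0
    out[1::2] = ch1
    return out.tobytes()


def expected_nbytes(frame_count, channels=2):
    """Number of bytes PyAudio expects back from the callback for one buffer."""
    return frame_count * channels * BYTES_PER_SAMPLE


def buffer_seconds(frames_per_buffer, rate):
    """Duration of one buffer, in seconds."""
    return frames_per_buffer / rate


frame_count = 2**16
ch0 = np.zeros(frame_count)  # float64, like the arrays loaded with dtype=np.float64 in the answer
ch1 = np.ones(frame_count)
data = to_output_bytes(ch0, ch1)
print(len(data) == expected_nbytes(frame_count))  # True
print(round(buffer_seconds(frame_count, 44100), 3))  # 1.486
```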

Just to clarify, this code sends the mix of three tracks to the audio output in stereo, and every 10 seconds it changes the impulse response and thus the filter being applied. I used this to test a 3D audio app I'm developing, so the impulse responses were Head-Related Impulse Responses (HRIRs), which moved the position of the sound every 10 seconds.

EDIT:
This code had a problem: the output had noise at a frequency corresponding to the frame size (higher frequency for smaller frames). I fixed that by doing an overlap-add of the frames manually. The convolution (ss.oaconvolve) returns an array of size track_frame.size + IR.size - 1, so I split that array into the first track_frame.size elements, which were used for ret_data, and the last IR.size - 1 elements, which I saved for later. Those saved elements are then added to the first IR.size - 1 elements of the next frame's result. For the first frame, zeros are added.
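The overlap-add fix described above can be sketched as follows. This is a minimal mono sketch assuming a fixed impulse response; the OverlapAdd class is hypothetical and is not the code from the author's repository. It uses ss.fftconvolve like the fragment above (ss.oaconvolve returns the same full-length result by default). Streaming the blocks through it reproduces the full convolution exactly, truncated to the input length, which is what removes the frame-rate noise.

```python
import numpy as np
import scipy.signal as ss


class OverlapAdd:
    """Block convolution that carries each block's convolution tail into the next block."""

    def __init__(self, ir):
        self.ir = np.asarray(ir, dtype=np.float64)
        # Tail saved from the previous block; zeros before the first block
        self.tail = np.zeros(self.ir.size - 1)

    def process(self, frame):
        full = ss.fftconvolve(frame, self.ir)  # length frame.size + ir.size - 1
        n_tail = self.ir.size - 1
        full[:n_tail] += self.tail             # add the tail saved from the previous block
        self.tail = full[frame.size:].copy()   # save the last ir.size - 1 samples for later
        return full[:frame.size]               # emit exactly one frame of output


rng = np.random.default_rng(1)
signal_in = rng.standard_normal(4096)
ir = rng.standard_normal(300)  # stand-in for an HRIR
ola = OverlapAdd(ir)
streamed = np.concatenate([ola.process(block) for block in np.split(signal_in, 8)])
reference = ss.fftconvolve(signal_in, ir)[:signal_in.size]  # one-shot convolution
```

If the IR is swapped mid-stream, as in the answer, the saved tail still belongs to the old IR; this sketch assumes all IRs have the same length so the tail size stays consistent.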

Is it possible to get access to the full code? I would find it pretty useful. – Mattia Surricchio, Dec 26 '20 at 18:46
Sure! [Here](https://github.com/grupo-1-ASSD-E2/ASSD-TP4/tree/master/processing_api_3D/audio_processing)'s the link to the GitHub repo where I used it. It's a bit disorganised since the project ended up going a different way, but in that folder you'll find a `convolutioner.py` file which does the processing, and a test file where I use `Convolutioner` to spatialise audio using HRIRs as impulse responses. – Facundo Farall, Dec 26 '20 at 21:27
@FacundoFarall This seems like really interesting work. Can I add you or write to you somewhere? I'm working on my master's thesis and I think your code will be really useful to me (if I can use it, obviously with proper citation). – Mattia Surricchio, Dec 26 '20 at 22:05
Yeah, no problem, you can reach me through [LinkedIn](https://www.linkedin.com/in/facundo-david-farall-5614311b4/). – Facundo Farall, Dec 26 '20 at 22:17
By the way, I tried to run this code (removing all the unnecessary processing like the FFT convolution, etc.) just to play back a simple input audio file, but it doesn't seem to work. The callback function is called only once and then the program stops. I don't understand what the problem is. – Mattia Surricchio, Dec 27 '20 at 14:18

How to handle in_data in Pyaudio callback mode?

2 Answers2

Linked