
All the examples I can find are mono, with CHANNELS = 1. How do you read stereo or multichannel input using the callback method in PyAudio and convert it into a 2D NumPy array or multiple 1D arrays?

For mono input, something like this works:

import numpy as np
import pyaudio

def callback(in_data, frame_count, time_info, status):
    global result
    global result_waiting

    if in_data:
        result = np.frombuffer(in_data, dtype=np.float32)
        result_waiting = True
    else:
        print('no input')

    return None, pyaudio.paContinue

fs = 44100  # sample rate in Hz
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32,
                channels=1,
                rate=fs,
                output=False,
                input=True,
                frames_per_buffer=fs,
                stream_callback=callback)

But this does not work for stereo input: the result array is twice as long, so I assume the samples from the two channels are interleaved somehow, but I can't find documentation for this.

– endolith

1 Answer


It appears to be interleaved sample-by-sample, with the left channel first. With a signal on the left channel input and silence on the right channel, I get:

result = [0.2776, -0.0002,  0.2732, -0.0002,  0.2688, -0.0001,  0.2643, -0.0003,  0.2599, ...

So to separate the channels, reshape the result into a 2D array:

result = np.frombuffer(in_data, dtype=np.float32)
result = np.reshape(result, (frame_count, 2))  # frame_count frames × 2 channels

Now to access the left channel, use result[:, 0], and for right channel, use result[:, 1].
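For example, to get each channel as its own 1D array:

left = result[:, 0]   # every frame's first sample (left)
right = result[:, 1]  # every frame's second sample (right)

Or, generalized into helper functions for any number of channels: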

def decode(in_data, channels):
    """
    Convert a byte stream into a 2D numpy array with
    shape (chunk_size, channels)

    Samples are interleaved, so for a stereo stream with left channel
    of [L0, L1, L2, ...] and right channel of [R0, R1, R2, ...], the output
    is ordered as [L0, R0, L1, R1, ...]
    """
    # TODO: handle data type as parameter, convert between pyaudio/numpy types
    result = np.frombuffer(in_data, dtype=np.float32)

    assert len(result) % channels == 0  # the buffer must contain whole frames
    chunk_length = len(result) // channels  # integer division for a valid shape

    result = np.reshape(result, (chunk_length, channels))
    return result


def encode(signal):
    """
    Convert a 2D numpy array into a byte stream for PyAudio

    Signal should be a numpy array with shape (chunk_size, channels)
    """
    # C-order flatten of a (chunk_size, channels) array interleaves the samples
    interleaved = signal.flatten()

    # TODO: handle data type as parameter, convert between pyaudio/numpy types
    out_data = interleaved.astype(np.float32).tobytes()
    return out_data
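
Putting it together, here is a minimal sketch of a stereo input callback built on decode; the sample rate fs and the buffer size are placeholder values, and result/result_waiting follow the globals from the question:

import numpy as np
import pyaudio

CHANNELS = 2
fs = 44100  # placeholder sample rate

def callback(in_data, frame_count, time_info, status):
    global result
    global result_waiting

    if in_data:
        result = decode(in_data, CHANNELS)  # shape (frame_count, CHANNELS)
        result_waiting = True
    else:
        print('no input')

    return None, pyaudio.paContinue

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32,  # matches np.float32 in decode
                channels=CHANNELS,         # for paInt16 input, use np.int16 in decode instead
                rate=fs,
                output=False,
                input=True,
                frames_per_buffer=1024,    # placeholder buffer size
                stream_callback=callback)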
– endolith
  • Very helpful. Partly related to [this question](http://stackoverflow.com/questions/22927096/how-to-print-values-of-a-string-full-of-chaos-question-marks/22927836?noredirect=1#comment35005843_22927836) – SolessChong Apr 08 '14 at 14:47
  • [For using other data formats for audio encoding](https://stackoverflow.com/a/24985016/3002273) (e.g. `np.int16`) –  Feb 26 '18 at 17:32
  • 2
    What does mean `interleaved`? I played with this stuff and `flatten` function actually was a solution, however `flatten` without parameter flattened two-dimentional array to one dimension but all values from first row were before all values from second row. In [`numpy` documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flatten.html) I found that you can provide `'F'` character as first parameter and it performs flattening the way we expect. Is it equivalent to your `interleaved.astype(np.float32).tostring()` call? If yes, it looks like the simplest solution. – pt12lol May 27 '18 at 19:00
  • 1
    @pt12lol As it says, "Samples are interleaved, so for a stereo stream with left channel of [L0, L1, L2, ...] and right channel of [R0, R1, R2, ...], the output is ordered as [L0, R0, L1, R1, ...]" – endolith May 28 '18 at 19:46
  • @endolith I have just tested Numpy's flatten method and @pt12lol is right that `'F'` is required to actually interleave a 2D array. Your `encode` method will put all left channel before the right channel, like [L0, L1, ..., R0, R1, ...] – Pinyi Wang May 21 '21 at 20:16
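
To clarify the flatten-order discussion in the comments above: for an array with shape (chunk_size, channels), as returned by decode, NumPy's default C-order flatten already produces the interleaved [L0, R0, L1, R1, ...] layout; order='F' is only needed when the array is transposed, i.e. shape (channels, chunk_size). A quick check with made-up values:

import numpy as np

frames = np.array([[0.1, -0.1],
                   [0.2, -0.2],
                   [0.3, -0.3]])    # shape (chunk_size, channels) = (3, 2)

print(frames.flatten())             # [ 0.1 -0.1  0.2 -0.2  0.3 -0.3] -> interleaved
print(frames.flatten('F'))          # [ 0.1  0.2  0.3 -0.1 -0.2 -0.3] -> grouped by channel

by_channel = frames.T               # shape (channels, chunk_size) = (2, 3)
print(by_channel.flatten('F'))      # [ 0.1 -0.1  0.2 -0.2  0.3 -0.3] -> 'F' interleaves here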