Streaming audio to Amazon Lex in Python

Question

I am trying to use Amazon Lex as the conversation engine in a home assistant via the Python SDK. The post_content method seems appropriate and I did get it to work on text-only test examples. However, I am unable to figure out how to interact directly using audio streaming.

import pyaudio
import boto3

pa = pyaudio.PyAudio()

audio_stream = pa.open(
    rate=16000,
    channels=1,
    format=pyaudio.paInt16,
    input=True,
    frames_per_buffer=1024,
)

lex_client = boto3.client("lex-runtime")

response = lex_client.post_content(
    botName="BOT_NAME",
    botAlias="BOT_ALIAS",
    userId="USER_ID",
    contentType="audio/l16; rate=16000; channels=1",
    inputStream=audio_stream,
)

print(response)

This raises the following error:

botocore.exceptions.HTTPClientError: An HTTP Client raised an unhandled exception: 'Stream' object is not iterable

Fair enough, so I tried inputStream=audio_stream.read(1024), which works without a problem, but doesn't recognize any spoken text (i.e. 'inputTranscript': '' in the response). I imagine this is because the chunk is simply too short to contain meaningful text.

I am fairly inexperienced with web development so I suspect I am missing something very obvious. Looking at how audio streaming is apparently handled in Amazon Transcribe, it seems like I should be using async and callback functions.

How should I properly handle this stream? If there are fundamental things I should be understanding better, I'd also really appreciate pointers to the right resources.

rzlvmp · Answer 1 · 2022-07-15T14:06:44.243

It is simple.

Let's start from documentation:

inputStream (bytes or seekable file-like object) -- [REQUIRED]

User input in PCM or Opus audio format or text format as described in the Content-Type HTTP header.

You can stream audio data to Amazon Lex or you can create a local buffer that captures all of the audio data before sending. In general, you get better performance if you stream audio data rather than buffering the data locally.

Okay, what is file-like object in python? Looks like SOF knows answer on this question

This is the API for all file-like objects in Python (as of 3.10.5).

...
__iter__()
...

Okay, __iter__() means that file-like object should be iterable.

No problem let's check Stream class. Stream's methods are:

__init__
get_input_latency
get_output_latency
get_time
get_cpu_load
start_stream
stop_stream
is_active
is_stopped
write
read
get_read_available
get_write_available

Looks like no __iter__ here :( Is Stream a file-like object? Definitely not.

Why we checked Stream? Because pa.open returning Stream.

Okay, and what we have to do now?

Probably we have to start record → close record → write stream to file (or bytes (check for BytesIO)) and pass BytesIO object to AWS client. Because BytesIO is file-like object:

from io import BytesIO

# get methods of BytesIO
dir(BytesIO)

Output:

#  I can see __iter__ here!
['__class__', '__del__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', 'close', 'closed', 'detach', 'fileno', 'flush', 'getbuffer', 'getvalue', 'isatty', 'read', 'read1', 'readable', 'readinto', 'readinto1', 'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines']

Okay, and how to save Stream into BytesIO?

Looks like this answer is closer to our aim.

--

You can protest: But here is saying "You can stream audio data to Amazon Lex"

That is true. But Pyaudio's Stream is just a class name. And it is not satisfying python's stream standards

So do I understand correctly, that I can only send data in chunks? I mean ideally I would continually send data to Lex until it is convinced that the user is done talking. In the current setup, I could only send a couple seconds at a time and if the user finishes later, then he gets cut off and if he finishes sooner he has to wait for the chunking to finish. — AlexM, Jul 21 '22 at 15:25

Streaming audio to Amazon Lex in Python

1 Answers1