I have a solution that will reside on a user’s local mobile device, and I want it to post audio content to Lex using the AWS REST API. The problem is that the solution can’t stream audio (up or down) and has almost no local audio manipulation capabilities, whereas Lex has very specific input requirements and streams its output.
So access will be via an API Gateway acting as a Proxy with a Lambda (Python 2.7) function to deal with the audio issues.
The output is all taken care of: the Lambda code saves the AudioStream into a file and sends that file as the response body, which works fine. However, I can’t get the input to work.
The input audio is an MP3 file sent as the body of a POST request and I need to get this into a format acceptable to Lex.
I’ve investigated the following approaches:
Native AWS
Use S3 and Elastic Transcoder - when transcoding to PCM the lowest allowed sample rate is 22050 Hz, but Lex requires 16000 Hz. It also doesn’t seem to allow transcoding to Opus format.
Use MediaConvert - couldn’t see a setting to convert to PCM or Opus
Native Python
Python doesn’t seem to have the ability to unpack MP3 natively. I’ve read that decoding MP3 in pure Python would be very slow and not worth doing.
Import a library
Use something like ffmpeg-python or ffmpy - but this involves creating a deployment package or similar. I could go down this road, but it really seems overly complicated for what I want to do.
Use something other than Python
I chose Python as I’m more familiar with coding in it in Lambda, but perhaps C#, Node or Java 8 has something available that would make this easy in a Lambda function.
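For the ffmpeg route listed above, the conversion itself would probably come down to a single command once a binary is bundled with the function. The following is only a sketch under assumptions: the binary location `bin/ffmpeg` and the helper names `build_ffmpeg_cmd` and `convert_mp3_to_pcm` are hypothetical, and it assumes a statically linked ffmpeg build that runs in the Lambda environment is shipped in the deployment package.

```python
import subprocess


def build_ffmpeg_cmd(src_path, dst_path, ffmpeg_bin="bin/ffmpeg"):
    """Build an ffmpeg command line that decodes an MP3 to raw PCM.

    ffmpeg_bin is a hypothetical location for a bundled binary.
    """
    return [
        ffmpeg_bin,
        "-y",                    # overwrite the output file if it exists
        "-i", src_path,          # input MP3
        "-ac", "1",              # mono
        "-ar", "16000",          # 16 kHz sample rate
        "-f", "s16le",           # raw container, signed 16-bit little-endian
        "-acodec", "pcm_s16le",  # PCM codec matching the container
        dst_path,
    ]


def convert_mp3_to_pcm(src_path, dst_path):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.check_call(build_ffmpeg_cmd(src_path, dst_path))
```

In Lambda the input and output paths would have to live under `/tmp`, since that is the writable part of the filesystem.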
At the moment I’m looking at doing the following:
- Use Python to save the MP3 file to an S3 bucket
- Have Elastic Transcoder convert that MP3 to PCM at 22050 sample rate (but with all other settings set as Lex needs)
- Have Lambda read the transcoded file back from S3
- Use the wave library (import wave) to read the file and then write it out again with a sample rate of 16000 (this is the step I’m unsure about)
- Post the file (with correct sample rate) to Lex
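One thing to be aware of for the resampling step in the list above: the wave module only reads and writes headers and raw frames, so calling `setframerate(16000)` on 22050 Hz data relabels the file and changes the playback speed rather than resampling it. The samples themselves have to be recomputed. Below is a minimal sketch of that, assuming the transcoded file is 16-bit mono PCM in a WAV container. The function name `resample_wav` is hypothetical, and it uses plain linear interpolation with no low-pass filter, so some aliasing is to be expected. It should be treated as a proof of concept rather than a production-quality resampler.

```python
import io
import struct
import wave
from contextlib import closing


def resample_wav(wav_bytes, target_rate=16000):
    """Resample 16-bit mono PCM WAV bytes to target_rate (linear interpolation)."""
    with closing(wave.open(io.BytesIO(wav_bytes), "rb")) as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        src_rate = src.getframerate()
        raw = src.readframes(src.getnframes())

    # Unpack little-endian signed 16-bit samples.
    count = len(raw) // 2
    samples = struct.unpack("<%dh" % count, raw)

    # Each output sample sits at a fractional position in the input.
    out_count = int(count * target_rate / float(src_rate))
    step = src_rate / float(target_rate)
    out = []
    for i in range(out_count):
        pos = i * step
        left = int(pos)
        right = min(left + 1, count - 1)
        frac = pos - left
        value = samples[left] * (1.0 - frac) + samples[right] * frac
        out.append(int(round(value)))

    # Write the new samples with a header that states the new rate.
    buf = io.BytesIO()
    with closing(wave.open(buf, "wb")) as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(target_rate)
        dst.writeframes(struct.pack("<%dh" % len(out), *out))
    return buf.getvalue()
```

The standard library's `audioop.ratecv` does a similar job in Python 2.7 and may be worth checking in the Lambda runtime before hand-rolling the loop.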
Of course there will be some latency issues here, but as long as they’re not too severe I’m willing to live with them. This does seem overly complex for what I thought would be a fairly simple task, but it’s the best I’ve come up with so far. Even proving it out will take a number of hours’ work, and I’ve spent days on this already.
So the main question is: can Python’s wave library be used in AWS Lambda to modify the sample rate in this way?
If not, is there a way of solving this by creating a deployment package, by using an AWS feature I haven’t investigated yet, or by doing it more neatly in something other than Python?
The problem is that the Lex part of this app was supposed to be a nice-to-have. It’s not a main feature, and yet it has taken up the majority of the dev time. I’m pretty close to just ditching it, but thought I’d ask here first.