
I'm building a Transcribe Streaming app in Dart/Flutter with websockets. When I stream the test audio (pulled from a mono, 16 kHz, 16-bit signed little-endian WAV file), I get...

BadRequestException: Could not decode the audio stream that you provided. Check that the audio stream is valid and try your request again.

As a test I'm using a file to stream the audio. I'm sending 32k data bytes every second (roughly simulating a realtime microphone stream). I even get the error if I stream all 0x00 or all 0xFF or random bytes. If I halve the chunk size to 16k and the interval to 0.5s, it gets one more frame in before erroring out...

As far as the data goes, I'm simply packing the bytes into the data portion of the EventStream frame exactly as they appear in the file. Clearly the Event Stream packaging is correct (the byte layout, the CRCs), or else I'd get an error indicating that, no?

What would indicate to AWS Transcribe that the stream is not decodable? Any other ideas on how to proceed with this?

thanks for any help...

Here's the code that does the packing. Full version is here (if you dare...It's a bit of a mess at the moment) https://pastebin.com/PKTj5xM2

Uint8List createEventStreamFrame(Uint8List audioChunk) {
  final headers = [
    EventStreamHeader(":content-type", 7, "application/octet-stream"),
    EventStreamHeader(":event-type", 7, "AudioEvent"),
    EventStreamHeader(":message-type", 7, "event")
  ];
  final headersData = encodeEventStreamHeaders(headers);
 
  final int totalLength = 16 + audioChunk.lengthInBytes + headersData.lengthInBytes;
  // final prelude = [headersData.length, totalLength];
  // print("Prelude: " + prelude.toString());
 
  // Convert a 32b int to 4 bytes
  List<int> int32ToBytes(int i) { return [(0xFF000000 & i) >> 24, (0x00FF0000 & i) >> 16, (0x0000FF00 & i) >> 8, (0x000000FF & i)]; }
 
  final audioBytes = ByteData.sublistView(audioChunk);
  var offset = 0;
  var audioDataList = <int>[];
  while (offset < audioBytes.lengthInBytes) {
    audioDataList.add(audioBytes.getInt16(offset, Endian.little));
    offset += 2;
  }
 
  final crc = CRC.crc32();
  final messageBldr = BytesBuilder();
  messageBldr.add(int32ToBytes(totalLength));
  messageBldr.add(int32ToBytes(headersData.length));
 
  // Now we can calc the CRC. We need to do it on the bytes, not the Ints
  final preludeCrc = crc.calculate(messageBldr.toBytes());
 
  // Continue adding data
  messageBldr.add(int32ToBytes(preludeCrc));
  messageBldr.add(headersData.toList());
  // messageBldr.add(audioChunk.toList());
  messageBldr.add(audioDataList);
  final messageCrc = crc.calculate(messageBldr.toBytes().toList());
  messageBldr.add(int32ToBytes(messageCrc));
  final frame = messageBldr.toBytes();
  //print("${frame.length} == $totalLength");
  return frame;
}
Hari Honor
  • Someone else is having the same issue here: https://stackoverflow.com/questions/63137516/getting-badrequestexception-in-aws-transcribe-realtime – Hari Honor Jun 21 '21 at 11:19

2 Answers


BadRequestException, at least in my case, referred to the frame being encoded incorrectly rather than the audio data being wrong.

AWS Event Stream Encoding details are here.

I had some issues with endianness and byte size. You need to be very bit-savvy with both the message encoding and the audio buffer. The audio needs to be 16-bit / signed (int) / little-endian (see here). And the length params in the message wrapper are 32-bit (4-byte) BIG-endian. ByteData is your friend here in Dart. Here's a snippet from my updated code:

final messageBytes = ByteData(totalLength);

...

for (var i=0; i<audioChunk.length; i++) {
  messageBytes.setInt16(offset, audioChunk[i], Endian.little);
  offset += 2;
}

Notice that each 16-bit int actually takes up two byte positions. If you don't pass the Endian argument, Dart's ByteData defaults to big-endian, which happens to be right for the header ints but wrong for the little-endian audio samples, so be explicit in both places.
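To make the big-endian half concrete, here is a minimal sketch (the function name is mine, not from the AWS docs) of writing the two 32-bit prelude length fields with ByteData:

```dart
import 'dart:typed_data';

// Sketch only: the two 32-bit prelude ints (total frame length,
// headers-section length) must be written big-endian.
Uint8List encodePrelude(int totalLength, int headersLength) {
  final prelude = ByteData(8)
    ..setUint32(0, totalLength, Endian.big)
    ..setUint32(4, headersLength, Endian.big);
  return prelude.buffer.asUint8List();
}
```

For example, encodePrelude(300, 100) yields [0, 0, 1, 44, 0, 0, 0, 100]: the high bytes come first.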

The best way to ensure it is all correct is to write the decode functions, which you'll need for the AWS response anyway, and then decode your encoded frame and see if it comes out the same. Use test data for the audio like [-32000, -100, 0, 200, 31000] or something like that, so you can verify the endianness, etc. is all correct.
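That round-trip idea, with test samples of that kind, might look like this (a sketch, not the production code; function names are illustrative):

```dart
import 'dart:typed_data';

// Pack signed 16-bit samples little-endian, as Transcribe expects.
Uint8List packSamples(List<int> samples) {
  final bytes = ByteData(samples.length * 2);
  for (var i = 0; i < samples.length; i++) {
    bytes.setInt16(i * 2, samples[i], Endian.little);
  }
  return bytes.buffer.asUint8List();
}

// Decode the same bytes back; if this doesn't return the original
// samples, the endianness or byte size is wrong somewhere.
List<int> unpackSamples(Uint8List data) {
  final view = ByteData.sublistView(data);
  return [
    for (var offset = 0; offset < view.lengthInBytes; offset += 2)
      view.getInt16(offset, Endian.little)
  ];
}
```

Negative values like -32000 are the useful cases here: they exercise the sign bit and both bytes of each sample, so a byte-order or truncation bug shows up immediately.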

Hari Honor

Here are my suggestions (too long to put into a comment). Feel free to post updated information so I can think about it further.

First, could you use Wireshark to look at the data being transmitted? (Not strictly necessary; see the next paragraph for an alternative.) Examine whether the data on the wire (i.e. the data actually being transmitted) is valid; for example, manually capture those bytes and open them with an audio player.

Or, instead of using Wireshark, write the data (that you originally send through the websocket) to a local file, open that file, and see whether it is valid audio. (Note that some audio players tolerate malformed formats.)
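The dump-to-file suggestion is easy to sketch: prepending a minimal 44-byte WAV header (mono, 16 kHz, 16-bit PCM, matching the question's format) makes the raw bytes playable in an ordinary audio player. The function and file names here are mine:

```dart
import 'dart:io';
import 'dart:typed_data';

// Wrap raw PCM bytes in a minimal RIFF/WAVE header so a player can
// open them. Assumes mono, 16 kHz, 16-bit samples.
Future<void> dumpAsWav(Uint8List pcm, String path) async {
  const sampleRate = 16000, channels = 1, bitsPerSample = 16;
  final byteRate = sampleRate * channels * bitsPerSample ~/ 8;
  final header = ByteData(44);
  void ascii(int offset, String s) {
    for (var i = 0; i < s.length; i++) {
      header.setUint8(offset + i, s.codeUnitAt(i));
    }
  }
  ascii(0, 'RIFF');
  header.setUint32(4, 36 + pcm.length, Endian.little); // RIFF chunk size
  ascii(8, 'WAVE');
  ascii(12, 'fmt ');
  header.setUint32(16, 16, Endian.little);             // fmt chunk size
  header.setUint16(20, 1, Endian.little);              // PCM format
  header.setUint16(22, channels, Endian.little);
  header.setUint32(24, sampleRate, Endian.little);
  header.setUint32(28, byteRate, Endian.little);
  header.setUint16(32, channels * bitsPerSample ~/ 8, Endian.little); // block align
  header.setUint16(34, bitsPerSample, Endian.little);
  ascii(36, 'data');
  header.setUint32(40, pcm.length, Endian.little);     // data chunk size
  await File(path).writeAsBytes(header.buffer.asUint8List() + pcm);
}
```

If the resulting file plays cleanly, the bytes you are handing to the websocket are fine and the problem is in the framing; if it sounds like noise, the problem is upstream of the websocket.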

Secondly, could you try putting all the bytes of the originally good WAV file into a single websocket packet? Can it then be transcribed, or does the error still happen?

Thirdly, and this may not be best practice for your case: WAV is uncompressed and quite large. You may want something like the AAC format, or, more advanced, the Opus format. Both work well for streaming; for example, AAC has a sub-format called ADTS which packs the audio into packets.

ch271828n
  • Thanks. Yeah, I was hoping for some info to save me going this route. AWS Transcribe supports PCM, FLAC, and OGG, btw. It seems odd that even sending 0x00s gives the error, and incorrect encoding would really just be noise, as there are no "bad characters" in PCM. So I'm confused at this level why AWS would error out and close the connection so quickly. In the real world, perhaps the user would simply have been silent for a couple of seconds before beginning. – Hari Honor Jun 22 '21 at 09:13
  • @HariHonor if you give my suggestions a try I may be able to figure out more :) as for your question, is there an option in AWS (e.g. "aws-audio-running-mode: pcm") that you should set up? – ch271828n Jun 22 '21 at 13:03
  • Yeah, it turns out it had to do with issues with endianness in the header encoding (the header int32s are big-endian whilst the audio data is little-endian). Just trying to decide how to proceed. Appreciate your response and your nudge to get "down-and-dirty" :) But not sure it is a sufficient answer to the question to award the bounty – Hari Honor Jun 25 '21 at 13:18
  • @HariHonor happy to hear that :) IMHO what i have done is like "... a new piece of information that leads me to discover the solution myself (yourself)." – ch271828n Jun 25 '21 at 13:54