I am trying to use Amazon's new streaming transcribe API from Go 1.11. Currently Amazon provides only a Java SDK, so I am trying the low-level way.

The only relevant piece of documentation is here, but it does not show the endpoint. I found the endpoint in a Java example: https://transcribestreaming.<region>.amazonaws.com. I am trying the Ireland region, i.e. https://transcribestreaming.eu-west-1.amazonaws.com. Here is my code to open an HTTP/2 bi-directional stream:

import (
    "crypto/tls"
    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/aws/external"
    "github.com/aws/aws-sdk-go-v2/aws/signer/v4"
    "golang.org/x/net/http2"
    "io"
    "io/ioutil"
    "log"
    "net/http"
    "os"
    "time"
)

const (
    HeaderKeyLanguageCode   = "x-amzn-transcribe-language-code"  // en-US
    HeaderKeyMediaEncoding  = "x-amzn-transcribe-media-encoding" // pcm only
    HeaderKeySampleRate     = "x-amzn-transcribe-sample-rate"    // 8000, 16000 ... 48000
    HeaderKeySessionId      = "x-amzn-transcribe-session-id"     // For retrying a session. Pattern: [a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}
    HeaderKeyVocabularyName = "x-amzn-transcribe-vocabulary-name"
    HeaderKeyRequestId      = "x-amzn-request-id"
)

...

region := "eu-west-1"

cfg, err := external.LoadDefaultAWSConfig(aws.Config{
    Region: region,
})
if err != nil {
    log.Printf("could not load default AWS config: %v", err)
    return
}

signer := v4.NewSigner(cfg.Credentials)

transport := &http2.Transport{
    TLSClientConfig: &tls.Config{
        // allow insecure just for debugging
        InsecureSkipVerify: true,
    },
}
client := &http.Client{
    Transport: transport,
}

signTime := time.Now()

header := http.Header{}
header.Set(HeaderKeyLanguageCode, "en-US")
header.Set(HeaderKeyMediaEncoding, "pcm")
header.Set(HeaderKeySampleRate, "16000")
header.Set("Content-type", "application/json")

// Bi-directional streaming via a pipe.
pr, pw := io.Pipe()

req, err := http.NewRequest(http.MethodPost, "https://transcribestreaming.eu-west-1.amazonaws.com/stream-transcription", ioutil.NopCloser(pr))
if err != nil {
    log.Printf("err: %+v", err)
    return
}
req.Header = header

_, err = signer.Sign(req, nil, "transcribe", region, signTime)
if err != nil {
    log.Printf("problem signing headers: %+v", err)
    return
}

// This freezes and ends after 5 minutes with "unexpected EOF".
res, err := client.Do(req)
...

The problem is that executing the request (client.Do(req)) freezes for five minutes and then ends with an "unexpected EOF" error.

Any ideas what I am doing wrong? Did someone successfully use the new streaming transcribe API without the Java SDK?

EDIT (March 11, 2019):

I tested this again and now it does not time out but immediately returns a 200 OK response. There is an "exception" in the response body, though: {"Output":{"__type":"com.amazon.coral.service#SerializationException"},"Version":"1.0"}

I tried opening the HTTP2 stream with io.Pipe (like the code above) and also with a JSON body described in the documentation:

{
    "AudioStream": { 
        "AudioEvent": { 
            "AudioChunk": ""
        }
    }
}

The result was the same.

EDIT (March 13, 2019):

As mentioned by @gpeng, removing the content-type header fixes the SerializationException. But then there is an IAM exception, and you need to add the transcription:StartStreamTranscription permission to your IAM user. That permission is nowhere in the AWS IAM console, though, and must be added manually as a custom JSON permission :/

There is also another, newer documentation page here, which shows an incorrect host and a new content-type (do not use that content-type; the request returns 404 with it).

After removing the content-type and adding the new permission, I now get the exception {"Message":"A complete signal was sent without the preceding empty frame."}. Writing to the pipe also blocks forever, so I am stuck again. The messages described in the new documentation differ from those in the old one; they are now finally binary, but I do not understand them. Any ideas how to send such HTTP/2 messages in Go?

EDIT (March 15, 2019):

If you get an HTTP 403 error about a signature mismatch, do not set the transfer-encoding and x-amz-content-sha256 HTTP headers. When I set them and then sign the request with the AWS SDK's V4 signer, I receive HTTP 403 The request signature we calculated does not match the signature you provided.

shelll
  • Where did you find that url? https://docs.aws.amazon.com/general/latest/gr/rande.html#transcribe_region – ds011591 Jan 28 '19 at 15:18
  • @ds011591 it is in the linked Java example. The documentation link you provided is for the "non" streaming transcribe. – shelll Jan 28 '19 at 15:33
  • Were you ever able to get this working? – Stephen A. Lizcano Mar 11 '19 at 13:05
  • @stephenlizcano no, I was not able to solve this yet :( – shelll Mar 11 '19 at 13:36
  • @stephenlizcano I re-tested my code and now there is a different behaviour. I have updated my answer with more info. Still no solution though. – shelll Mar 11 '19 at 14:17
  • @shelll, I just have a feeling that JSON body you are sending may not exactly be a json, I am not sure though. Just take a look at this link https://docs.aws.amazon.com/transcribe/latest/dg/streaming-format.html and I am looking at the diagram that they had in there. Did you prepare the "Audioevent" from your audio chunk ? I found some ref for "Audioevent" in this link https://docs.aws.amazon.com/transcribe/latest/dg/API_streaming_AudioStream.html#transcribe-Type-streaming_AudioStream-AudioEvent – Manoj Mar 14 '19 at 20:37
  • Did you ever figure out what caused the `A complete signal was sent without the preceding empty frame` issue? – Magnus Nov 28 '19 at 06:55
  • @Magnus No I did not. I think though it is because I was sending the audio bytes in wrong message format (I sent the raw bytes directly up the HTTP/2 stream) thus AWS just terminated the connection with this message. – shelll Nov 28 '19 at 13:45

4 Answers

I reached out to AWS support and they now recommend using websockets instead of HTTP/2 when possible (blog post here)

If this fits your use case, I would highly recommend checking out the new example repo at https://github.com/aws-samples/amazon-transcribe-websocket-static, which shows a browser-based solution in JS.

I've also noticed that the author of the demo has an express example on his personal Github at: https://github.com/brandonmwest/amazon-transcribe-websocket-express but I haven't confirmed if this is working.

I appreciate these examples aren't in Go, but I think you'll have better luck using the WebSocket client as opposed to HTTP/2 (which, let's be honest, is still a bit terrifying :P)

Calvin

Try not setting the content type header and see what response you get. I'm trying to do the same thing (but in Ruby) and that 'fixed' the SerializationException. Still can't get it to work but I've now got a new error to think about :)

UPDATE: I have got it working now. My issue was with the signature. If both the host and authority headers are passed, they are joined with a comma and treated as the host on the server side when the signature is checked, so the signatures never match. That doesn't seem like correct behaviour on the AWS side, but it doesn't look like it's going to be an issue for you in Go.

gpeng
  • Now I do not get the `SerializationException`, but IAM issue and the AWS IAM console does not have `transcription:StartStreamTranscription` permission and I had to attach it as a raw JSON permission :D After that, I have a new exception `exception{"Message":"A complete signal was sent without the preceding empty frame."}`. There is also a new documentation page https://docs.aws.amazon.com/transcribe/latest/dg/streaming-format.html which shows new `content-type` but has a typo in the host :D This is a complete mess :( – shelll Mar 13 '19 at 14:51
  • The new documentation for streaming transcription is also bad. When using its `content-type` the HTTP request return 404 Not Found with `` in the body. And their HTTP2 messages for streaming the audio are horrible :( – shelll Mar 13 '19 at 15:28
  • Totally feel your pain. I've gone through basically the same set of errors. I'm at a dead end with it for now. There are encoders for the EventStream format you mentioned from the docs in the SDKs -> https://github.com/aws/aws-sdk-go/blob/master/private/protocol/eventstream/encode.go but I didn't get too far with the Ruby equivalent. There's a [Ruby SDK PR open](https://github.com/aws/aws-sdk-ruby/pull/1945) and an [event signing PR](https://github.com/aws/aws-sdk-ruby/pull/1946) that has just been merged so I think we'll hold out for that. – gpeng Mar 14 '19 at 12:43
  • That EventStream encoder looks promising, thanks! As this is no longer a high priority issue for me, I will probable wait for official support in the AWS Go SDK. I will probably revisit this in couple of months. – shelll Mar 14 '19 at 14:07
  • Guys, I'm doing the same thing in Python. After the success of step 1 and 2 i.e. getting the 'complete signal was sent' message. Did you guys figure out how to create audio event and audio event message? – Asym Mar 31 '19 at 13:59
  • @Asym could you please share how did you connect the stream (Step 1) using python? – Kshitij Saxena May 21 '19 at 10:21
  • https://pastebin.com/0SbR703K Here is what I have. I don't think it even accomplishes step 1. Do let me know if you manage to get it done. Thanks! – Asym May 21 '19 at 13:39

I'm still fighting this thing with Node.js as well. What the docs leave unclear is the content type: in one place they say that the Content-Type should not be application/json, but elsewhere they make it look like the payload should be encoded as application/vnd.amazon.eventstream. It looks like the payload must be carefully formatted in a binary format rather than as a JSON object, as follows:

Amazon Transcribe uses a format called event stream encoding for streaming transcription. This format encodes binary data with header information that describes the contents of each event. You can use this information for applications that call the Amazon Transcribe endpoint without using the Amazon Transcribe SDK. Amazon Transcribe uses the HTTP/2 protocol for streaming transcriptions. The key components for a streaming request are:

  • A header frame. This contains the HTTP headers for the request, and a signature in the authorization header that Amazon Transcribe uses as a seed signature to sign the following data frames.

  • One or more message frames in event stream encoding. The frame contains metadata and the raw audio bytes.

  • An end frame. This is a signed message in event stream encoding with an empty body.

There is a sample function that shows how to implement all of that in Java, which might shed some light on how this encoding is to be done.

rodrigo-silveira
  • Their documentation is a mess, with errors and contradictions. I have given up currently and I will try it once they have a Go SDK. There are other providers of real-time speech recognition with punctuation and with more supported languages. The AWS streaming transcribe support US English only, so it is not worth to fight their API. – shelll Apr 09 '19 at 13:18

I had a similar requirement: using the AWS Transcribe service through its WebSocket API in Node.js. Since the official package did not support this yet, I wrote a package called AWS-transcribe, which can be found here. I hope that helps.

It provides a stream interface around the WebSocket and can be used as in the example below:

import { AwsTranscribe, StreamingClient } from "aws-transcribe"

const client = new AwsTranscribe({
    // if these aren't provided, they will be taken from the environment
    accessKeyId: "ACCESS KEY HERE",
    secretAccessKey: "SECRET KEY HERE",
})

const transcribeStream = client
    .createStreamingClient({
        region: "eu-west-1",
        sampleRate,
        languageCode: "en-US",
    })
    // enums for returning the event names which the stream will emit
    .on(StreamingClient.EVENTS.OPEN, () => console.log(`transcribe connection opened`))
    .on(StreamingClient.EVENTS.ERROR, console.error)
    .on(StreamingClient.EVENTS.CLOSE, () => console.log(`transcribe connection closed`))
    .on(StreamingClient.EVENTS.DATA, (data) => {
        const results = data.Transcript.Results

        if (!results || results.length === 0) {
            return
        }

        const result = results[0]
        const final = !result.IsPartial
        const prefix = final ? "recognized" : "recognizing"
        const text = result.Alternatives[0].Transcript
        console.log(`${prefix} text: ${text}`)
    })

someStream.pipe(transcribeStream)