17

I am trying to use the Amazon Transcribe Streaming service with an HTTP/2 request from Node.js. I am following the "Streaming request format" documentation. According to that document the endpoint is https://transcribe-streaming.<region>.amazonaws.com, but a request to this URL fails with a "URL not found" error. The Java example uses https://transcribestreaming.<region>.amazonaws.com instead; a request to that URL returns neither an error nor a response. I am using the us-east-1 region.

Here is the code I am trying with.

const http2 = require('http2');
var aws4  = require('aws4');

var opts = {
  service: 'transcribe', 
  region: 'us-east-1', 
  path: '/stream-transcription', 
  headers:{
   'content-type': 'application/json',
   'x-amz-target': 'com.amazonaws.transcribe.Transcribe.StartStreamTranscription'
  }
}

var urlObj = aws4.sign(opts, {accessKeyId: '<access key>', secretAccessKey: '<aws secret>'});
const client = http2.connect('https://transcribestreaming.<region>.amazonaws.com');
client.on('error', function(err){
  console.error("error in request ",err);
});

const req = client.request({
  ':method': 'POST',
  ':path': '/stream-transcription',
  'authorization': urlObj.headers.Authorization,  
  'content-type': 'application/json',
  'x-amz-content-sha256': 'STREAMING-AWS4-HMAC-SHA256-EVENTS',
  'x-amz-target': 'com.amazonaws.transcribe.Transcribe.StartStreamTranscription',
  'x-amz-date': urlObj['headers']['X-Amz-Date'],
  'x-amz-transcribe-language-code': 'en-US',
  'x-amz-transcribe-media-encoding': 'pcm',
  'x-amz-transcribe-sample-rate': 44100
});

req.on('response', (headers, flags) => {
  for (const name in headers) {
    console.log(`${name}: ${headers[name]}`);
  }
});
let data = '';
req.on('data', (chunk) => { data += chunk; });
req.on('end', () => {
  console.log(`\n${data}`);
  client.close();
});
req.end();

Can anyone point out what I am missing here? I couldn't find any examples implementing this with HTTP/2 either.

Update: Changing the Content-Type to application/json returned response status 200, but with the following exception:

`{"Output":{"__type":"com.amazon.coral.service#SerializationException"},"Version":"1.0"}`

Update (April-22-2019):

// Serialize the audio chunk as JSON (note: this is not event stream encoded)
var audioBlob = Buffer.from(JSON.stringify({
    "AudioStream": {
        "AudioEvent": {
            "AudioChunk": audioBufferData
        }
    }
}));

req.setEncoding('utf8');
req.write(audioBlob);

Before ending the request I add "audioBlob" as the payload by serializing it. My "audioBufferData" is raw PCM audio from the browser. The documentation says the payload has to be encoded with "Event Stream Encoding", but I couldn't figure out how to implement it.

So without this event stream encoding I am currently getting the following exception with a 200 response status.

{"Output":{"__type":"com.amazon.coral.service#UnknownOperationException"},"Version":"1.0"}
Manoj
  • 1
    Were you ever able to get this to work? I am in a similar situation. Thanks! – Stephen A. Lizcano Mar 11 '19 at 12:59
  • 1
    Nope, still stuck on the same issue. – Manoj Mar 11 '19 at 13:58
  • There is also older documentation for the streaming transcription, which has the correct host but bad `content-type` :D https://docs.aws.amazon.com/transcribe/latest/dg/API_streaming_StartStreamTranscription.html I am fighting this same API in Go and I was able to get past the initial connection and IAM authentication here https://stackoverflow.com/questions/53743785/amazon-transcribe-streaming-api-without-sdk Not fully working yet though. – shelll Mar 13 '19 at 14:55
  • Removing/not setting the `content-type` should help a bit. In my case setting the correct `content-type` returns HTTP 404. I am stuck after that though. – shelll Mar 13 '19 at 15:40
  • Yup, setting content-type:application/json returned the response with status 200 but with exception {"Output":{"__type":"com.amazon.coral.service#SerializationException"},"Version":"1.0"}; but if content-type is not provided it gives a 403. – Manoj Mar 13 '19 at 21:33
  • HTTP 403 is Forbidden, which means that your API key (i.e. the IAM user used for the API call) does not have the `transcribe:StartStreamTranscription` permission set in the AWS IAM console. Check my other SO question where I give more info about this. – shelll Mar 14 '19 at 14:10
  • I don't think it is an IAM user issue, because I just used my root access tokens. I also tried running the Java example with my credentials and it worked. – Manoj Mar 14 '19 at 20:30
  • Another case for 403 is when I set the header `x-amz-content-sha256` or `transfer-encoding`, then the V4 signature is incorrect. Even though I am signing the request with AWS SDK. Do not set them and you should get past the 403. – shelll Mar 15 '19 at 12:25
  • `headers: { 'host': 'transcribestreaming.us-west-2.amazonaws.com:443'}` - I added this to my `opts` object, and now i am getting another error. Also changed `content-type: application/vnd.amazon.eventstream` header – Denis Vabishchevich Apr 22 '19 at 10:25
  • Currently I am having the same issue as presented in this Amazon Transcribe GitHub issue: https://github.com/awsdocs/amazon-transcribe-developer-guide/issues/6 It would be helpful if Amazon responded to it. – Manoj Apr 22 '19 at 12:43
  • What is `client`? It's currently undefined in your example – Tony May 23 '19 at 17:46
  • Just updated the code. const client = http2.connect('https://transcribestreaming.<region>.amazonaws.com'); – Manoj May 24 '19 at 00:55
  • @Manoj I really like AWS, but the speech to text Azure is clear to me. https://learn.microsoft.com/en-in/azure/cognitive-services/speech-service/rest-speech-to-text – Deividson Damasio Jun 21 '19 at 21:26
  • Thanks for the suggestion @DeividsonDamasio I had to use AWS, (business requirements). so currently sticking to AWS. – Manoj Jul 31 '19 at 14:03

4 Answers

6

I reached out to AWS support and they don't seem to be able to get an HTTP/2 implementation working with Node.js either.

However, they have now provided a way to interact with the Transcribe streaming API directly via WebSockets (blog post here).

If this fits your use case I would highly recommend checking out the new example repo at: https://github.com/aws-samples/amazon-transcribe-websocket-static

If you're using it in a public-facing page I'd recommend using unauthenticated Cognito sessions to handle credential retrieval. I've got this working in a production app so feel free to tag me in any other questions.

Calvin
  • 1
    Thank you @calvin for the effort. I tried running this code but it is giving the following exception "message-typeappexception{"Message":"Could not process the audio stream that you provided. Try your request again."}o?=o?=(". I am using the same file which is working fine when I use the Java example that i mentioned in the question. – Manoj Jul 02 '19 at 13:47
  • Hey @Manoj - my answer here was premature. However, after quite a while of back-and-forth with AWS we've finally had some movement. See edit – Calvin Jul 31 '19 at 09:42
  • Thanks @calvin - My use case is a bit different: we have a home-grown diarizer service to detect speaker changes etc., which AWS currently doesn't offer. So if I use WebSockets, the browser would need double the bandwidth to send data to two different services. For now, as a workaround, I edited the Java example they provided and use it in the backend to send requests. I am still facing some issues with it, though: 20 or 30 minutes into streaming I get signature-expired exceptions while the session is still active on the AWS side. – Manoj Jul 31 '19 at 13:59
2

This doesn't directly answer the question but I think it's useful enough to post as an answer rather than a comment.

AWS just announced WebSocket support for Amazon Transcribe. Here are the docs, and here is a client-side sample app. The biggest difference, and one that I think makes integrating with WebSockets more straightforward, is that you do not need to sign each audio chunk like http/2 requires.

The relevant code for authorizing and initiating the connection using a pre-signed URL is in lib/aws-signature-v4.js:

exports.createPresignedURL = function(method, host, path, service, payload, options) {
  options = options || {};
  options.key = options.key || process.env.AWS_ACCESS_KEY_ID;
  options.secret = options.secret || process.env.AWS_SECRET_ACCESS_KEY;
  options.protocol = options.protocol || 'https';
  options.headers = options.headers || {};
  options.timestamp = options.timestamp || Date.now();
  options.region = options.region || process.env.AWS_REGION || 'us-east-1';
  options.expires = options.expires || 86400; // 24 hours
  options.headers = options.headers || {};

  // host is required
  options.headers.Host = host;

  var query = options.query ? querystring.parse(options.query) : {};
  query['X-Amz-Algorithm'] = 'AWS4-HMAC-SHA256';
  query['X-Amz-Credential'] = options.key + '/' + exports.createCredentialScope(options.timestamp, options.region, service);
  query['X-Amz-Date'] = toTime(options.timestamp);
  query['X-Amz-Expires'] = options.expires;
  query['X-Amz-SignedHeaders'] = exports.createSignedHeaders(options.headers);

  var canonicalRequest = exports.createCanonicalRequest(method, path, query, options.headers, payload);
  var stringToSign = exports.createStringToSign(options.timestamp, options.region, service, canonicalRequest);
  var signature = exports.createSignature(options.secret, options.timestamp, options.region, service, stringToSign);
  query['X-Amz-Signature'] = signature;
  return options.protocol + '://' + host + path + '?' + querystring.stringify(query);
};
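The helpers called above (createSignature and the others) are not shown here. As a rough sketch of what the final signing step does under Signature Version 4, assuming only Node's built-in crypto module: the names `deriveSigningKey` and `signString` are hypothetical and not the library's API, and unlike the library call above they take a `YYYYMMDD` date stamp rather than a timestamp.

```javascript
const crypto = require('crypto');

// HMAC-SHA256 helper; returns the raw digest as a Buffer.
function hmac(key, data) {
  return crypto.createHmac('sha256', key).update(data, 'utf8').digest();
}

// SigV4 key derivation: each step keys the next HMAC with the previous result.
// dateStamp is assumed to be in YYYYMMDD form, e.g. '20190422'.
function deriveSigningKey(secret, dateStamp, region, service) {
  const kDate = hmac('AWS4' + secret, dateStamp);
  const kRegion = hmac(kDate, region);
  const kService = hmac(kRegion, service);
  return hmac(kService, 'aws4_request');
}

// The signature is the hex-encoded HMAC of the string to sign.
function signString(secret, dateStamp, region, service, stringToSign) {
  const signingKey = deriveSigningKey(secret, dateStamp, region, service);
  return crypto
    .createHmac('sha256', signingKey)
    .update(stringToSign, 'utf8')
    .digest('hex');
}
```

The resulting hex string is what ends up in the `X-Amz-Signature` query parameter. Because the date and region are folded into the key, a URL signed for one day or region will not validate for another.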

And we invoke it in lib/main.js:

function createPresignedUrl() {
    let endpoint = "transcribestreaming." + region + ".amazonaws.com:8443";

    // get a preauthenticated URL that we can use to establish our WebSocket
    return v4.createPresignedURL(
        'GET',
        endpoint,
        '/stream-transcription-websocket',
        'transcribe',
        crypto.createHash('sha256').update('', 'utf8').digest('hex'), {
            'key': $('#access_id').val(),
            'secret': $('#secret_key').val(),
            'protocol': 'wss',
            'expires': 15,
            'region': region,
            'query': "language-code=" + languageCode + "&media-encoding=pcm&sample-rate=" + sampleRate
        }
    );
}

To package things in the event stream message format we need, we wrap the PCM-encoded audio in a JSON envelope and convert it to binary:

function convertAudioToBinaryMessage(audioChunk) {
    let raw = mic.toRaw(audioChunk);

    if (raw == null)
        return;

    // downsample and convert the raw audio bytes to PCM
    let downsampledBuffer = audioUtils.downsampleBuffer(raw, sampleRate);
    let pcmEncodedBuffer = audioUtils.pcmEncode(downsampledBuffer);

    // add the right JSON headers and structure to the message
    let audioEventMessage = getAudioEventMessage(Buffer.from(pcmEncodedBuffer));

    //convert the JSON object + headers into a binary event stream message
    let binary = eventStreamMarshaller.marshall(audioEventMessage);

    return binary;
}

function getAudioEventMessage(buffer) {
    // wrap the audio data in a JSON envelope
    return {
        headers: {
            ':message-type': {
                type: 'string',
                value: 'event'
            },
            ':event-type': {
                type: 'string',
                value: 'AudioEvent'
            }
        },
        body: buffer
    };
}
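For anyone who wants to see what `eventStreamMarshaller.marshall` produces, here is a hand-rolled sketch of the event stream framing using only Node built-ins. It is my reading of the documented layout, not the marshaller's actual code: a 12-byte prelude (total length, headers length, CRC32 of those 8 bytes), then the headers, the payload, and a trailing CRC32 over everything before it. The function names are made up for illustration.

```javascript
// CRC32 lookup table (standard reflected polynomial 0xEDB88320).
const CRC_TABLE = new Uint32Array(256).map((_, n) => {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  return c >>> 0;
});

function crc32(buf) {
  let crc = 0xffffffff;
  for (const byte of buf) crc = CRC_TABLE[(crc ^ byte) & 0xff] ^ (crc >>> 8);
  return (crc ^ 0xffffffff) >>> 0;
}

// One header: 1-byte name length, name, 1-byte value type (7 = string),
// 2-byte big-endian value length, value.
function encodeStringHeader(name, value) {
  const nameBytes = Buffer.from(name, 'utf8');
  const valueBytes = Buffer.from(value, 'utf8');
  const out = Buffer.alloc(1 + nameBytes.length + 1 + 2 + valueBytes.length);
  let offset = out.writeUInt8(nameBytes.length, 0);
  offset += nameBytes.copy(out, offset);
  offset = out.writeUInt8(7, offset);
  offset = out.writeUInt16BE(valueBytes.length, offset);
  valueBytes.copy(out, offset);
  return out;
}

// Frame one chunk of PCM audio as an AudioEvent message,
// using the same two headers as getAudioEventMessage above.
function marshallAudioEvent(pcmChunk) {
  const headers = Buffer.concat([
    encodeStringHeader(':message-type', 'event'),
    encodeStringHeader(':event-type', 'AudioEvent'),
  ]);
  const totalLength = 12 + headers.length + pcmChunk.length + 4;
  const message = Buffer.alloc(totalLength);
  message.writeUInt32BE(totalLength, 0);
  message.writeUInt32BE(headers.length, 4);
  message.writeUInt32BE(crc32(message.subarray(0, 8)), 8); // prelude CRC
  headers.copy(message, 12);
  pcmChunk.copy(message, 12 + headers.length);
  // Message CRC covers every byte that precedes it.
  message.writeUInt32BE(crc32(message.subarray(0, totalLength - 4)), totalLength - 4);
  return message;
}
```

In practice I would still use the marshaller library; the sketch is only meant to show why a plain JSON body is rejected by the service.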
bwest
0

I had a similar requirement for using the AWS Transcribe service with their WebSocket API in Node.js. Seeing as there was no support for this in the official package as of yet, I have gone ahead and written a package following this implementation available on GitHub. It's called AWS-transcribe and can be found here. I hope that helps.

It provides a stream interface around the WebSocket, and can be used like the example below:

import { AwsTranscribe, StreamingClient } from "aws-transcribe"

const client = new AwsTranscribe({
    // if these aren't provided, they will be taken from the environment
    accessKeyId: "ACCESS KEY HERE",
    secretAccessKey: "SECRET KEY HERE",
})

const transcribeStream = client
    .createStreamingClient({
        region: "eu-west-1",
        sampleRate,
        languageCode: "en-US",
    })
    // enums for returning the event names which the stream will emit
    .on(StreamingClient.EVENTS.OPEN, () => console.log(`transcribe connection opened`))
    .on(StreamingClient.EVENTS.ERROR, console.error)
    .on(StreamingClient.EVENTS.CLOSE, () => console.log(`transcribe connection closed`))
    .on(StreamingClient.EVENTS.DATA, (data) => {
        const results = data.Transcript.Results

        if (!results || results.length === 0) {
            return
        }

        const result = results[0]
        const final = !result.IsPartial
        const prefix = final ? "recognized" : "recognizing"
        const text = result.Alternatives[0].Transcript
        console.log(`${prefix} text: ${text}`)
    })

// someStream is a placeholder for your own readable audio stream
someStream.pipe(transcribeStream)
0

The AWS SDK V3 now supports streaming for transcribe: https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-transcribe-streaming/index.html
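A minimal sketch of how that can look, assuming `@aws-sdk/client-transcribe-streaming` is installed and credentials are available in the environment. The helper names `toAudioEvents` and `transcribe`, the chunk size, and the default sample rate and region are my own choices for illustration, not something the SDK prescribes.

```javascript
// Split a buffer of raw PCM audio into the AudioEvent objects the SDK expects.
async function* toAudioEvents(pcmBuffer, chunkSize = 4096) {
  for (let offset = 0; offset < pcmBuffer.length; offset += chunkSize) {
    yield { AudioEvent: { AudioChunk: pcmBuffer.subarray(offset, offset + chunkSize) } };
  }
}

// Stream the audio to Amazon Transcribe and collect the final (non-partial) results.
async function transcribe(pcmBuffer, sampleRate = 16000, region = 'us-east-1') {
  // Required lazily so the helper above can be used without the SDK installed.
  const {
    TranscribeStreamingClient,
    StartStreamTranscriptionCommand,
  } = require('@aws-sdk/client-transcribe-streaming');

  const client = new TranscribeStreamingClient({ region });
  const response = await client.send(
    new StartStreamTranscriptionCommand({
      LanguageCode: 'en-US',
      MediaEncoding: 'pcm',
      MediaSampleRateHertz: sampleRate,
      AudioStream: toAudioEvents(pcmBuffer),
    })
  );

  const lines = [];
  for await (const event of response.TranscriptResultStream) {
    const results = event.TranscriptEvent?.Transcript?.Results ?? [];
    for (const result of results) {
      if (!result.IsPartial) lines.push(result.Alternatives[0].Transcript);
    }
  }
  return lines;
}
```

The SDK takes care of the signing and event stream encoding, so the only thing left to supply is an async iterable of audio chunks.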

Otto