I can't get audio to play when I fetch it with an "AJAX" request to my server-side API.

I have backend Node.js code that's using IBM's Watson Text-to-Speech service to serve audio from text:

var render = function(request, response) {
    var options = {
        text: request.params.text,
        voice: 'VoiceEnUsMichael',
        accept: 'audio/ogg; codecs=opus'
    };

    synthesizeAndRender(options, request, response);
};

var synthesizeAndRender = function(options, request, response) {
    var synthesizedSpeech = textToSpeech.synthesize(options);

    synthesizedSpeech.on('response', function(eventResponse) {
        if(request.params.text.download) {
            var contentDisposition = 'attachment; filename=transcript.ogg';

            eventResponse.headers['content-disposition'] = contentDisposition;
        }
    });

    synthesizedSpeech.pipe(response);
};

I have client side code to handle that:

var xhr = new XMLHttpRequest(),
    audioContext = new AudioContext(),
    source = audioContext.createBufferSource();

module.controllers.TextToSpeechController = {
    fetch: function() {
        xhr.onload = function() {
            var playAudio = function(buffer) {
                source.buffer = buffer;
                source.connect(audioContext.destination);

                source.start(0);
            };

            // TODO: Handle properly (exiquio)
            // NOTE: error is being received
            var handleError = function(error) {
                console.log('An audio decoding error occurred');
            }

            audioContext
                .decodeAudioData(xhr.response, playAudio, handleError);
        };
        xhr.onerror = function() { console.log('An error occurred'); };

        var urlBase = 'http://localhost:3001/api/v1/text_to_speech/';
        var url = [
            urlBase,
            'test',
        ].join('');

        xhr.open('GET', encodeURI(url), true);
        xhr.setRequestHeader('x-access-token', Application.token);
        xhr.responseType = 'arraybuffer';
        xhr.send();
    }
}

The backend returns the audio that I expect, but my success method, playAudio, is never called. Instead, handleError is always called and the error object is always null.

Could anyone explain what I'm doing wrong and how to correct this? It would be greatly appreciated.

Thanks.

NOTE: The string "test" in the URL becomes the text param on the backend and ends up in the options variable in synthesizeAndRender.

exiquio
  • Are you sure the audio format is supported? – Musa May 19 '15 at 16:20
  • I believe it must be. I originally tested the same backend code directly with the same Chrome browser via a url and it would play fine. – exiquio May 19 '15 at 16:49
  • Actually, the test was done on Chromium and Gnu/Linux. I believe it should be the same with Chrome in OSX where I am writing this code now, but I am not certain. – exiquio May 19 '15 at 17:00
  • UPDATE: I've run the following query in the same browser I'm using to develop this code: http://localhost:3001/api/v1/text_to_speech/this%20is%20a%20test <-- This was done with my authentication code commented out, and it rendered a built-in audio player and played the expected audio. Now I can say with certainty that the audio type is accepted. My only guess is that the problem lies in how I'm setting the headers on the server side above. The attachment part strikes me as a potential issue. – exiquio May 20 '15 at 00:01

1 Answer

Unfortunately, unlike Chrome's HTML5 Audio implementation, Chrome's Web Audio doesn't support audio/ogg;codecs=opus, which is what your request uses here. You need to set the format to audio/wav for this to work. To be sure it's passed through to the server request, I suggest putting it in the query string (accept=audio/wav, urlencoded).
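
A minimal sketch of that change on the client, in case it helps. The helper names `buildSynthesisUrl` and `fetchAndPlay` are made up for illustration, and it assumes your server copies an `accept` query parameter into the options it passes to `synthesize` (your current code hardcodes `accept`, so that part would need to change too):

```javascript
// Hypothetical helper: builds the request URL with the text as a path
// segment and the desired format as an `accept` query parameter.
function buildSynthesisUrl(urlBase, text, accept) {
    return urlBase + encodeURIComponent(text) +
        '?accept=' + encodeURIComponent(accept);
}

// Hypothetical helper: fetches the audio and plays it through Web Audio.
// Assumes a browser environment (XMLHttpRequest, AudioContext).
function fetchAndPlay(audioContext, url, token) {
    var xhr = new XMLHttpRequest();

    xhr.open('GET', url, true);
    xhr.setRequestHeader('x-access-token', token);
    xhr.responseType = 'arraybuffer';
    xhr.onload = function() {
        audioContext.decodeAudioData(xhr.response, function(buffer) {
            // A buffer source can only be started once, so create a new
            // one for every playback.
            var source = audioContext.createBufferSource();

            source.buffer = buffer;
            source.connect(audioContext.destination);
            source.start(0);
        }, function() {
            console.log('An audio decoding error occurred');
        });
    };
    xhr.onerror = function() { console.log('An error occurred'); };
    xhr.send();
}
```

You would call it with something like `fetchAndPlay(new AudioContext(), buildSynthesisUrl(urlBase, 'test', 'audio/wav'), Application.token)`. Note that the URL is already encoded by the helper, so it should not be passed through `encodeURI` again.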

Are you just looking to play the audio, or do you need access to the Web Audio API for audio transformation? If you just need to play the audio, I can show you how to easily play this with the HTML5 Audio API (not the Web Audio one). And with HTML5 Audio, you can stream it using the technique below, and you can use the optimal audio/ogg;codecs=opus format.

It's as simple as dynamically setting the source of your audio element, queried from the DOM via something like this:

(in HTML)

<audio id="myAudioElement"></audio>

(in your JS)

var audio = document.getElementById('myAudioElement') || new Audio();
audio.src = yourUrl;
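
If you aren't sure which formats a given browser's audio element will play, you can ask it with `canPlayType` before choosing what to request. This is only a sketch: `pickPlayableType` is a hypothetical helper, and the candidate list you pass in is your own choice:

```javascript
// Hypothetical helper: returns the first MIME type the given audio element
// reports it can play, or null if none of the candidates is supported.
function pickPlayableType(audio, candidates) {
    for (var i = 0; i < candidates.length; i++) {
        // canPlayType returns '', 'maybe', or 'probably'
        if (audio.canPlayType(candidates[i]) !== '') {
            return candidates[i];
        }
    }

    return null;
}
```

In the browser you could call it as `pickPlayableType(new Audio(), ['audio/ogg; codecs=opus', 'audio/wav'])`. Keep in mind this reflects the HTML5 audio element only; it says nothing about what Web Audio's `decodeAudioData` accepts.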

You can also set the audio element's source via an XMLHttpRequest, but you won't get the streaming. On the other hand, since you can use a POST method, you're not limited to the text length of a GET request (for this API, ~6KB). To set it in xhr, you create an object URL from a blob response:

    xhr.open('POST', encodeURI(url), true);
    xhr.setRequestHeader('Content-Type', 'application/json');
    xhr.responseType = 'blob';
    xhr.onload = function(evt) {
      var blob = new Blob([xhr.response], {type: 'audio/ogg'});
      var objectUrl = URL.createObjectURL(blob);
      audio.src = objectUrl;
      // Release the object URL once playback has finished
      // (audio elements do not fire a "load" event)
      audio.onended = function(evt) {
        URL.revokeObjectURL(objectUrl);
      };
      audio.play();
    };
    var data = JSON.stringify({text: yourTextToSynthesize});
    xhr.send(data);

As you can see, with XMLHttpRequest, you have to wait until the data are fully loaded to play. There may be a way to stream from XMLHttpRequest using the very new Media Source Extensions API, which is currently available only in Chrome and IE (no Firefox or Safari). This is an approach I'm currently experimenting with. I'll update here if I'm successful.

rajephon
Eric S. Bullington
  • Eric answered my question with the statement about compatibility and a link to the Chromium issue and elaborated on possible work around which is greatly appreciated. – exiquio May 27 '15 at 00:02
  • I've been struggling with this for the past 2 days. Could you please take a look at this: http://stackoverflow.com/questions/32163749/ – Dan Aug 23 '15 at 08:11
  • AAC format will work in all browsers, BTW. You're not limited to using WAV (which is huge). – ffxsam Jul 01 '16 at 05:46