
We need to stream live audio (from a medical device) to web browsers with no more than 3-5s of end-to-end delay (assume 200 ms or less of network latency). Today we use a browser plugin (NPAPI) for decoding, filtering (high-pass, low-pass, band-pass), and playback of the audio stream (delivered via WebSockets).

We want to replace the plugin.

I was looking at various Web Audio API demos, and most of our required functionality (playback, gain control, filtering) appears to be available in the Web Audio API. However, it is not clear to me whether the Web Audio API can be used for streamed sources, as most of the Web Audio API demos make use of short sounds and/or audio clips.

Can Web Audio API be used to play live streamed audio?

Update (11-Feb-2015):

After a bit more research and local prototyping, I am not sure live audio streaming with the Web Audio API is possible, as the Web Audio API's decodeAudioData isn't really designed to handle random chunks of audio data (in our case delivered via WebSockets). It appears to need the whole 'file' in order to process it correctly.

See stackoverflow:

Now, it is possible with createMediaElementSource to connect an <audio> element to the Web Audio API, but in my experience the <audio> element introduces a huge amount of end-to-end delay (15-30s), and there doesn't appear to be any way to reduce the delay to below 3-5 seconds.
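
For reference, the <audio>-to-Web-Audio wiring I am testing looks roughly like the sketch below; the element id and stream URL are placeholders, not our actual setup, and the delay comes from the element's internal buffering rather than from this wiring:

// Rough sketch of connecting an <audio> element to the Web Audio API.
// The element id and the stream URL are placeholders, not our actual setup.
const audioElement = document.getElementById('liveAudio');
audioElement.src = 'https://example.com/live-stream';

const context = new AudioContext();
const source = context.createMediaElementSource(audioElement);

// Filtering and gain, as the plugin does today.
const filter = context.createBiquadFilter();
filter.type = 'bandpass';
const gain = context.createGain();

source.connect(filter);
filter.connect(gain);
gain.connect(context.destination);

audioElement.play();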

I think the only solution is to use WebRTC with Web Audio API. I was hoping to avoid WebRTC as it will require significant changes to our server-side implementation.

Update (12-Feb-2015) Part I:

I haven't completely eliminated the <audio> tag (I need to finish my prototype). Once I have ruled it out, I suspect createScriptProcessor (deprecated but still supported) will be a good choice for our environment, as I could 'stream' (via WebSockets) our ADPCM data to the browser and then (in JavaScript) convert it to PCM, similar to what Scott's library (see below) does using createScriptProcessor. This method doesn't require the data to be in properly sized 'chunks', nor does it demand the critical timing that the decodeAudioData approach does.

Update (12-Feb-2015) Part II:

After more testing, I eliminated the <audio>-to-Web-Audio-API interface because, depending on source type, compression, and browser, the end-to-end delay can be 3-30s. That leaves the createScriptProcessor method (see Scott's post below) or WebRTC. After discussing with our decision makers, it has been decided that we will take the WebRTC approach. I assume it will work, but it will require changes to our server-side code.
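
For the record, the browser side of the WebRTC approach should look roughly like the sketch below; signalling with our server is omitted, and the node graph is only illustrative:

// Rough sketch of receiving WebRTC audio and feeding it into the Web Audio API.
// Signalling (offer/answer exchange with our server) is omitted.
const context = new AudioContext();
const peerConnection = new RTCPeerConnection();

peerConnection.ontrack = (event) => {
  // Wrap the remote stream in a MediaStreamAudioSourceNode for filtering and gain.
  const source = context.createMediaStreamSource(event.streams[0]);

  const filter = context.createBiquadFilter();
  filter.type = 'highpass';
  const gain = context.createGain();

  source.connect(filter);
  filter.connect(gain);
  gain.connect(context.destination);
};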

I'm going to mark the first answer, just so the 'question' is closed.

Thanks for listening. Feel free to add comments as needed.

  • Audio, like any other type of data on a computer, is just a bunch of bytes. Send those bytes over a network and you have streaming. Anything can be streamed (as long as you can send data faster than, or as fast as, it is generated). And to answer your question, you can also convert the received bytes into audio and play it, using the Web Audio API or whatever you prefer. If you want, you can use WebRTC and stream and play the data directly: https://developer.mozilla.org/en-US/docs/Web/API/MediaStreamAudioSourceNode – XCS Feb 10 '15 at 19:36
  • There are tons of references to streaming in the API doc: http://webaudio.github.io/web-audio-api/#the-mediastreamaudiosourcenode-interface Besides, why not just download and try it? Might be quicker than waiting for an answer here. – Paul Sasik Feb 10 '15 at 19:38
  • @Cristy (and Paul): Both of you mentioned the Web Audio API MediaStreamAudioSourceNode method. From what little I've read it seems to be intended to be used to "redirect"(?) an – Tony Feb 10 '15 at 19:59
  • yes, it could. Use WebRTC to hook an audio stream up to a media element, and then hook the media element into Web Audio. – cwilso Feb 11 '15 at 01:23
  • Re: live audio streaming – your server would need to encode each chunk as MP3. This is basically how all streaming works. You make a bunch of small, digestible chunks and then send them to the client, where they can be individually decoded and added to a queue. – Kevin Ennis Feb 11 '15 at 21:26
  • I have used the – notthetup Feb 12 '15 at 03:09
  • @notthetup -- I just ran a test in which (from the web server) I slowly streamed a 345 second long (360K bytes) ogg/opus file at a rate of 1080 bytes/s. Chrome will wait 30s before playback occurs, whereas Firefox will start playing after ~4s. I wish we had a bit more control over the audio tag, but unfortunately, we do not. – Tony Feb 12 '15 at 16:17
  • Can someone help me with how to use createMediaElementSource to connect an – Abdul Hannan Apr 15 '20 at 11:06

4 Answers


Yes, the Web Audio API (along with AJAX or Websockets) can be used for streaming.

Basically, you pull down (or send, in the case of Websockets) some chunks of n length. Then you decode them with the Web Audio API and queue them up to be played, one after the other.

Because the Web Audio API has high-precision timing, you won't hear any "seams" between the playback of each buffer if you do the scheduling correctly.
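
A minimal sketch of that approach, assuming every WebSocket message arrives as an independently decodable piece of audio (e.g. a short WAV or MP3 chunk); the URL is a placeholder:

// Decode each incoming chunk and schedule it to start exactly where the
// previous chunk ends. Assumes every WebSocket message is independently
// decodable; the URL below is a placeholder.
const context = new AudioContext();
let nextStartTime = 0;

const socket = new WebSocket('wss://example.com/audio');
socket.binaryType = 'arraybuffer';

socket.onmessage = (event) => {
  context.decodeAudioData(event.data, (buffer) => {
    const source = context.createBufferSource();
    source.buffer = buffer;
    source.connect(context.destination);

    // If we have fallen behind (e.g. a network hiccup), restart slightly in
    // the future so the chunk is not scheduled in the past.
    if (nextStartTime < context.currentTime) {
      nextStartTime = context.currentTime + 0.1;
    }
    source.start(nextStartTime);
    nextStartTime += buffer.duration;
  });
};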

Kevin Ennis
  • I assume if the buffers arrive late (network hiccup) some clipping (gaps) would be heard, correct? And to combat that, I could "pre-queue" X bytes (or seconds) of data before starting playback, correct? – Tony Feb 10 '15 at 19:52
  • yeah, exactly. nothing you can do about that. if the server can't send data as fast as you need to read it, there's gonna be silence. unless your app can time travel, which would be an awesome feature. and yeah, it's very typical to build up a buffer of some semi-arbitrary length before beginning playback so that you've got some leeway for network lag. – Kevin Ennis Feb 10 '15 at 21:00
  • any ideas how I can queue up 'chunks' of data into a Web Audio API 'source'? It doesn't seem possible (see my recent updates to the original post). – Tony Feb 11 '15 at 15:21
  • Yeah, so... you have to create a new AudioBufferSourceNode per chunk. That's why the scheduling is important (because you're actually going to be playing a bunch of "separate" buffers). Basically, you push all of your chunks to an array – then you `shift()` to grab the earliest buffer and assign it to a new `AudioBufferSourceNode`. – Kevin Ennis Feb 11 '15 at 15:45
  • "Basically, you pull down some chunks of `n` length." How is that possible, since `XMLHttpRequest` doesn't seem to have any mechanism for specifying how many bytes to download? – ffxsam Jun 20 '17 at 19:36
  • @KevinEnnis Thanks for the tip; see my [answer](https://stackoverflow.com/a/62870119/1599699) for a working implementation. – Andrew Jul 13 '20 at 06:13

I wrote a streaming Web Audio API system in which I used web workers to do all of the web socket management to communicate with node.js, so that the browser thread simply renders audio ... it works just fine on laptops. Since mobiles are behind on their implementation of web sockets inside web workers, you need at least Lollipop for it to run as coded ... I posted the full source code here
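
The general shape of the approach (simplified here to run on the main thread rather than inside a web worker, and assuming the server sends raw 32-bit float PCM at the context's sample rate; the WebSocket URL is a placeholder) is roughly:

// Simplified sketch of the ScriptProcessorNode-plus-queue shape, not the posted
// source. Assumes raw 32-bit float PCM at the context's sample rate; the URL is
// a placeholder.
const context = new AudioContext();
const processor = context.createScriptProcessor(4096, 1, 1);
let queue = new Float32Array(0);

const socket = new WebSocket('wss://example.com/pcm');
socket.binaryType = 'arraybuffer';
socket.onmessage = (event) => {
  // Append the incoming samples to the queue.
  const incoming = new Float32Array(event.data);
  const merged = new Float32Array(queue.length + incoming.length);
  merged.set(queue, 0);
  merged.set(incoming, queue.length);
  queue = merged;
};

processor.onaudioprocess = (event) => {
  const output = event.outputBuffer.getChannelData(0);
  if (queue.length >= output.length) {
    // Enough data buffered: copy one block out of the queue.
    output.set(queue.subarray(0, output.length));
    queue = queue.slice(output.length);
  } else {
    // Underrun: output silence until more data arrives.
    output.fill(0);
  }
};

processor.connect(context.destination);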

Scott Stensland
  • Thanks. If I'm reading it correctly, you queue blocks of float-PCM data (delivered via WebSockets) and leverage AudioContext's createScriptProcessor to feed the queued data into the Web Audio API, correct? Sending raw WAV data to the browser requires quite a bit of bandwidth. I'll look into this approach further. We can receive ADPCM or Opus data via websocket. I was hoping I could get Web Audio to accept either, but it seems (if I want to stream) I'll need to convert it myself rather than rely on decodeAudioData. – Tony Feb 11 '15 at 19:08
  • I just noticed that createScriptProcessor is DEPRECATED. It has been replaced by AudioWorkerNode. More digging required... – Tony Feb 11 '15 at 19:17
  • Even though createScriptProcessor has been marked as deprecated in the Web Audio API spec, it seems createAudioWorker has NOT been implemented in Chrome (as of 40.0.2214.111) or Firefox (as of 35.0.1). – Tony Feb 12 '15 at 14:10
  • I find it rather pointless to target mobile for any new technology, especially Android, as it takes many, many years before enough people buy new kit, seeing as how the current ecosystem fails to refresh current OS releases onto prior-generation phones ... unless you have a super cool app and target people who care about the bleeding edge ... ;-) – Scott Stensland Feb 12 '15 at 18:48
  • true. But we weren't targeting mobile. – Tony Feb 12 '15 at 20:00

To elaborate on the comments about how to play a bunch of separate buffers stored in an array, shifting the next one out every time:

If you create a source through createBufferSource(), it has an onended event to which you can attach a callback, which will fire when the buffer has reached its end. You can do something like this to play the various chunks in the array one after the other:

function play() {
  //audiobuffer is the queue (array) of decoded AudioBuffers; empty means the end of the stream has been reached
  if (audiobuffer.length === 0) { return; }
  let source = context.createBufferSource();

  //get the earliest queued buffer, which should play next
  source.buffer = audiobuffer.shift();
  source.connect(context.destination);

  //add this function as a callback to play next buffer
  //when current buffer has reached its end 
  source.onended = play;
  source.start();
}

Hope that helps. I'm still experimenting with how to get this all smooth and ironed out, but this is a good start, and it's something missing from a lot of the online posts.

Jan Swart
  • The only problem with this is that it still has very tiny little jumps between each buffer being played. It's very small, but the irritation is still there.. – Jan Swart Oct 23 '16 at 19:18
  • This is what I implemented, but I also have these inter-buffer lags. Any chance to solve that? [I've asked a dedicated question about that](https://stackoverflow.com/questions/43366627/cracks-in-webaudio-playback-during-streaming-of-raw-audio-data) – Ploppe Apr 12 '17 at 13:58
  • Warning: Putting an audio buffer on loop does not mean that you can change the data in that buffer live... – Andrew Jul 13 '20 at 03:55
  • Update: See my [answer](https://stackoverflow.com/a/62870119/1599699), which fixes this problem. – Andrew Jul 13 '20 at 06:14

You have to create a new AudioBuffer and AudioBufferSourceNode both (or at least the latter) for every piece of data that you want to buffer... I tried looping the same AudioBuffer, but once you set .buffer on the AudioBufferSourceNode, any modifications you make to the AudioBuffer become irrelevant.

(NOTE: These classes have base/parent classes you should look at as well (referenced in the docs).)


Here's my preliminary solution that I got working (forgive me for not feeling like commenting everything, after spending hours just getting this working), and it works beautifully:

class MasterOutput {
  constructor(computeSamplesCallback) {
    this.computeSamplesCallback = computeSamplesCallback.bind(this);
    this.onComputeTimeoutBound = this.onComputeTimeout.bind(this);

    this.audioContext = new AudioContext();
    this.sampleRate = this.audioContext.sampleRate;
    this.channelCount = 2;

    this.totalBufferDuration = 5;
    this.computeDuration = 1;
    this.bufferDelayDuration = 0.1;

    this.totalSamplesCount = this.totalBufferDuration * this.sampleRate;
    this.computeDurationMS = this.computeDuration * 1000.0;
    this.computeSamplesCount = this.computeDuration * this.sampleRate;
    this.buffersToKeep = Math.ceil((this.totalBufferDuration + 2.0 * this.bufferDelayDuration) /
      this.computeDuration);

    this.audioBufferSources = [];
    this.computeSamplesTimeout = null;
  }

  startPlaying() {
    if (this.audioBufferSources.length > 0) {
      this.stopPlaying();
    }

    //Start computing indefinitely, from the beginning.
    let audioContextTimestamp = this.audioContext.getOutputTimestamp();
    this.audioContextStartOffset = audioContextTimestamp.contextTime;
    this.lastTimeoutTime = audioContextTimestamp.performanceTime;
    for (this.currentBufferTime = 0.0; this.currentBufferTime < this.totalBufferDuration;
      this.currentBufferTime += this.computeDuration) {
      this.bufferNext();
    }
    this.onComputeTimeoutBound();
  }

  onComputeTimeout() {
    this.bufferNext();
    this.currentBufferTime += this.computeDuration;

    //Readjust the next timeout to have a consistent interval, regardless of computation time.
    let nextTimeoutDuration = 2.0 * this.computeDurationMS - (performance.now() - this.lastTimeoutTime) - 1;
    this.lastTimeoutTime = performance.now();
    this.computeSamplesTimeout = setTimeout(this.onComputeTimeoutBound, nextTimeoutDuration);
  }

  bufferNext() {
    this.currentSamplesOffset = this.currentBufferTime * this.sampleRate;

    //Create an audio buffer, which will contain the audio data.
    this.audioBuffer = this.audioContext.createBuffer(this.channelCount, this.computeSamplesCount,
      this.sampleRate);

    //Get the audio channels, which are float arrays representing each individual channel for the buffer.
    this.channels = [];
    for (let channelIndex = 0; channelIndex < this.channelCount; ++channelIndex) {
      this.channels.push(this.audioBuffer.getChannelData(channelIndex));
    }

    //Compute the samples.
    this.computeSamplesCallback();

    //Creates a lightweight audio buffer source which can be used to play the audio data. Note: This can only be
    //started once...
    let audioBufferSource = this.audioContext.createBufferSource();
    //Set the audio buffer.
    audioBufferSource.buffer = this.audioBuffer;
    //Connect it to the output.
    audioBufferSource.connect(this.audioContext.destination);
    //Start playing when the audio buffer is due.
    audioBufferSource.start(this.audioContextStartOffset + this.currentBufferTime + this.bufferDelayDuration);
    while (this.audioBufferSources.length >= this.buffersToKeep) {
      this.audioBufferSources.shift();
    }
    this.audioBufferSources.push(audioBufferSource);
  }

  stopPlaying() {
    if (this.audioBufferSources.length > 0) {
      for (let audioBufferSource of this.audioBufferSources) {
        audioBufferSource.stop();
      }
      this.audioBufferSources = [];
      clearTimeout(this.computeSamplesTimeout);
      this.computeSamplesTimeout = null;
    }
  }
}

window.onload = function() {
  let masterOutput = new MasterOutput(function() {
    //Populate the audio buffer with audio data.
    let currentSeconds;
    let frequency = 220.0;
    for (let sampleIndex = 0; sampleIndex < this.computeSamplesCount; ++sampleIndex) {
      currentSeconds = (sampleIndex + this.currentSamplesOffset) / this.sampleRate;

      //For a sine wave.
      this.channels[0][sampleIndex] = 0.005 * Math.sin(currentSeconds * 2.0 * Math.PI * frequency);

      //Copy the right channel from the left channel.
      this.channels[1][sampleIndex] = this.channels[0][sampleIndex];
    }
  });
  masterOutput.startPlaying();
};

Some details:

  • You can create multiple MasterOutputs and play multiple simultaneous things this way; though, you may want to extract the AudioContext out of the class and share one amongst all your code.
  • This code sets up 2 channels (L + R) with the default sample rate from the AudioContext (48000 for me).
  • This code buffers a total of 5 seconds in advance, computing 1 second of audio data at a time, and delaying the playing and stopping of audio both by 0.1 seconds. It also keeps track of all of the audio buffer sources in case it needs to stop them if the output is to be paused; these are put into a list, and when they should be expired (that is, they no longer need to be stop()ped), they're shift()ed out of the list.
  • Note how I use audioContextTimestamp; that's important. The contextTime property lets me know exactly when the audio was started (each time), and I can then use that time (this.audioContextStartOffset) later on when audioBufferSource.start() is called, in order to schedule every audio buffer at exactly the time it should be played.

Edit: Yep, I was right (in the comments)! You can reuse the expired AudioBuffers if you want. In many cases this is going to be the more "proper" way to do things.

Here are the parts of the code that would have to change for that:

...
        this.audioBufferDatas = [];
        this.expiredAudioBuffers = [];
...
    }

    startPlaying() {
        if (this.audioBufferDatas.length > 0) {

...

    bufferNext() {
...
        //Create/Reuse an audio buffer, which will contain the audio data.
        if (this.expiredAudioBuffers.length > 0) {
            //console.log('Reuse');
            this.audioBuffer = this.expiredAudioBuffers.shift();
        } else {
            //console.log('Create');
            this.audioBuffer = this.audioContext.createBuffer(this.channelCount, this.computeSamplesCount,
                this.sampleRate);
        }

...

        while (this.audioBufferDatas.length >= this.buffersToKeep) {
            this.expiredAudioBuffers.push(this.audioBufferDatas.shift().buffer);
        }
        this.audioBufferDatas.push({
            source: audioBufferSource,
            buffer: this.audioBuffer
        });
    }

    stopPlaying() {
        if (this.audioBufferDatas.length > 0) {
            for (let audioBufferData of this.audioBufferDatas) {
                audioBufferData.source.stop();
                this.expiredAudioBuffers.push(audioBufferData.buffer);
            }
            this.audioBufferDatas = [];
...

Here was my starting code, if you want something simpler, and you don't need live audio streaming:

window.onload = function() {
  const audioContext = new AudioContext();
  const channelCount = 2;
  const bufferDurationS = 5;

  //Create an audio buffer, which will contain the audio data.
  let audioBuffer = audioContext.createBuffer(channelCount, bufferDurationS * audioContext.sampleRate,
    audioContext.sampleRate);

  //Get the audio channels, which are float arrays representing each individual channel for the buffer.
  let channels = [];
  for (let channelIndex = 0; channelIndex < channelCount; ++channelIndex) {
    channels.push(audioBuffer.getChannelData(channelIndex));
  }

  //Populate the audio buffer with audio data.
  for (let sampleIndex = 0; sampleIndex < audioBuffer.length; ++sampleIndex) {
    channels[0][sampleIndex] = Math.sin(sampleIndex * 0.01);
    channels[1][sampleIndex] = channels[0][sampleIndex];
  }

  //Creates a lightweight audio buffer source which can be used to play the audio data.
  let audioBufferSource = audioContext.createBufferSource();
  audioBufferSource.buffer = audioBuffer;
  audioBufferSource.connect(audioContext.destination);
  audioBufferSource.start();
};

Unfortunately this ^ particular code is no good for live audio, because it only uses one AudioBuffer and AudioBufferSourceNode, and like I said, turning looping on doesn't let you modify it... But if all you want to do is play a sine wave for 5 seconds and then stop (or loop it: set loop to true and you're done), this will do just fine.

Andrew
  • I'm fairly certain a potential improvement btw would be to store those `AudioBuffer`s belonging to the `AudioBufferSourceNode`s which get `shift()`ed (when they're finished), and then reuse them with different `AudioBufferSourceNode`s later on. This may depend on what you're trying to do though; e.g., you may want to keep the data in the `AudioBuffer`s to play again at a later time, without changing it. – Andrew Jul 13 '20 at 06:20