
We've been working on a JavaScript-based audio chat client that runs in the browser and sends audio samples to a server over a WebSocket. We previously tried using the Web Audio API's ScriptProcessorNode to obtain the sample values. This worked well on our desktops and laptops, but audio quality was poor when transmitting from a handheld platform we must support. We've attributed this to the documented script processor performance issues (https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API). On the handheld, with a script processor buffer size of 2048, the audio consistently broke up. At the next buffer size up (4096), the audio was smooth (no breakups), but the latency was too high (around two seconds).

Our results with ScriptProcessorNode prompted us to experiment with AudioWorklet. Unfortunately, audio quality with our worklet implementation is worse: we get both breakups and latency, even on our laptops. I'm wondering whether there's a way to tweak our worklet implementation to get better performance, or whether what we're experiencing is par for the course given the current state of audio worklets (Chromium issues 796330, 813825, and 836306 seem relevant).

Here's a little more detail on what the code does:

  1. Create a MediaStreamAudioSourceNode from the MediaStream obtained via getUserMedia.
  2. Connect the source node to our worklet node implementation (extends AudioWorkletNode).
  3. Our worklet processor implementation (extends AudioWorkletProcessor) buffers blocks that arrive as the "input" argument to its process method.
  4. When the buffer is full, the processor uses its MessagePort to send the buffer contents to the worklet node.
  5. Worklet node transmits the buffer contents over a WebSocket connection.
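
The five steps above can be sketched roughly as follows. Names not given in the post — the registered processor name (`capture-processor`), its module path, and the WebSocket URL — are placeholders, not our actual identifiers:

```javascript
// Step 5: relay each full buffer posted by the processor to the server.
// Kept as a separate function so the relay logic is easy to test in isolation.
function forwardBuffersToSocket(port, socket) {
  port.onmessage = (event) => {
    if (socket.readyState === 1 /* WebSocket.OPEN */) {
      socket.send(event.data); // Float32Array posted by the processor
    }
  };
}

// Hypothetical wiring for steps 1-5 (processor name, module path, and URL
// are placeholders).
async function startCapture() {
  const context = new AudioContext();
  await context.audioWorklet.addModule('capture-processor.js');

  // Step 1: microphone MediaStream -> MediaStreamAudioSourceNode.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const source = context.createMediaStreamSource(stream);

  // Step 2: connect the source to the worklet node; steps 3-4 (buffering)
  // happen inside the registered AudioWorkletProcessor.
  const node = new AudioWorkletNode(context, 'capture-processor');
  source.connect(node);

  const ws = new WebSocket('wss://example.com/audio');
  ws.binaryType = 'arraybuffer';
  forwardBuffersToSocket(node.port, ws);
}
```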

The process method is below. The variable `samples` is a Float32Array that is allocated once at the buffer size and reused. I've experimented with the buffer size a bit, but it doesn't seem to have an impact. The approach is based on the guidance in section 4.1 of *AudioWorklet: The future of web audio*, which recommends minimizing memory allocations.

process(inputs, outputs, parameters) {
    if (micKeyed) {
        const input = inputs[0][0]; // first channel of the first input

        if (input.length === framesPerBlock) {
            // Copy this render quantum into the reusable buffer.
            samples.set(input, currentBlockIndex * framesPerBlock);
            currentBlockIndex++;

            if (currentBlockIndex === lastBlockIndex) {
                // Buffer is full; hand it off to the worklet node.
                this.port.postMessage(samples);
                currentBlockIndex = 0;
            }
        } else {
            console.error("Got a block of unexpected length!");
        }
    }
    return true; // keep the processor alive
}
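
One tweak we could try (a sketch only; whether it actually helps on the FZ-N1 would need measuring): postMessage structured-clones the Float32Array, so passing the underlying ArrayBuffer in the transfer list moves it instead of copying it. The function name is ours; note this trades the per-flush copy for a per-flush allocation, which cuts against the zero-allocation advice in section 4.1, so both variants would need profiling:

```javascript
// Hypothetical variation on the flush step: transfer the buffer instead of
// cloning it. After the transfer the original Float32Array is detached, so
// a fresh buffer must be allocated for the next fill.
function flushBuffer(port, samples) {
  const length = samples.length;               // read before the buffer detaches
  port.postMessage(samples, [samples.buffer]); // zero-copy transfer
  return new Float32Array(length);             // replacement buffer for refilling
}
```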

Currently testing with PCs running Chrome 72.0.3626.109 on CentOS 7. Our handhelds are Panasonic FZ-N1 running Chrome 72.0.3626.105 on Android 6.0.1.

Thank you for reading and any suggestions you may be able to provide.

StephenL
  • Have you tried using `MediaSource`? See also [HTML5 audio streaming: precisely measure latency?](https://stackoverflow.com/questions/38768375/html5-audio-streaming-precisely-measure-latency); [How to use Blob URL, MediaSource or other methods to play concatenated Blobs of media fragments?](https://stackoverflow.com/questions/45217962/how-to-use-blob-url-mediasource-or-other-methods-to-play-concatenated-blobs-of) – guest271314 Feb 20 '19 at 19:43
  • Also [Method for streaming data from browser to server via HTTP](https://stackoverflow.com/q/35899536/); [Node.js: splitting stream content for n-parts](https://stackoverflow.com/q/43631320/) – guest271314 Feb 20 '19 at 19:49
  • Thanks for the response @guest271314. I haven't tried MediaSource, but it looks interesting. In our case however, I believe the source of the latency and breakups is in the capture/transmit pipeline within the client, not the playback/receive pipeline (the one to which MediaSource seems applicable). We've heard reasonably good quality playback when audio is captured and transmitted by a client running on relatively powerful hardware. When we keep the hardware constant for the playback client but switch to less powerful hardware for the capture client, the playback quality becomes unacceptable. – StephenL Feb 20 '19 at 20:51
  • How do you intend to address disparate hardware in client-side code? – guest271314 Feb 20 '19 at 21:43
  • @guest271314: In our testing, we're hearing pitch shifts when the transmitting client is a handheld and the receiving client is a PC, or vice versa, ostensibly due to the different sampling rates (48000 samples/sec on the PCs and 16000 samples/sec on the handhelds). Due to a Chrome bug, it seems these can't be changed. So, if we can get our current solution to work in all other regards, we'll have to do some resampling to account for the different rates. I'm not aware of any other issues resulting from disparate hardware. Did you have something else in mind? Thanks. – StephenL Feb 20 '19 at 22:22
  • One possible option could be to create a custom version of Chromium where the source code (ostensibly FOSS) is adjusted to sample at N samples/sec at the target handhelds. Test the configuration at target devices in-house before deploying the custom browser at large or to prospective users of the specific application. – guest271314 Feb 20 '19 at 22:47
  • E.g., https://github.com/Eloston/ungoogled-chromium – guest271314 Feb 20 '19 at 22:54
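
The resampling mentioned in the comments above (48000 samples/sec on the PCs vs. 16000 on the handhelds) could start from a naive linear-interpolation sketch like this. The function name is ours, and a production version should low-pass filter before downsampling to avoid aliasing:

```javascript
// Naive linear-interpolation resampler (sketch only; no anti-aliasing filter).
// Converts a Float32Array of samples from fromRate to toRate.
function resample(input, fromRate, toRate) {
  const outLength = Math.round(input.length * toRate / fromRate);
  const output = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples consumed per output sample

  for (let i = 0; i < outLength; i++) {
    const pos = i * step;                            // fractional input position
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);   // clamp at the last sample
    const frac = pos - i0;
    output[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return output;
}
```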

0 Answers