Interpolate silence in Discord.js stream

Question

I'm making a Discord bot with Discord.js v14 that records each user's audio to an individual file and also to one collective file. Discord.js receiver streams only emit data while a user is speaking, so the recordings contain no silence; my question is how to interpolate silence into these streams.

My code is based off the Discord.js recording example. In essence, a privileged user enters a voice channel (or stage), runs /record and all the users in that channel are recorded up until the point that they run /leave.

I've tried Node packages such as combined-stream, audio-mixer, multistream and multipipe, but I'm not familiar enough with Node streams to combine them so that the strengths of one cover the shortcomings of another. I'm also not sure how to approach interpolating silence: either through a Transform (which likely requires the stream to be continuous, or the receiver stream to be overlaid onto silence) or through a sort of "multi-stream" that switches between piping the receiver stream and a silence buffer. I also have yet to overlay the audio files (e.g., with ffmpeg).

Would it even be possible for a Readable to await an audio chunk and, if none is given within a certain timeframe, push a chunk of silence instead? My attempt at doing so is below (again, based off the Discord.js recorder example):

// CREDIT TO: https://stackoverflow.com/a/69328242/8387760
const SILENCE = Buffer.from([0xf8, 0xff, 0xfe]);

async function createListeningStream(connection, userId) {
    // Creating manually terminated stream
    let receiverStream = connection.receiver.subscribe(userId, {
        end: {
            behavior: EndBehaviorType.Manual
        },
    });
    
    // Interpolating silence
    // TODO Increases file length over tenfold by stretching audio?
    let userStream = new Readable({
        read() {
            receiverStream.on('data', chunk => {
                if (chunk) {
                    this.push(chunk);
                }
                else {
                    // Never occurs
                    this.push(SILENCE);
                }
            });
        }
    });
    
    /* Piping userStream to file at 48kHz sample rate */
}

As an optional bonus, it would help to be able to check whether a user ever spoke, to avoid creating empty recordings. Thanks in advance.

score 1 · Answer 1 · answered Dec 13 '22 at 16:05

After a lot of reading about Node streams, the solution I arrived at was unexpectedly simple.

Create a boolean variable recording that is true while the recording should continue and false once it should stop
Create a buffer to handle backpressure (i.e., when data arrives at a higher rate than it is read out)

let buffer = [];

Create a readable stream that the user's receiver stream feeds into (by way of the buffer)

// New audio stream (with silence)
let userStream = new Readable({
    // ...
});

// User audio stream (without silence)
let receiverStream = connection.receiver.subscribe(userId, {
    end: {
        behavior: EndBehaviorType.Manual,
    },
});
receiverStream.on('data', chunk => buffer.push(chunk));

In that stream's read method, push one chunk every 20 ms to match the rate at which the receiver delivers Opus frames (the audio itself is sampled at 48kHz), pushing silence whenever the buffer is empty

read() {
   if (recording) {
        let delay = new NanoTimer();
        delay.setTimeout(() => {
            if (buffer.length > 0) {
                this.push(buffer.shift());
            }
            else {
                this.push(SILENCE);
            }
        }, '', '20m');
    }
    // ...
}

In the same method, also handle ending the stream

        // ...
        else if (buffer.length > 0) {
            // Stream is ending: sending buffered audio ASAP
            this.push(buffer.shift());
        }
        else {
            // Ending stream
            this.push(null);
        }
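The branching in read() can also be pulled out into a small pure function, which makes the three cases easier to see and to test without a voice connection. This is only a sketch: nextChunk and its parameters are names made up for illustration and are not part of Discord.js.

```javascript
// Opus silence frame, same bytes as SILENCE elsewhere in this answer
const SILENCE = Buffer.from([0xf8, 0xff, 0xfe]);

// Hypothetical helper: decides what read() should push next.
// - buffer: array of pending audio chunks, oldest first
// - recording: true while the recording should continue
// Returns a Buffer to push, or null to signal the end of the stream.
function nextChunk(buffer, recording) {
    if (buffer.length > 0) {
        // Buffered audio always goes out first, whether recording or not
        return buffer.shift();
    }
    // Nothing buffered: fill the gap with silence, or end the stream
    return recording ? SILENCE : null;
}
```

Inside read(), the timer callback would then reduce to this.push(nextChunk(buffer, recording)).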

If we put it all together:

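const { Readable } = require('stream');
const { EndBehaviorType } = require('@discordjs/voice');
const NanoTimer = require('nanotimer'); // node
/* import NanoTimer from 'nanotimer'; */ // es6

// `recording` is the boolean described above; it is assumed to be defined in an enclosing scope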

const SILENCE = Buffer.from([0xf8, 0xff, 0xfe]);

async function createListeningStream(connection, userId) {
    // Accumulates very, very slowly, but only when user is speaking: reduces buffer size otherwise
    let buffer = [];
    
    // Interpolating silence into user audio stream
    let userStream = new Readable({
        read() {
            if (recording) {
                // Pushing audio at the same rate of the receiver
                // (Could probably be replaced with standard, less precise timer)
                let delay = new NanoTimer();
                delay.setTimeout(() => {
                    if (buffer.length > 0) {
                        this.push(buffer.shift());
                    }
                    else {
                        this.push(SILENCE);
                    }
                    // delay.clearTimeout();
                }, '', '20m'); // One frame every 20 ms (50 frames per second); 48kHz is the sample rate, not the timer rate
            }
            else if (buffer.length > 0) {
                // Sending buffered audio ASAP
                this.push(buffer.shift());
            }
            else {
                // Ending stream
                this.push(null);
            }
        }
    });
    
    // Redirecting user audio to userStream to have silence interpolated
    let receiverStream = connection.receiver.subscribe(userId, {
        end: {
            behavior: EndBehaviorType.Manual, // Manually closed elsewhere
        },
        // mode: 'pcm',
    });
    receiverStream.on('data', chunk => buffer.push(chunk));
    
    // pipeline(userStream, ...), etc.
}

From here, you can pipe that stream into a file write stream or anything else you need. Note that it's a good idea to also close the receiverStream once recording is set to false, with something like:

connection.receiver.subscriptions.delete(userId);

Likewise, the userStream should be closed explicitly if nothing else does that for you, e.g., when it is not the first argument of the pipeline method.
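One note on the timer: each call to read() above waits a fixed 20 ms, so any scheduling delay adds up over a long recording. A common way to limit that kind of drift is to schedule each frame against an absolute start time instead of a fixed interval. The sketch below only computes the wait; delayUntilNextFrame, FRAME_MS and the parameters are illustrative names, and the approach is untested against a real voice connection.

```javascript
const FRAME_MS = 20; // assumed duration of one frame, matching the '20m' timer

// Hypothetical helper: how long to wait before pushing frame number
// `framesSent` (0-based), given the stream's start time and the current
// time, both in milliseconds. Returns 0 when the frame is already overdue.
function delayUntilNextFrame(startTime, framesSent, now) {
    const dueAt = startTime + framesSent * FRAME_MS;
    return Math.max(0, dueAt - now);
}
```

In read(), the result could be passed to the timer in place of the fixed '20m', with a frame counter incremented after each push.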

As a side note, although outside the scope of my original question, there are many other modifications you can make to this. For instance, you can prepend silence to the buffer before the receiverStream's data starts arriving, e.g., to make multiple audio streams the same length:

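// let startTime = ...
const FRAME_MS = 20; // each pushed frame (including SILENCE) covers 20 ms of audio
let creationTime = Date.now();
// One silence frame per 20 ms elapsed since startTime, not one per millisecond
for (let t = startTime; t < creationTime; t += FRAME_MS) {
    buffer.push(SILENCE);
}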

Happy coding!

In theory, this works, but the actual timing is off by a substantial unknown amount. Still looking for a workaround — Vessel, Jul 08 '23 at 02:40
