
I am using the Juce framework to build a VST/AU audio plugin. The audio plugin accepts MIDI, and renders that MIDI as audio samples — by sending the MIDI messages to be processed by FluidSynth (a soundfont synthesizer).

This is almost working. MIDI messages are sent to FluidSynth correctly. Indeed, if the audio plugin tells FluidSynth to render MIDI messages directly to its audio driver — using a sine wave soundfont — we achieve a perfect result:

[Image: perfect sine wave, by sending audio direct to driver]

But I shouldn't ask FluidSynth to render directly to the audio driver, because then the VST host won't receive any audio.

To do this properly, I need to implement a renderer: the VST host will ask me about 86 times per second (44100 ÷ 512 ≈ 86.1) to render 512 samples of audio.


I tried rendering blocks of audio samples on-demand, and outputting those to the VST host's audio buffer, but this is the kind of waveform I got:

[Image: rendering blocks of audio, poorly]

Here's the same file, with markers every 512 samples (i.e. every block of audio):

[Image: with markers every 512 samples]

So, clearly I'm doing something wrong; I am not getting a continuous waveform. Discontinuities are plainly visible between each block of audio that I process.


Here's the most important part of my code: my implementation of JUCE's SynthesiserVoice.

#include "SoundfontSynthVoice.h"
#include "SoundfontSynthSound.h"

SoundfontSynthVoice::SoundfontSynthVoice(const shared_ptr<fluid_synth_t> synth)
: midiNoteNumber(0),
synth(synth)
{}

bool SoundfontSynthVoice::canPlaySound(SynthesiserSound* sound) {
    return dynamic_cast<SoundfontSynthSound*> (sound) != nullptr;
}

void SoundfontSynthVoice::startNote(int midiNoteNumber, float velocity, SynthesiserSound* /*sound*/, int /*currentPitchWheelPosition*/) {
    this->midiNoteNumber = midiNoteNumber;
    fluid_synth_noteon(synth.get(), 0, midiNoteNumber, static_cast<int>(velocity * 127));
}

void SoundfontSynthVoice::stopNote (float /*velocity*/, bool /*allowTailOff*/) {
    clearCurrentNote();
    fluid_synth_noteoff(synth.get(), 0, this->midiNoteNumber);
}

void SoundfontSynthVoice::renderNextBlock (
    AudioBuffer<float>& outputBuffer,
    int startSample,
    int numSamples
    ) {
    fluid_synth_process(
        synth.get(),    // fluid_synth_t *synth //FluidSynth instance
        numSamples,     // int len //Count of audio frames to synthesize
        1,              // int nin //ignored
        nullptr,        // float **in //ignored
        outputBuffer.getNumChannels(), // int nout //Count of arrays in 'out' 
        outputBuffer.getArrayOfWritePointers() // float **out //Array of arrays to store audio to
        );
}

This is where each voice of the synthesizer is asked to produce its block of 512 audio samples.

The important function here is SynthesiserVoice::renderNextBlock(), wherein I ask fluid_synth_process() to produce a block of audio samples.
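As an aside (unrelated to the discontinuities): fluid_synth_process() returns a status code, which I discard above. A variant that asserts on it, assuming the same members as before, would look like this:

void SoundfontSynthVoice::renderNextBlock (
    AudioBuffer<float>& outputBuffer,
    int startSample,
    int numSamples
    ) {
    ignoreUnused (startSample);
    // fluid_synth_process() returns FLUID_OK (0) on success and
    // FLUID_FAILED (-1) on error.
    const int result = fluid_synth_process (
        synth.get(),
        numSamples,
        1, nullptr,                    // input arguments are ignored
        outputBuffer.getNumChannels(),
        outputBuffer.getArrayOfWritePointers());
    jassert (result == FLUID_OK);
    ignoreUnused (result);             // keep release builds warning-free
}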


And here's the code that tells every voice to renderNextBlock(): my implementation of AudioProcessor.

AudioProcessor::processBlock() is the main loop of the audio plugin. Within it, Synthesiser::renderNextBlock() invokes every voice's SynthesiserVoice::renderNextBlock():

void LazarusAudioProcessor::processBlock (
    AudioBuffer<float>& buffer,
    MidiBuffer& midiMessages
    ) {
    jassert (!isUsingDoublePrecision());
    const int numSamples = buffer.getNumSamples();

    // Now pass any incoming midi messages to our keyboard state object, and let it
    // add messages to the buffer if the user is clicking on the on-screen keys
    keyboardState.processNextMidiBuffer (midiMessages, 0, numSamples, true);

    // and now get our synth to process these midi events and generate its output.
    synth.renderNextBlock (
        buffer,       // AudioBuffer<float> &outputAudio
        midiMessages, // const MidiBuffer &inputMidi
        0,            // int startSample
        numSamples    // int numSamples
        );

    // In case we have more outputs than inputs, we'll clear any output
    // channels that didn't contain input data, (because these aren't
    // guaranteed to be empty - they may contain garbage).
    for (int i = getTotalNumInputChannels(); i < getTotalNumOutputChannels(); ++i)
        buffer.clear (i, 0, numSamples);
}

Is there something I'm misunderstanding here? Is there some timing subtlety required to make FluidSynth give me samples that are back-to-back with the previous block of samples? Maybe an offset that I need to pass in?

Maybe FluidSynth is stateful, and has its own clock that I need to gain control of?

Is my waveform symptomatic of some well-known problem?

Source code is here, in case I've omitted anything important. Question posted at the time of commit 95605.

Birchlabs

1 Answer


As I wrote the final paragraph, I realised:

fluid_synth_process() provides no mechanism for specifying timing information or a sample offset. Yet we observe that time advances nevertheless (each block is different), so the simplest explanation is: the FluidSynth instance begins at time 0, and advances by numSamples / sampleRate seconds every time fluid_synth_process() is invoked (at 44100 Hz, each 512-sample call advances it by 512 / 44100 ≈ 11.6 ms).

This leads to the revelation: since fluid_synth_process() has side-effects on the FluidSynth instance's timing, it is dangerous for multiple voices to run it on the same synth instance.
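To make that concrete, here's a toy simulation (illustrative only, not plugin code) of eight voices each pulling a block from one shared, strictly sequential source:

#include <cstdio>

// Toy model of the bug: fluid_synth_process() always returns the *next*
// numSamples of the synth's single timeline, advancing an internal clock.
int main() {
    const int numVoices = 8;
    const int blockSize = 512;
    int synthClock = 0; // FluidSynth's internal sample position

    for (int hostBlock = 0; hostBlock < 3; ++hostBlock) {
        std::printf ("host block %d wants samples [%d, %d) but receives slices:",
                     hostBlock, hostBlock * blockSize, (hostBlock + 1) * blockSize);
        for (int voice = 0; voice < numVoices; ++voice) {
            // each voice's renderNextBlock() consumes a fresh slice:
            std::printf (" [%d, %d)", synthClock, synthClock + blockSize);
            synthClock += blockSize;
        }
        std::printf ("\n");
    }
    // Each host block ends up built from 8 disjoint time slices, and
    // consecutive host blocks are 8 * 512 samples apart in synth time:
    // hence the discontinuities every 512 samples.
}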

I tried reducing const int numVoices = 8; to const int numVoices = 1;, so that only one voice would invoke fluid_synth_process() per block.

This fixed the problem; it produced a perfect waveform, and revealed the source of the discontinuity.

So, I'm left now with a much easier problem of "what's the best way to synthesize a plurality of voices in FluidSynth". This is a much nicer problem to have. That's outside of the scope of this question, and I'll investigate it separately. Thanks for your time!

EDIT: I fixed multi-voice playback by making SynthesiserVoice::renderNextBlock() a no-op and moving its fluid_synth_process() call into AudioProcessor::processBlock() instead, because it should be invoked once per block (not once per voice per block).
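For reference, a minimal sketch of that arrangement (the fluid_synth_t member on the processor, here named fluidSynth, is my illustrative naming; the rest matches the snippets above):

void SoundfontSynthVoice::renderNextBlock (
    AudioBuffer<float>& /*outputBuffer*/,
    int /*startSample*/,
    int /*numSamples*/
    ) {
    // Deliberately a no-op: fluid_synth_process() advances the synth's
    // internal clock, so it must not run once per voice.
}

void LazarusAudioProcessor::processBlock (
    AudioBuffer<float>& buffer,
    MidiBuffer& midiMessages
    ) {
    jassert (!isUsingDoublePrecision());
    const int numSamples = buffer.getNumSamples();

    keyboardState.processNextMidiBuffer (midiMessages, 0, numSamples, true);

    // Still dispatch MIDI to the voices: their startNote()/stopNote()
    // forward note-ons/note-offs to FluidSynth, but their
    // renderNextBlock() no longer produces audio.
    synth.renderNextBlock (buffer, midiMessages, 0, numSamples);

    // Render exactly once per block, for all voices at once.
    fluid_synth_process (
        fluidSynth.get(),        // the shared fluid_synth_t instance
        numSamples,
        1, nullptr,              // input arguments are ignored
        buffer.getNumChannels(),
        buffer.getArrayOfWritePointers());

    for (int i = getTotalNumInputChannels(); i < getTotalNumOutputChannels(); ++i)
        buffer.clear (i, 0, numSamples);
}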

Birchlabs
    Nice job. :) In fact, it is a bit surprising that FluidSynth is this heavy stateless. Does it perform much of a filtering with delays? – bipll Sep 12 '17 at 05:47
  • @bipll thanks. :) I'm not sure I understand your question. are you asking whether fluidsynth can render audio quickly enough to fill the buffer? certainly it can render 512 samples 44100 times per second; the audio sounds great. or are you asking if it can apply filters quickly enough to fill the buffer? I haven't tried out chorus and reverb yet. or are you asking whether it has a "delay" filter? I think it does not. are you asking whether the synthesis is low-latency? I guess the only latency that matters is "how quickly can it process midiNoteOn". I haven't measured that latency, though. – Birchlabs Sep 12 '17 at 10:29
  • `s/stateless/stateful/` >_<. I'm just curious that fluid_synth_process does not even have a startSample argument and merely returns the wavedata sequentially block by block. This probably has something to do with its inner filters that use delays. – bipll Sep 12 '17 at 21:43
  • "what's the best way to synthesize a plurality of voices in FluidSynth" IIRC Csound solves this last issue by allowing one to instantiate multiple fluidEngines. http://www.csounds.com/manual/html/fluidEngine.html – Fizz Jan 27 '22 at 02:21