
I'm in the process of finishing a MIDI controlled software synthesizer. The MIDI input and synthesis work alright, but I appear to have a problem when playing the audio itself.

I'm using jackd as my audio server because it can be configured for low-latency applications such as mine, a real-time MIDI instrument, with alsa as the jackd backend.

In my program I'm using RtAudio, a fairly well-known C++ library that connects to various sound servers and offers basic stream operations on them. As the name suggests, it's optimised for real-time audio.

I also use the Vc library, which provides vectorized versions of various math functions, to speed up the additive synthesis. I'm basically adding up a multitude of sine waves of different frequencies and amplitudes to produce a complex waveform on the output, such as a sawtooth wave or a square wave.

Now, the problem is not that latency is high to start with; that could probably be solved or blamed on a lot of things, such as MIDI input or whatnot. The problem is that the latency between my soft synth and the final audio output starts very low and, after a couple of minutes, grows unbearably high.

Since I plan to use this to play "live", i.e. in my home, I can't put up with an ever-growing latency between my keystrokes and the audio feedback I hear.

I've reduced the code that reproduces the problem as far as I can; what follows can't be cut down any further.

#include <queue>
#include <array>
#include <iostream>
#include <thread>
#include <iomanip>
#include <Vc/Vc>
#include <RtAudio.h>
#include <chrono>
#include <ratio>
#include <algorithm>
#include <numeric>
#include <cmath>   //for std::pow
#include <cstdlib> //for std::atoi


float midi_to_note_freq(int note) {
    //Calculate difference in semitones to A4 (note number 69) and use equal temperament to find pitch.
    return 440 * std::pow(2, ((double)note - 69) / 12);
}


const unsigned short nh = 64; //number of harmonics the synthesizer will sum up to produce final wave

struct Synthesizer {
    using clock_t = std::chrono::high_resolution_clock;


    static std::chrono::time_point<clock_t> start_time;
    static std::array<unsigned char, 128> key_velocities;

    static std::chrono::time_point<clock_t> test_time;
    static std::array<float, nh> harmonics;

    static void init();
    static float get_sample();
};


std::array<float, nh> Synthesizer::harmonics = {0};
std::chrono::time_point<std::chrono::high_resolution_clock> Synthesizer::start_time, Synthesizer::test_time;
std::array<unsigned char, 128> Synthesizer::key_velocities = {0};


void Synthesizer::init() { 
    start_time = clock_t::now();
}

float Synthesizer::get_sample() {

    float t = std::chrono::duration_cast<std::chrono::duration<float, std::ratio<1,1>>> (clock_t::now() - start_time).count();

    Vc::float_v result = Vc::float_v::Zero();

    for (size_t i = 0; i < key_velocities.size(); i++) {
        if (key_velocities.at(i) == 0) continue;
        auto v = key_velocities[i];
        float f = midi_to_note_freq(i);
        for (size_t j = 0; j + Vc::float_v::size() <= nh; j += Vc::float_v::size()) {
            Vc::float_v twopift = Vc::float_v::generate([f,t,j](int n){return 2*3.14159265f*(j+n+1)*f*t;});
            //harmonics is a static member, so the lambda can use it without capturing it
            Vc::float_v harms = Vc::float_v::generate([j](int n){return harmonics.at(n+j);});
            result += v*harms*Vc::sin(twopift); 
        }
    }
    return result.sum()/512;
}                                                                                                


std::queue<float> sample_buffer;

int streamCallback (void* output_buf, void* input_buf, unsigned int frame_count, double time_info, unsigned int stream_status, void* userData) {
    if(stream_status) std::cout << "Stream underflow" << std::endl;
    float* out = (float*) output_buf;
    for (unsigned int i = 0; i<frame_count; i++) {
        //busy-wait (sleeping in 1 µs slices) until the producer thread has queued a sample
        while(sample_buffer.empty()) {std::this_thread::sleep_for(std::chrono::nanoseconds(1000));}
        *out++ = sample_buffer.front(); 
        sample_buffer.pop();
    }
    return 0;
}


void get_samples(double ticks_per_second) {
    double tick_diff_ns = 1e9/ticks_per_second;
    double tolerance = 1.0/1000; //note: 1/1000 would be integer division and evaluate to 0

    auto clock_start = std::chrono::high_resolution_clock::now();
    auto next_tick = clock_start + std::chrono::duration<double, std::nano> (tick_diff_ns);
    while(true) {
        while(std::chrono::duration_cast<std::chrono::duration<double, std::nano>>(std::chrono::high_resolution_clock::now() - next_tick).count() < tolerance) {std::this_thread::sleep_for(std::chrono::nanoseconds(100));}
        sample_buffer.push(Synthesizer::get_sample());
        next_tick += std::chrono::duration<double, std::nano> (tick_diff_ns);
    }
}


int Vc_CDECL main(int argc, char** argv) {
    if (argc < 2) {
        std::cerr << "Usage: synth <sample_rate>" << std::endl;
        return 1;
    }

    Synthesizer::init();

    /* Fill the harmonic amplitude array with amplitudes corresponding to a sawtooth wave, just for testing */
    std::generate(Synthesizer::harmonics.begin(), Synthesizer::harmonics.end(), [n=0]() mutable {
            n++;
            if (n%2 == 0) return -1/3.14159268/n;
            return 1/3.14159268/n;
        });

    RtAudio dac;

    RtAudio::StreamParameters params;
    params.deviceId = dac.getDefaultOutputDevice();
    params.nChannels = 1;
    params.firstChannel = 0;
    unsigned int buffer_length = 32;

    std::thread sample_processing_thread(get_samples, std::atoi(argv[1]));
    std::this_thread::sleep_for(std::chrono::milliseconds(10));

    dac.openStream(&params, nullptr, RTAUDIO_FLOAT32, std::atoi(argv[1]) /*sample rate*/, &buffer_length /*frames per buffer*/, streamCallback, nullptr /*data ptr*/);

    dac.startStream();

    bool noteOn = false;
    while(true) {
        noteOn = !noteOn;
        std::cout << "noteOn = " << std::boolalpha << noteOn << std::endl;
        Synthesizer::key_velocities.at(65) = noteOn*127;
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }

    //never reached because of the endless loop above
    sample_processing_thread.join();
    dac.stopStream();
}

To be compiled with g++ -march=native -pthread -o synth -Ofast main.cpp /usr/local/lib/libVc.a -lrtaudio

The program expects a sample rate as first argument. In my setup I use jackd -P 99 -d alsa -p 256 -n 3 & as my sound server (requires real-time priority permissions for the current user). Since the default sample rate for jackd is 48 kHz, I run the program with ./synth 48000.

alsa could be used as the sound server directly, though I prefer jackd when possible, for obscure reasons involving pulseaudio and alsa interactions.

If you get to run the program at all, you should hear a hopefully not too annoying sawtooth wave playing and not playing at regular intervals, with console output indicating when the playing should start and stop. When noteOn is set to true, the synthesizer starts producing the sawtooth wave (at MIDI note 65, roughly 349 Hz), and stops when noteOn is set to false.

You'll hopefully see that at first, noteOn true and false correspond almost perfectly with the audio playing and stopping, but little by little the audio starts lagging behind, until the lag becomes very noticeable at around 1 minute to 1 minute 30 seconds on my machine.

I'm 99% sure it has nothing to do with my program for the following reasons.

The "audio" takes this path through the program.

  • The key is pressed.

  • A clock ticks at 48 kHz in the sample_processing_thread and calls Synthesizer::get_sample and passes the output to an std::queue that is used as a sample buffer for later.

  • Whenever the RtAudio stream needs samples, it gets them from the sample buffer and moves along.

The only thing that could be a source of ever-increasing latency here is the clock ticking, but it ticks at the same rate as the stream consumes samples, so that can't be it. If the clock ticked more slowly, RtAudio would complain about stream underruns and there would be noticeable audio corruption, which doesn't happen.

The clock could, however, tick faster, but I don't think that's the case, as I've tested the clock by itself on numerous occasions, and while it does show a little bit of jitter, on the order of nanoseconds, that is to be expected. There is no cumulative latency in the clock itself.

Thus, the only possible source of growing latency would be internal functions of RtAudio or the sound server itself. I have googled around for a bit and have found nothing of use.

I have been trying to solve this for a week or two now, and I've tested everything that could be going wrong on my side, and it works as expected, so I really don't know what could be happening.


What I have tried

  • Checking if the clock has cumulative latency of some sort: No cumulative latency has been noticed
  • Timing the delay between key presses and the first sample of audio being produced to see if this delay grew with time: Delay did not grow with time
  • Timing the delay between the stream asking for samples and the samples being sent to the stream (start and end of streamCallback): Delay did not grow with time
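
One check the list above doesn't include is comparing the cumulative number of samples produced against the number consumed; any rate mismatch between the two threads shows up directly as a growing backlog. A minimal sketch, using hypothetical atomic counters that are not in the program above:

#include <atomic>

//hypothetical instrumentation, not part of the original program
std::atomic<unsigned long long> samples_produced{0}; //++ on every sample_buffer.push in get_samples
std::atomic<unsigned long long> samples_consumed{0}; //++ on every sample written in streamCallback

void log_backlog() {
    //call roughly once per second, e.g. from the main loop; a backlog that
    //grows without bound means the tick clock produces samples faster than
    //the DAC consumes them, and every queued excess sample is added latency
    long long backlog = (long long)samples_produced - (long long)samples_consumed;
    std::cout << "backlog: " << backlog << " samples ("
              << backlog / 48000.0 << " s at 48 kHz)" << std::endl;
}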
  • Please comment on why the mysterious down vote and I'll edit the question to hopefully make it any more solvable than it is right now. – ChemiCalChems Feb 18 '18 at 10:11

1 Answer


I think your get_samples thread generates audio samples faster or slower than streamCallback consumes them. Using a clock for flow control is unreliable.

The simple way to fix it: remove that thread and the sample_buffer queue, and generate the samples directly in the streamCallback function.
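
For example, a minimal sketch of that approach, assuming Synthesizer::get_sample is changed to take the sample time as a parameter instead of reading the wall clock:

//sketch only: get_sample(float t) is a hypothetical variant that computes
//the sample for time t instead of calling clock_t::now() internally
const float sample_rate = 48000.0f;     //must match the rate passed to openStream
unsigned long long frames_rendered = 0; //frames written to the stream so far

int streamCallback (void* output_buf, void* input_buf, unsigned int frame_count, double time_info, unsigned int stream_status, void* userData) {
    if (stream_status) std::cout << "Stream underflow" << std::endl;
    float* out = (float*) output_buf;
    for (unsigned int i = 0; i < frame_count; i++) {
        //time is derived from the frame counter, not from a clock, so the
        //synth can never run ahead of or fall behind the DAC
        float t = (frames_rendered + i) / sample_rate;
        *out++ = Synthesizer::get_sample(t);
    }
    frames_rendered += frame_count;
    return 0;
}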

If you do want to use multithreading for your app, it requires proper synchronization between the producer and the consumer. That is much more complex, but in short, the steps are below.

  1. Replace your queue with a reasonably small fixed-length circular buffer. Technically, a std::queue would work too, just more slowly because it is pointer-based, and you would need to limit its maximum size manually.

  2. In the producer thread, implement an endless loop that checks whether there is empty space in the buffer. If there is, generate more audio; if not, wait for the consumer to drain data from the buffer.

  3. In the consumer (the streamCallback callback), copy data from the circular buffer to output_buf. If not enough data is available, wake the producer thread and wait for it to produce more.

Unfortunately, an efficient implementation of that is quite tricky. You need synchronization to protect the shared data, but you don't want too much synchronization, otherwise the producer and consumer will be serialized and will only use a single hardware thread. One approach is a single std::mutex to protect the buffer while moving pointers/size/offset (unlocked while actually reading or writing the data), plus two std::condition_variables: one for the producer to sleep on when there's no free space in the buffer, another for the consumer to sleep on when there's no data in the buffer. A sketch of that arrangement follows.
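
A minimal sketch, with illustrative names and a per-sample interface for clarity (a real implementation would move data in blocks to amortize the locking):

#include <array>
#include <condition_variable>
#include <mutex>

//sketch only: bounded single-producer/single-consumer buffer built from
//one mutex and two condition variables, as described above
class SampleRing {
    std::array<float, 1024> buf;    //fixed capacity
    size_t head = 0, tail = 0, count = 0;
    std::mutex m;
    std::condition_variable not_full, not_empty;
public:
    void push(float s) {            //called by the producer thread
        std::unique_lock<std::mutex> lock(m);
        not_full.wait(lock, [&]{ return count < buf.size(); }); //sleep while full
        buf[head] = s;
        head = (head + 1) % buf.size();
        count++;
        not_empty.notify_one();     //wake a consumer waiting for data
    }
    float pop() {                   //called from streamCallback
        std::unique_lock<std::mutex> lock(m);
        not_empty.wait(lock, [&]{ return count > 0; }); //sleep while empty
        float s = buf[tail];
        tail = (tail + 1) % buf.size();
        count--;
        not_full.notify_one();      //wake the producer waiting for space
        return s;
    }
};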

  • This could be it; however, it raises another design problem. I can't simply generate sound samples whenever I want. I want to generate samples at a certain sample rate, to better represent the actual waveform I'm synthesizing without corruption. I have tried generating samples only when needed before, and it produced unusable audio, which is worse than audio with growing latency. – ChemiCalChems Feb 19 '18 at 00:56
  • I guess my point is that I'm not reading from a static sound file, I'm "reading" the current state of my keyboard. I can't simply look into the future and generate say 256 samples at once, because they'd all represent roughly the same sample value, as they were all generated at roughly the same time. – ChemiCalChems Feb 19 '18 at 00:58
  • @ChemiCalChems Sample rate is usually fixed, typically it’s 48 kHz for home users, 96 or 192 kHz for professionals. – Soonts Feb 19 '18 at 01:07
  • Indeed you can’t look in the future. That’s why buffering is inevitable. streamCallback doesn’t ask you for individual frames, it asks you to supply `frame_count` frames at once. And then, there’s another layer of buffering in OS and hardware. You can’t have zero latency, but you can have small enough latency for your practical application. – Soonts Feb 19 '18 at 01:07
  • I know buffering is a necessity and what stream_callback is expected to do. The thing is I can't just produce samples when I'm asked for them. I guess a good analogy is being asked for the current weather, and have to give the final assignment at the end of the week. You have to note the current weather at regular intervals in order to observe changes in the weather, you can't simply say OH SHIT 5 minutes before the assignment is due, and make 50 logs in 5 minutes, because the logs will not represent the weather of the whole week. – ChemiCalChems Feb 19 '18 at 01:10
  • I just had an idea with my analogy. Weather is difficult to predict, but sample values aren't. I could recreate the whole week of weather when I'm asked for the assignment instead of regularly taking notes. It should take the same amount of time and I wouldn't have to have a clock running at all. Looking into the past is easier than looking into the present. – ChemiCalChems Feb 19 '18 at 01:16
  • @ChemiCalChems Technically you can easily produce samples when asked for them. Grab the most recent copy of the global state (e.g. a set of currently playing MIDI instruments with pitches + velocities), and use that to generate the asked count of samples, assuming they all are currently playing. – Soonts Feb 19 '18 at 01:24
  • Practically that’s very far from easy. C++ doesn’t have co-routines. To avoid artefacts you’ll need to maintain phase information for each voice, implement proper fade in/out transition for each voice. Even mixing voices together is surprisingly hard if you want high quality result and don’t want to decrease output volume proportionally. – Soonts Feb 19 '18 at 01:26
  • I'm currently applying this idea. I'm getting some mild artifacts, but latency seems to be gone for the most part. – ChemiCalChems Feb 19 '18 at 01:28
  • Latency issues are completely gone, just housekeeping from now on. I knew a second opinion would be enough. Thank you so much for this, I've been pulling hairs out of my head for far too long; now at last I can play some music without intense latency issues. Accepting answer. – ChemiCalChems Feb 19 '18 at 01:32
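
For reference, the per-voice phase accumulation mentioned in the comments could look roughly like the sketch below; the Voice struct and its members are illustrative, not part of the original program:

#include <cmath>

//sketch only: each voice carries its own phase across buffers, so frequency
//changes and note re-triggers stay continuous instead of jumping straight
//to sin(2*pi*f*t) for an absolute wall-clock t
struct Voice {
    float freq = 0;     //frequency in Hz
    float phase = 0;    //radians, persists between callbacks
    float velocity = 0; //0 while the key is up

    float render(float sample_rate) {
        float s = velocity * std::sin(phase);
        phase += 2 * 3.14159265f * freq / sample_rate;          //advance one sample period
        if (phase > 2 * 3.14159265f) phase -= 2 * 3.14159265f;  //keep phase bounded
        return s;
    }
};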