I am currently writing a real-time audio application in C++ whose signal path roughly consists of:
- reading frames from a buffer
- interpolating frames with Hermite interpolation
- filtering every frame with two biquad filters (and updating their coefficients every frame)
- a 3 band crossover containing 18 biquad calculations
- a FreeVerb algorithm from the STK library
I think my PC should be able to handle this, but I get buffer underflows every so often, so I would like to improve the performance of my application. I have a bunch of questions I hope you can answer. :)
1) Operator Overloading
Instead of working directly with my float samples and doing calculations for every sample, I pack my floats in a Frame class which contains the left and the right sample. The class overloads some operators for addition, subtraction and multiplication with float.
The filters (mostly biquads) and the reverb work with floats and don't use this class, but the Hermite interpolator and every multiplication and addition for volume control and mixing use it.
Does this have an impact on performance, and would it be better to work with the left and right samples directly?
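To make the comparison concrete, here is a minimal sketch of what such a Frame class might look like. It is an assumption for illustration, not the actual class from the application: the member names follow the code further down, while `mixWithOperators` and `mixPerChannel` are hypothetical helpers that express the same arithmetic with and without the overloaded operators.

```cpp
#include <cassert>  // only needed for checking the sketch

// Illustrative stereo frame; member names follow the code in this question.
struct Frame {
    float leftSample  = 0.0f;
    float rightSample = 0.0f;

    Frame() = default;
    Frame(float l, float r) : leftSample(l), rightSample(r) {}
};

// The processing code casts an interleaved float buffer to Frame*,
// which only makes sense if a Frame is exactly two packed floats.
static_assert(sizeof(Frame) == 2 * sizeof(float), "Frame must be two packed floats");

// Operators defined inline in the header, so their bodies are visible
// to the compiler at every call site.
inline Frame operator+(Frame a, Frame b) {
    return Frame(a.leftSample + b.leftSample, a.rightSample + b.rightSample);
}
inline Frame operator-(Frame a, Frame b) {
    return Frame(a.leftSample - b.leftSample, a.rightSample - b.rightSample);
}
inline Frame operator*(float g, Frame f) {
    return Frame(g * f.leftSample, g * f.rightSample);
}
inline Frame operator*(Frame f, float g) {
    return g * f;
}

// Hypothetical helper: the mix stage written with the overloaded operators.
inline Frame mixWithOperators(Frame a, Frame b, float mix, float volume) {
    return volume * (mix * a + (1.0f - mix) * b);
}

// Hypothetical helper: the same arithmetic written out per channel.
inline Frame mixPerChannel(Frame a, Frame b, float mix, float volume) {
    return Frame(volume * (mix * a.leftSample  + (1.0f - mix) * b.leftSample),
                 volume * (mix * a.rightSample + (1.0f - mix) * b.rightSample));
}
```

Both helpers perform the same floating-point operations in the same order, so the question comes down to whether the compiler generates equivalent code for the two forms. Comparing the optimized assembly or profiling both variants would settle that for a specific compiler and set of flags.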
2) std::function
The callback function from the audio I/O library PortAudio calls a std::function. I use this to encapsulate everything related to PortAudio, so the "user" sets their own callback function with std::bind:
std::bind(&AudioController::processAudio,
          this,
          std::placeholders::_1,
          std::placeholders::_2);
Since the CPU has to find the right function on every callback (however this works...), does this have an impact, and would it be better to define a class the user has to inherit from?
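The two designs being compared can be sketched side by side. Everything below is a simplified assumption, not the real wrapper: `FunctionBackend`, `InterfaceBackend` and `AudioCallback` are hypothetical names, and `render` merely stands in for the point where the PortAudio callback hands over one block of interleaved stereo samples.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Option A: type-erased callback stored in a std::function.
class FunctionBackend {
public:
    using Callback = std::function<void(int, float*)>;

    void setCallback(Callback cb) { callback_ = std::move(cb); }

    // Stand-in for the audio callback: one std::function call per block.
    void render(int frameCount, float* output) {
        assert(output != nullptr);
        if (callback_) callback_(frameCount, output);
    }

private:
    Callback callback_;
};

// Option B: an interface the user inherits from.
class AudioCallback {
public:
    virtual ~AudioCallback() = default;
    virtual void processAudio(int frameCount, float* output) = 0;
};

class InterfaceBackend {
public:
    void setCallback(AudioCallback* cb) { callback_ = cb; }

    // Stand-in for the audio callback: one virtual call per block.
    void render(int frameCount, float* output) {
        assert(output != nullptr);
        if (callback_) callback_->processAudio(frameCount, output);
    }

private:
    AudioCallback* callback_ = nullptr;
};

// Hypothetical user class that works with either backend.
class AudioController : public AudioCallback {
public:
    void processAudio(int frameCount, float* output) override {
        // Interleaved stereo: two floats per frame.
        for (int i = 0; i < 2 * frameCount; ++i) output[i] = 0.5f;
        ++blocksProcessed;
    }

    int blocksProcessed = 0;
};
```

In both variants the indirection happens once per block, not once per sample, so with a few hundred frames per block its cost is spread over all the per-frame work done inside `processAudio`.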
3) virtual functions
I use a class called AudioProcessor which declares a virtual function:
virtual void tick(Frame *buffer, int frameCount) = 0;
This function always processes a number of frames at once: depending on the driver, from 200 up to 1000 frames per call. Within the signal processing path, I call this function 6 times on multiple derived classes. I remember that this is done with lookup tables so the CPU knows exactly which function it has to call. So does calling a virtual (overridden) function have an impact on performance?
The nice thing about this is the structure it gives the source code, but maybe using only inline functions would improve performance.
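A stripped-down sketch of this structure shows where the virtual dispatch sits relative to the per-frame work. `GainProcessor` and `runChain` are hypothetical stand-ins for the real processors and the real signal path, and `Frame` is reduced to a plain struct here.

```cpp
#include <cassert>

// Reduced stand-in for the Frame class.
struct Frame {
    float leftSample;
    float rightSample;
};

class AudioProcessor {
public:
    virtual ~AudioProcessor() = default;
    virtual void tick(Frame* buffer, int frameCount) = 0;
};

// Hypothetical processor: applies a fixed gain to every frame of a block.
class GainProcessor : public AudioProcessor {
public:
    explicit GainProcessor(float gain) : gain_(gain) {}

    void tick(Frame* buffer, int frameCount) override {
        // The per-frame work is a plain loop with no virtual calls inside.
        for (int i = 0; i < frameCount; ++i) {
            buffer[i].leftSample  *= gain_;
            buffer[i].rightSample *= gain_;
        }
    }

private:
    float gain_;
};

// Runs every processor of a chain over one block and returns how many
// virtual calls were made for that block.
inline int runChain(AudioProcessor* const* chain, int chainLength,
                    Frame* buffer, int frameCount) {
    assert(chain != nullptr && buffer != nullptr && frameCount >= 0);
    int virtualCalls = 0;
    for (int p = 0; p < chainLength; ++p) {
        chain[p]->tick(buffer, frameCount);  // one virtual call per processor per block
        ++virtualCalls;
    }
    return virtualCalls;
}
```

With this layout the number of virtual calls per block equals the number of processors in the chain, regardless of whether the block holds 200 or 1000 frames.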
These are all my questions for now. I have some more about Qt's event loop, because I think that my GUI uses quite a bit of CPU time as well, but that is another topic, I guess. :)
Thanks in advance!
These are all the relevant function calls within the signal processing path. Some of them are from the STK library. The biquad functions are from STK and should perform fine; the same goes for the FreeVerb algorithm.
// ################################ AudioController ############################
void AudioController::processAudio(int frameCount, float *output) {
    // CALCULATE LEFT TRACK
    Frame *leftFrameBuffer = (Frame*) output;
    if(leftLoaded) { // the left processor is loaded
        leftProcessor->tick(leftFrameBuffer, frameCount); // TrackProcessor::tick()
    } else {
        for(int i = 0; i < frameCount; i++) {
            leftFrameBuffer[i].leftSample = 0.0f;
            leftFrameBuffer[i].rightSample = 0.0f;
        }
    }
    // CALCULATE RIGHT TRACK
    if(rightLoaded) { // the right processor is loaded
        // the rightFrameBuffer is allocated once and ensured to have enough space for frameCount Frames
        rightProcessor->tick(rightFrameBuffer, frameCount); // TrackProcessor::tick()
    } else {
        for(int i = 0; i < frameCount; i++) {
            rightFrameBuffer[i].leftSample = 0.0f;
            rightFrameBuffer[i].rightSample = 0.0f;
        }
    }
    // MIX
    for(int i = 0; i < frameCount; i++) {
        // 1.0f keeps the arithmetic in float and matches the float operator overloads
        leftFrameBuffer[i] = volume * (leftRightMix * leftFrameBuffer[i] + (1.0f - leftRightMix) * rightFrameBuffer[i]);
    }
}
// ################################ TrackProcessor ############################
void TrackProcessor::tick(Frame *frames, int frameNum) {
    if(bufferLoaded && playback) {
        for(int i = 0; i < frameNum; i++) {
            // read from buffer
            frames[i] = bufferPlayer->tick();
            // filter coeffs
            caltulateFilterCoeffs(lowCutoffFilter->tick(), highCutoffFilter->tick());
            // filter
            frames[i].leftSample = lpFilterL->tick(hpFilterL->tick(frames[i].leftSample));
            frames[i].rightSample = lpFilterR->tick(hpFilterR->tick(frames[i].rightSample));
        }
    } else {
        for(int i = 0; i < frameNum; i++) {
            frames[i] = Frame(0.0f, 0.0f);
        }
    }
    // Effect 1, Equalizer
    if(effsActive[0]) {
        insEffProcessors[0]->tick(frames, frameNum);
    }
    // Effect 2, Reverb
    if(effsActive[1]) {
        insEffProcessors[1]->tick(frames, frameNum);
    }
    // Volume
    for(int i = 0; i < frameNum; i++) {
        frames[i].leftSample *= volume;
        frames[i].rightSample *= volume;
    }
}
// ################################ Equalizer ############################
void EqualizerProcessor::tick(Frame *frames, int frameNum) {
    if(active) {
        Frame lowCross;
        Frame highCross;
        for(int f = 0; f < frameNum; f++) {
            lowAmp = lowAmpFilter->tick();
            midAmp = midAmpFilter->tick();
            highAmp = highAmpFilter->tick();
            lowCross = highLPF->tick(frames[f]);
            highCross = highHPF->tick(frames[f]);
            frames[f] = lowAmp * lowLPF->tick(lowCross)
                      + midAmp * lowHPF->tick(lowCross)
                      + highAmp * lowAPF->tick(highCross);
        }
    }
}
// ################################ Reverb ############################
// This function just calls the stk::FreeVerb tick function for every frame.
// The FreeVerb implementation can't really be optimised, so I will take it as it is.
void ReverbProcessor::tick(Frame *frames, int frameNum) {
    if(active) {
        for(int i = 0; i < frameNum; i++) {
            frames[i].leftSample = reverb->tick(frames[i].leftSample, frames[i].rightSample);
            frames[i].rightSample = reverb->lastOut(1);
        }
    }
}
// ################################ Buffer Playback (BufferPlayer) ############################
Frame BufferPlayer::tick() {
    // adjust read position based on loop status
    if(inLoop) {
        while(readPos > loopEndPos) {
            readPos = loopStartPos + (readPos - loopEndPos);
        }
    }
    int x1 = readPos;        // integer part of the read position
    float t = readPos - x1;  // fractional part, used as the interpolation parameter
    Frame f = interpolate(buffer->frameAt(x1-1),
                          buffer->frameAt(x1),
                          buffer->frameAt(x1+1),
                          buffer->frameAt(x1+2),
                          t);
    readPos += stepSize;
    return f;
}
// interpolation:
Frame BufferPlayer::interpolate(Frame x0, Frame x1, Frame x2, Frame x3, float t) {
    Frame c0 = x1;
    Frame c1 = 0.5f * (x2 - x0);
    Frame c2 = x0 - (2.5f * x1) + (2.0f * x2) - (0.5f * x3);
    Frame c3 = (0.5f * (x3 - x0)) + (1.5f * (x1 - x2));
    return (((((c3 * t) + c2) * t) + c1) * t) + c0;
}
inline Frame BufferPlayer::frameAt(int pos) {
    // clamp the position to the valid range
    if(pos < 0) {
        pos = 0;
    } else if (pos >= frames) {
        pos = frames - 1;
    }
    // get chunk and relative sample
    int chunk = pos / ChunkSize;
    int chunkSample = pos % ChunkSize;
    return Frame(leftChunks[chunk][chunkSample], rightChunks[chunk][chunkSample]);
}
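
As a sanity check on the interpolation coefficients above, here is a scalar sketch of the same 4-point, 3rd-order Hermite formula. The free function `hermite` is a hypothetical stand-in that works on a single float channel instead of a Frame; the coefficients are copied from `BufferPlayer::interpolate`.

```cpp
#include <cassert>

// Scalar version of the interpolation used in BufferPlayer::interpolate.
// x0..x3 are four consecutive samples, t in [0, 1] is the fractional
// position between x1 and x2.
inline float hermite(float x0, float x1, float x2, float x3, float t) {
    assert(t >= 0.0f && t <= 1.0f);
    const float c0 = x1;
    const float c1 = 0.5f * (x2 - x0);
    const float c2 = x0 - 2.5f * x1 + 2.0f * x2 - 0.5f * x3;
    const float c3 = 0.5f * (x3 - x0) + 1.5f * (x1 - x2);
    // Horner evaluation of c3*t^3 + c2*t^2 + c1*t + c0
    return ((c3 * t + c2) * t + c1) * t + c0;
}
```

With these coefficients the curve passes through `x1` at `t = 0` and through `x2` at `t = 1`, and it reproduces a straight line exactly when the four samples lie on one.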