I'm using a realtime reverb in a game I'm developing, and the algorithm uses simple comb filters in its implementation (basically delay lines whose echoes gradually decay to zero). I've noticed something strange: the reverb usually processes a block of audio in 0.5 ms, and since the algorithm is O(n), its run time shouldn't vary much. But if the input audio is silent for a few seconds, the CPU time per block suddenly swells to 4-5 ms.
I traced it to floating-point multiplication. The reverb gradually fades the comb filter 'echoes' by multiplying them by a feedback coefficient, say 0.6, so each sample decays to 60% of its value on every trip around the delay loop.
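To make the decay concrete, here is a minimal sketch of a feedback comb filter, assuming the textbook form y[n] = x[n] + g * y[n - D] with a circular buffer. The `CombFilter` class and its parameters are hypothetical and not taken from the actual reverb. After a single impulse followed by silence, the stored echo is simply multiplied by g on every pass, so it shrinks geometrically toward zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal feedback comb filter: y[n] = x[n] + g * y[n - D].
class CombFilter {
public:
    CombFilter(std::size_t delay, float feedback)
        : buffer_(delay, 0.0f), feedback_(feedback) {
        assert(delay > 0);  // a zero-length delay line makes no sense
    }

    // Process one input sample and return the filter output.
    float process(float input) {
        float delayed = buffer_[pos_];               // y[n - D]
        float output = input + feedback_ * delayed;  // echo is scaled by g each pass
        buffer_[pos_] = output;                      // feed the output back into the line
        pos_ = (pos_ + 1) % buffer_.size();
        return output;
    }

private:
    std::vector<float> buffer_;
    float feedback_;
    std::size_t pos_ = 0;
};
```

With g = 0.6 and silent input, the echo after k passes is roughly 0.6^k, which drops below the smallest normal `float` (about 1.18e-38) after roughly 170 passes of the delay line.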
But as the decaying values get really small, around 1e-40, those multiplies suddenly slow way down.
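For reference, 1e-40 is already below the smallest *normal* `float`. The sketch below uses only `<cmath>` and `<limits>` (the `is_denormal` helper and the constant names are hypothetical) to show the two relevant thresholds and how to test whether a value has fallen into the denormal (subnormal) range.

```cpp
#include <cmath>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559,
              "assumes IEEE-754 binary32 floats");

// Smallest positive normal float, about 1.17549e-38.
constexpr float kMinNormal = std::numeric_limits<float>::min();

// Smallest positive denormal (subnormal) float, about 1.4013e-45.
constexpr float kMinDenormal = std::numeric_limits<float>::denorm_min();

// True when x is nonzero but smaller in magnitude than kMinNormal.
bool is_denormal(float x) {
    return std::fpclassify(x) == FP_SUBNORMAL;
}
```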
Here's a distilled C++ example with timings:
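```cpp
#include <iostream>
#ifdef _MSC_VER
#include <intrin.h>     // __rdtsc (MSVC)
#else
#include <x86intrin.h>  // __rdtsc (GCC/Clang)
#endif

int main() {
    float value = 1.0f;
    for (int run = 0; run < 14; run++) {
        auto timer = __rdtsc();              // read the CPU cycle counter
        for (int i = 0; i < 1000; i++) {
            value *= 0.99f;                  // decay by 1% per multiply
        }
        auto elapsed = __rdtsc() - timer;
        std::cout << value << ": " << elapsed << " cycles\n";
    }
}
```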
`__rdtsc()` is a compiler intrinsic that returns the CPU's cycle counter. Here are the results from a run on an optimized build:

```
4.31717e-005: 4045 cycles
1.86379e-009: 3953 cycles
8.0463e-014: 3957 cycles
3.47372e-018: 3953 cycles
1.49966e-022: 4739 cycles
6.47429e-027: 3950 cycles
2.79506e-031: 3957 cycles
1.20668e-035: 3951 cycles
5.20941e-040: 39229 cycles
7.00649e-044: 126274 cycles
7.00649e-044: 122688 cycles
7.00649e-044: 113354 cycles
7.00649e-044: 113350 cycles
7.00649e-044: 116490 cycles
```
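As a sanity check on the table: ignoring rounding, the value after N multiplies is 0.99^N, and the smallest normal `float` is 2^{-126}, about 1.1755e-38. The first row and the crossover point work out as:

$$
0.99^{1000} = e^{1000 \ln 0.99} \approx e^{-10.050} \approx 4.317 \times 10^{-5}
$$

$$
0.99^{N} < 2^{-126} \iff N > \frac{126 \ln 2}{-\ln 0.99} \approx \frac{87.337}{0.010050} \approx 8690
$$

So the value first drops below the normal range partway through the 9th run (iterations 8001 to 9000), which is exactly the run where the cycle count first jumps (39229 cycles); from the 10th run on, every multiply operates on such a tiny value.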
So once the exponent reaches about -40, things go south quickly; by the time the value bottoms out around 7e-44 (where it stops shrinking altogether), each run is about 28x slower.
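Why the value stalls at 7.00649e-44 instead of reaching zero can be checked directly: that value is exactly 50 times the smallest denormal, and 50 units times 0.99f (which is about 0.9900000095) gives about 49.5000005 units, which rounds back to 50. The sketch below iterates the same multiply until the result stops changing. The `decay_until_stuck` helper and its step cap are hypothetical, and it assumes default IEEE round-to-nearest with denormals enabled (no flush-to-zero mode).

```cpp
#include <limits>

// Smallest positive denormal float (~1.4013e-45); every denormal is an
// integer multiple of this step.
const float kDenormStep = std::numeric_limits<float>::denorm_min();

// Hypothetical helper: repeatedly multiply by `factor` until rounding
// returns the same float (a fixed point), or until max_steps is reached.
float decay_until_stuck(float value, float factor, int max_steps = 100000) {
    for (int i = 0; i < max_steps; ++i) {
        float next = value * factor;
        if (next == value) {
            break;  // the product rounded back to the same value
        }
        value = next;
    }
    return value;
}
```

Under those assumptions, `decay_until_stuck(1.0f, 0.99f)` should return `50.0f * kDenormStep`, which prints as 7.00649e-44, matching the plateau in the table.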
For reference, this is on an Intel chip. I checked the assembly output, and the multiplies are done with the SSE `MULSS` (scalar single-precision multiply) instruction on the XMM0 register.
So the question is basically: what's going on? Is it something specific to Intel or SSE, or a more general problem with floating-point math? I can work around it easily enough by clamping small values to 0, but I'm curious what the culprit is. A 28x slowdown is pretty nasty, and this kind of math isn't uncommon.
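The clamping workaround mentioned above might look like the following minimal sketch. The `1e-30f` threshold and the `decay_and_flush` name are hypothetical choices, not from the original reverb; the idea is to flush a decayed sample to exactly zero while it is still a normal `float`, so the feedback loop never multiplies a value in the denormal range.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical threshold: far below audibility but well above the smallest
// normal float (~1.18e-38), so flushed samples never become denormal.
constexpr float kSilenceThreshold = 1e-30f;

// Scale a comb-filter sample by its feedback coefficient, flushing tiny
// results to exactly zero.
float decay_and_flush(float sample, float feedback) {
    assert(std::fabs(feedback) < 1.0f);  // a stable comb filter needs |g| < 1
    float out = sample * feedback;
    return std::fabs(out) < kSilenceThreshold ? 0.0f : out;
}
```

In a real comb filter this would be applied to the value written back into the delay line; the threshold only needs to sit comfortably above the smallest normal `float`.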