28x slowdown when multiplying small floating point numbers

Question

I'm using a realtime reverb in a game I'm developing, and the algorithm uses simple comb filters in its implementation (basically delay lines that gradually decay to zero). I've been noticing something strange: the reverb usually processes a block of audio in 0.5 ms, and the algorithm is O(n), so it shouldn't show much variation. But if the input audio is silent for a few seconds, the CPU time suddenly swells to 4-5 ms.

I traced it down to floating point multiplication - the reverb algorithm gradually fades the comb filter 'echoes' by multiplying by a coefficient, like say 0.6. So the samples decay to 60% each time through the delay loop.

But as the decaying values get really small, like 1e-40, those multiplies suddenly slow way down.

Here's a distilled C++ example with timings:

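#include <iostream>
#include <intrin.h>  // __rdtsc() on MSVC; use <x86intrin.h> with GCC/Clang

int main() {
    float value = 1.0f;

    // 14 runs of 1000 multiplies each; every run is timed separately.
    for (int run = 0; run < 14; run++) {
        auto timer = __rdtsc();
        for (int i = 0; i < 1000; i++) {
            value *= 0.99f;  // decay by 1% per step
        }
        auto elapsed = __rdtsc() - timer;
        std::cout << value << ": " << elapsed << " cycles\n";
    }
}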

__rdtsc() is a timer intrinsic that returns a CPU cycle counter. Here are the results from an optimized build:

4.31717e-005: 4045 cycles
1.86379e-009: 3953 cycles
8.0463e-014: 3957 cycles
3.47372e-018: 3953 cycles
1.49966e-022: 4739 cycles
6.47429e-027: 3950 cycles
2.79506e-031: 3957 cycles
1.20668e-035: 3951 cycles
5.20941e-040: 39229 cycles
7.00649e-044: 126274 cycles
7.00649e-044: 122688 cycles
7.00649e-044: 113354 cycles
7.00649e-044: 113350 cycles
7.00649e-044: 116490 cycles

So when the exponent gets around -40 things go south quickly; by the time the precision bottoms out at -44 we're going about 28x slower.

For reference, this is on an Intel chip; I checked the assembler output and the multiplies are being done using the SSE 'MULSS' scalar multiply instruction on the XMM0 register.

So the question is basically, what's going on? Is it something specific to Intel or SSE, or a more general problem with floating point math? I can work around it easily enough by clamping small values to 0, but I'm curious what the culprit is. A 28x slowdown is pretty nasty, and this kind of math isn't uncommon.
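
For what it's worth, the clamping workaround could be sketched roughly like this. It is not the actual reverb code: `flushTiny`, `CombFilter`, and the `1e-20f` cutoff are hypothetical names and values picked for illustration, and the delay length is assumed to be greater than zero.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: return 0 for anything below `threshold` in magnitude.
// The default of 1e-20f is an assumed cutoff, far above the range where the
// slowdown shows up (around 1e-38 and below) and far too quiet to hear.
inline float flushTiny(float x, float threshold = 1e-20f) {
    return std::fabs(x) < threshold ? 0.0f : x;
}

// Hypothetical feedback comb filter: a circular delay line whose output is
// scaled by `feedback` and mixed back in with the input.
// Assumes delaySamples > 0.
struct CombFilter {
    std::vector<float> buffer;
    std::size_t pos = 0;
    float feedback;

    CombFilter(std::size_t delaySamples, float fb)
        : buffer(delaySamples, 0.0f), feedback(fb) {}

    float process(float input) {
        float delayed = buffer[pos];
        // Clamp before storing so a decaying tail reaches exactly zero
        // instead of lingering at tiny values.
        buffer[pos] = flushTiny(input + delayed * feedback);
        pos = (pos + 1) % buffer.size();
        return delayed;
    }
};
```

With this in place, feeding silence into the filter for long enough leaves the delay line holding exact zeros, so the multiplies never see the tiny values that trigger the slow path.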

Related: http://stackoverflow.com/questions/15140847/denormalized-numbers-ieee-754-floating-point — user253751, Jun 01 '16 at 03:44
@Yakk ... because it's not? How is it possibly a dupe? The other question is asking general questions about denormalized numbers, this one is asking why multiplying two small numbers is slow (where the answer includes "because they're denormalized"). — user253751, Jun 01 '16 at 04:00
@immibis "why are small numbers slow to multiply" is answered by that other question. As are "small numbers are denormalized". I am unaware of anything not covered by the other question's answers, other than maybe "oh, and this applies in your case", which is sort of implied by the "the answer is here" link. — Yakk - Adam Nevraumont, Jun 01 '16 at 04:15
The discussion on the denormalized answer definitely answers my question, but not sure the duplicate tag makes sense. I still think this one would help people who weren't already searching for "denormalized", but floating point multiplication in general. I'd be happy to accept a short answer that mentions it's due to numbers in the denormalized range, with a link (on that topic, is E-40 small enough to be denormalized? Drastic slowdown starts between E-35 and E-40) — QuadrupleA, Jun 01 '16 at 04:38
Denormalization starts when the exponent gets down to -127 (for single precision floats), which is 10 ^(-127 * 0.3) ~ 10^-37. (assuming log 2 ~ 0.3). So yes, E-35 to E-40 makes sense. — Rudy Velthuis, Jun 01 '16 at 06:35
From the point of view of helping others, a duplicate linked to a question with full answers is better than a non-duplicate with a specialized answer. — Patricia Shanahan, Jun 01 '16 at 10:20
@QuadrupleA Duplicate doesn't delete your question, on purpose. It remains searchable and on the website. It just says "don't bother answering this question, just look over here" to people writing answers, and transfers google-juice to the other question for searching. The point is to direct people towards a better answer than you are likely to get in a random post. — Yakk - Adam Nevraumont, Jun 02 '16 at 17:10
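
To check where the denormalized range starts, as discussed in the comments above, a small sketch using only the standard library might help. The helper names (`isSubnormal`, `smallestNormal`, `smallestSubnormal`) are made up for this example; `std::fpclassify` and `std::numeric_limits` are standard C++.

```cpp
#include <cmath>
#include <limits>

// True if x is nonzero but smaller in magnitude than the smallest normal float.
inline bool isSubnormal(float x) {
    return std::fpclassify(x) == FP_SUBNORMAL;
}

// Smallest positive normal float, about 1.18e-38 (2^-126).
inline float smallestNormal() {
    return std::numeric_limits<float>::min();
}

// Smallest positive subnormal float, about 1.4e-45 (2^-149).
inline float smallestSubnormal() {
    return std::numeric_limits<float>::denorm_min();
}
```

Applied to the values in the timing table, `isSubnormal(1.20668e-35f)` is false while `isSubnormal(5.20941e-40f)` and `isSubnormal(7.00649e-44f)` are true, which lines up with the slowdown starting between E-35 and E-40.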
