Denormalized floating point numbers require expensive special handling in some operations (additions, multiplications). While this is well known, it seems to me that there are also many comparably simple operations that might not be affected by such a penalty. I haven't been able to find a good overview of which operations are "safe" on different platforms and was wondering if others here know more. I am especially interested in the answer for x86-64 and CUDA/PTX, for the following classes of operations:

  • Floating point comparison
  • Absolute value
  • Rounding operations (ceil, floor, trunc, round)
  • Conversion (single ↔ double, float ↔ integer)
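
To make the question concrete, here is a minimal timing sketch of the kind of measurement I have in mind: comparing one of the operations above on normal vs. subnormal inputs. It assumes a GCC/Clang-style compiler barrier and uses placeholder constants and operations; it is only an illustration, not a claim about what the results will be.

```cpp
// Minimal sketch: time an operation on normal vs. subnormal float inputs.
// Compile without -ffast-math (which may enable FTZ/DAZ), e.g.:
//   g++ -O2 -march=native -std=c++17 subnormal_bench.cpp -o subnormal_bench
#include <chrono>
#include <cmath>
#include <cstdio>
#include <vector>

// Compiler barrier so the benchmark loop is not optimized away (GCC/Clang).
static inline void clobber() { asm volatile("" ::: "memory"); }

template <class Op>
static double time_op(const std::vector<float>& in, std::vector<float>& out, Op op) {
    using clock = std::chrono::steady_clock;
    auto t0 = clock::now();
    for (int rep = 0; rep < 1000; ++rep) {
        for (size_t i = 0; i < in.size(); ++i)
            out[i] = op(in[i]);   // store results; stores are unaffected by subnormals
        clobber();
    }
    auto t1 = clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
    const size_t n = 1 << 16;
    std::vector<float> normals(n, 1.5f);
    std::vector<float> subnormals(n, 1e-42f);   // well below FLT_MIN, hence subnormal
    std::vector<float> out(n);

    auto trunc_op = [](float x) { return std::trunc(x); };   // rounding toward zero
    auto abs_op   = [](float x) { return std::fabs(x);  };   // absolute value

    std::printf("trunc, normal inputs:    %.3f s\n", time_op(normals,    out, trunc_op));
    std::printf("trunc, subnormal inputs: %.3f s\n", time_op(subnormals, out, trunc_op));
    std::printf("fabs,  normal inputs:    %.3f s\n", time_op(normals,    out, abs_op));
    std::printf("fabs,  subnormal inputs: %.3f s\n", time_op(subnormals, out, abs_op));
    return 0;
}
```

Results are stored to an output array rather than accumulated, so that the sink itself does not introduce extra floating-point operations on subnormals into the measurement.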
  • It's not the same on all x86-64 CPUs; different microarchitectures are different. (I summarized some in [Performance penalty: denormalized numbers versus branch mis-predictions](https://stackoverflow.com/q/60969892)). x86-64 does have FTZ / DAZ to disable gradual underflow and get 0.0 instead of subnormals, avoiding all perf penalties on all microarchitectures. Absolute value is done with bitwise AND to clear the sign bit on x86-64 (SSE2 fp math), so the FPU proper isn't even involved. Conversion to integer is always fast, even on older CPUs like Core 2 that have penalties in more cases. – Peter Cordes Jul 05 '20 at 14:40
  • Thank you for the comment and link! For x86-64, I would be interested in the behavior of recent uarchs (>= Skylake), and assuming that FTZ/DAZ are not used. – Wenzel Jakob Jul 05 '20 at 22:15
  • https://docs.nvidia.com/cuda/floating-point/index.html – talonmies Jul 06 '20 at 14:37
  • @talonmies: This document doesn't really answer the question I asked, which is about the performance implications of specific operations. – Wenzel Jakob Jul 07 '20 at 15:08
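
To make the FTZ/DAZ and bitwise-AND points from the comments above concrete, here is a small sketch using the standard SSE control-register intrinsics and a sign-bit mask. It assumes x86-64 with the usual GCC/Clang/MSVC intrinsic headers; the specific constants are just for illustration.

```cpp
// FTZ (flush subnormal results to zero) and DAZ (treat subnormal inputs as zero)
// are per-thread flags in MXCSR, exposed via standard SSE intrinsics.
// Absolute value can be computed as a pure bit operation (clear the sign bit),
// so no floating-point unit handling of subnormals is involved.
#include <cstdio>
#include <cstring>
#include <xmmintrin.h>   // _MM_SET_FLUSH_ZERO_MODE
#include <pmmintrin.h>   // _MM_SET_DENORMALS_ZERO_MODE

// fabs as a bitwise AND: clear bit 31 of the IEEE-754 single-precision encoding.
static float bitwise_fabs(float x) {
    unsigned int bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFu;                 // clear the sign bit
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

int main() {
    // Enable FTZ (results) and DAZ (inputs) for this thread.
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

    volatile float a = 1e-30f, b = 1e-10f;   // normal inputs, subnormal product
    volatile float tiny = 1e-42f;            // subnormal input
    std::printf("1e-30f * 1e-10f = %g (0 with FTZ: subnormal result flushed)\n", a * b);
    std::printf("tiny * 2.0f     = %g (0 with DAZ: subnormal input treated as 0)\n", tiny * 2.0f);
    std::printf("|x| via AND     = %g\n", bitwise_fabs(-3.5f));
    return 0;
}
```

Note that FTZ/DAZ live in MXCSR and therefore apply per thread; each thread that should avoid subnormals needs to set them.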

0 Answers