As people said in comments:
if(SQRT[a*a+b*b]>something)
is a horrible example use-case. If that's all you need SQRT for, just square something
.
As long as you can tell the compiler that SQRT
doesn't alias anything, then a run-time loop will make your executable smaller, and only add a tiny amount of CPU overhead during startup. Definitely use uint8_t
, not int
. Loading a 32bit temporary from an 8bit memory location is no slower than from a zero-padded 32b memory location. (The extra movsx
instruction on x86, instead of using a memory operand, will more than pay for itself in reduced cache pollution. RISC machines typically don't allow memory operands anyway, so you always need an instruction to load the value into a register.)
Also, sqrt
is 10-21 cycle latency on Sandybridge. If you don't need it often, the int->double, sqrt, double->int chain is not much worse than an L2 cache hit. And better than going to L3 or main memory. If you need a lot of sqrt
, then sure, make a LUT. The throughput will be much better, even if you're bouncing around in the table and causing L1 misses.
You could optimize the initialization by squaring instead of sqrting, with something like
uint8_t sqrt_lookup[65536];
void init_sqrt (void)
{
int idx = 0;
for (int i=0 ; i < 256 ; i++) {
// TODO: check that there isn't an off-by-one here
int iplus1_sqr = (i+1)*(i+1);
memset(sqrt_lookup+idx, i, iplus1_sqr-idx);
idx = iplus1_sqr;
}
}
You can still get the benefits of having sqrt_lookup
being const
(compiler knows it can't alias). Either use restrict
, or lie to the compiler, so users of the table see a const
array, but you actually write to it.
This might involve lying to the compiler, by having it declared extern const
in most places, but not in the file that initializes it. You'd have to make sure this actually works, and doesn't create code referring to two different symbols. If you just cast away the const
in the function that initializes it, you might get a segfault if the compiler placed it in rodata
(or read-only bss
memory if it's uninitialized, if that's possible on some platforms?)
Maybe we can avoid lying to the compiler, with:
uint8_t restrict private_sqrt_table[65536]; // not sure about this use of restrict, maybe that will do it?
const uint8_t *const sqrt_lookup = private_sqrt_table;
Actually, that's just a const
pointer to const
data, not a guarantee that what it's pointing to can't be changed by another reference.