Theoretically, which is faster on modern CPUs:
- looking up the NOT result in a table, or
- computing it with the ~ operator (in C)?
Assume the entire table fits in the L1 cache.
Bitwise not:
uint8_t bitwise_not(uint8_t arg) { return ~arg; }
Table not:
// precalculating the table (once)
uint8_t table[0x100];
for (int i = 0; i < 0x100; ++i) { table[i] = (uint8_t)~i; }
// function
uint8_t table_not(uint8_t arg) { return table[arg]; }
Xor not:
uint8_t xor_not(uint8_t arg) { return arg ^ 0xff; }
Over not a single operation but several billion of them, is reading from the L1 cache faster than any logical operation, or not? (I think L1 is faster, but I cannot prove it.)
And practically, how can I measure it?
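Here is a minimal measurement sketch of what I have in mind (my own attempt, not a definitive harness; it assumes a POSIX system with clock_gettime). The volatile sink keeps the compiler from deleting the loop as dead code, and calling through a function pointer keeps each variant out-of-line, at the cost of call overhead in every iteration:

#include <stdint.h>
#include <stdio.h>
#include <time.h>

static uint8_t table[0x100];

static uint8_t bitwise_not(uint8_t arg) { return ~arg; }
static uint8_t table_not(uint8_t arg) { return table[arg]; }
static uint8_t xor_not(uint8_t arg) { return arg ^ 0xff; }

/* time iters calls of fn; the volatile sink keeps the loop alive */
static double bench(uint8_t (*fn)(uint8_t), long iters) {
    volatile uint8_t sink = 0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; ++i) sink = fn((uint8_t)i);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void) {
    for (int i = 0; i < 0x100; ++i) { table[i] = (uint8_t)~i; }
    const long iters = 2000000000L; /* "several billion", as above */
    printf("bitwise_not: %.3f s\n", bench(bitwise_not, iters));
    printf("table_not:   %.3f s\n", bench(table_not, iters));
    printf("xor_not:     %.3f s\n", bench(xor_not, iters));
    return 0;
}

I realize a loop this tight may mostly measure loop and call overhead rather than the operation itself, and that inspecting the generated assembly of the three one-liners (e.g. gcc -O2 -S) may be more telling. Is there a better way?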