I'm thinking about really granular code optimization for a case where enumerated types are being stored in an array or hash table.
I've got an array of animal enums, where each enum has 4 different categories it can represent: `Cat`, `Dog`, `Fish`, and `Bird` (is there a name for the categories of an enumerated type, by the way?). I need to check whether every value in a certain range is the same. Here's the unoptimized way of doing this:
```
func same([]Animal data, int start, int end) -> Animal or None:
    if (end - start < 0 || end > data.end || start < data.start): // Bounds checks
        return None
    else if (start == end): // Trivial case
        return data[start]
    else: // Meat & potatoes
        first_value = data[start]
        relevant_data = data[start:end] // Slice of relevant parts
        for (value in relevant_data): // Iterate over relevant data
            if value != first_value: // Check if same
                return None // Reject on first non-match
        return first_value // Accept on base case
```
Now this is fine; it's O(n) time complexity for the worst and average case, but it involves that pesky `if` on every iteration, which I think risks branch mispredictions at the hardware level. The more elegant way of doing this is to store `data` differently, so that instead of it being implicitly stored as something like this:
```
Animal = Enum(
    Cat  // 0b00
    Dog  // 0b01
    Fish // 0b10
    Bird // 0b11
)
```
We can make `data` instead be stored as something like this:

```
Animal = SuperSpecialEnum(
    Cat  // 0b0001
    Dog  // 0b0010
    Fish // 0b0100
    Bird // 0b1000
)
```
Then, we can use this code instead:
```
func same([]Animal data, int start, int end) -> Animal or None:
    if (end - start < 0 || end > data.end || start < data.start): // Bounds checks
        return None
    else if (start == end): // Trivial case
        return data[start]
    else: // Thanksgiving
        first_value = data[start]
        relevant_data = data[start:end] // Slice of relevant parts
        for (value in relevant_data): // Iterate over relevant data
            first_value &= value // Bitwise AND accumulate
        if (first_value == 0b0000):
            return None // Reject on non-match case
        else:
            return first_value // Accept on base case
```
Now, because of that bitwise `first_value &= value`, we can avoid branching in the loop entirely. Sure, we give up the early-reject case; exactly how much that would have saved us is actually a bit hard to say, since I think you'd need to consider the binomial distribution over your overall probability that any given value is different. But the benefit is that you've entirely eliminated branching, and many modern compilers and architectures support 128-bit `and` operations, so this operation could be really, really fast. You're still O(n) time complexity for the worst and average case, but you're potentially cutting down on the number of iterations by 16x (16-value `and`s with 128-bit boolean arithmetic, assuming the compiler knows what it's doing, and it usually does), and completely eliminating the risk of branch mispredictions.
Now the real question is twofold, though both parts are really different applications of the same question of whether the compiler packs sub-byte values to use less space. One, would you be using 2x as much space to store `data` because of the redefinition of the enum to use one bit per value instead of log2(values) bits, assuming a constant enum category count (still interested in whether there's a proper name for those categories, by the way)? Two, could you potentially get a 32x speedup if the compiler knows you're only using the first 4 bits of each `Animal`, thereby allowing for 32-value `and`s with 128-bit boolean arithmetic?