Here is an example piece of code:
#include <stdint.h>
#include <iostream>
typedef struct {
uint16_t low;
uint16_t high;
} __attribute__((packed)) A;
typedef uint32_t B;
int main() {
//simply to make the answer unknowable at compile time
uint16_t input;
cin >> input;
A a = {15,input};
B b = 0x000f0000 + input;
//a equals b
int resultA = a.low-a.high;
int resultB = b&0xffff - (b>>16)&0xffff;
//use the variables so the optimiser doesn't get rid of everything
return resultA+resultB;
}
Both resultA and resultB calculate the exact same thing - but which is faster (assuming you don't know the answer at compile time).
I tried using Compiler Explorer to look at the output, and I got something - but with any optimisation no matter what I tried it outsmarted me and optimised the whole calculation away (at first, it optimised everything away since it's not used) - I tried using cin to make the answer unknowable at runtime, but then I couldn't even figure out how it was getting the answer at all (I think it managed to still figure it out at compile time?)
Here is the output of Compiler Explorer with no optimisation flag:
push rbp
mov rbp, rsp
sub rsp, 32
mov dword ptr [rbp - 4], 0
movabs rdi, offset std::cin
lea rsi, [rbp - 6]
call std::basic_istream<char, std::char_traits<char> >::operator>>(unsigned short&)
mov word ptr [rbp - 16], 15
mov ax, word ptr [rbp - 6]
mov word ptr [rbp - 14], ax
movzx eax, word ptr [rbp - 6]
add eax, 983040
mov dword ptr [rbp - 20], eax
Begin calculating result A
movzx eax, word ptr [rbp - 16]
movzx ecx, word ptr [rbp - 14]
sub eax, ecx
mov dword ptr [rbp - 24], eax
End of calculation
Begin calculating result B
mov eax, dword ptr [rbp - 20]
mov edx, dword ptr [rbp - 20]
shr edx, 16
mov ecx, 65535
sub ecx, edx
and eax, ecx
and eax, 65535
mov dword ptr [rbp - 28], eax
End of calculation
mov eax, dword ptr [rbp - 24]
add eax, dword ptr [rbp - 28]
add rsp, 32
pop rbp
ret
I will also post the -O1 output, but I can't make any sense of it (I'm quite new to low level assembly stuff).
main: # @main
push rax
lea rsi, [rsp + 6]
mov edi, offset std::cin
call std::basic_istream<char, std::char_traits<char> >::operator>>(unsigned short&)
movzx ecx, word ptr [rsp + 6]
mov eax, ecx
and eax, -16
sub eax, ecx
add eax, 15
pop rcx
ret
Something to consider. While doing operations with the integer is slightly harder, simply accessing it as an integer easier compared to the struct (which you'd have to convert with bitshifts I think?). Does this make a difference?
This originally came up in the context of memory, where I saw someone map a memory address to a struct with a field for the low bits and the high bits. I thought this couldn't possibly be faster than simply using an integer of the right size and bitshifting if you need the low or high bits. In this specific situation - which is faster?
[Why did I add C to the tag list? While the example code I used is in C++, the concept of struct vs variable is very applicable to C too]