Does
if (k==5) n=k;
compile to the equivalent of
if (k==5) n=5;
in Visual Studio C++?
As stated above in the comments, what happens depends on k: whether it's local, global, visible, const, etc., but also on your compiler's optimisation settings. For MSVC you could disable optimisation and then look at the generated code to see how it differs.
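For example, if k is a compile-time constant the test can be folded away entirely. A minimal sketch (my own example, not from the question):

constexpr int k = 5;

int test_const() {
    int n = 0;
    if (k == 5) {   // always true for this k, so the compiler can drop the test
        n = k;
    }
    return n;       // with optimisation on, this typically becomes: mov eax, 5 / ret
}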
Not necessarily. There are a lot of things to consider when optimizing assembly. To name a few: how many bytes the binary takes up, branching, instruction execution time, and throughput.
From a simple test of
int test(int k) {
    int n = 0;
    if (k == 5) {
        n = k;
    }
    return n;
}
we get this assembly
int test(int):
xor eax, eax ; int n = 0; Zero out eax, the return register
cmp ecx, 5 ; if(k == 5); Compare ecx (the register holding the argument k) to 5
cmovne ecx, eax ; k = 0; If ecx (k) was not equal to 5, zero it out
mov eax, ecx ; n = k; Set eax (the return value) to ecx (k)
ret ; return n;
As we can see, this is not moving the literal 5 into n. In fact, it's conditionally moving 0 into k, and then moving k into n.
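Rendered back into C++ (my own paraphrase, not code the compiler emits), the generated sequence behaves like this:

int test_like_the_asm(int k) {
    int n = 0;           // xor eax, eax
    if (k != 5) k = 0;   // cmovne ecx, eax -- done without a branch
    n = k;               // mov eax, ecx
    return n;            // ret
}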
To understand why, we need to look at another piece of assembly:
xor eax, eax ; Zero out eax, the return register.
cmp ecx, 5 ; Compare ecx ('k') to 5
jne Return ; If k is not equal to five, jump to return
mov eax, 5 ; Move 5 into the return register
Return:
ret ; Return
This is the same number of instructions, but the compiler can see some glaring problems with it. The first is the conditional jump, which can suffer a branch-prediction failure. The second is that, even with the same instruction count, it takes more bytes to encode those instructions.
The two differing instructions that MSVC produces are
0f 45 c8 cmovne ecx, eax
89 c8 mov eax, ecx
whereas the jumping version produces
75 05 jne Return
b8 05 00 00 00 mov eax, 5
As we can see, MSVC's version requires 2 fewer bytes to represent. The version you describe is worse for the instruction cache. Getting into the details of cache lines, memory loads, instruction decoding, and instruction scheduling would take hours, but it boils down to this: keep data small, use as little of a cache line as possible for a single entry, and pack as many other entries as you can into each cache line.
You might wonder why moving 5 into eax takes so many more bytes to represent. The answer is that it is moving a DWORD (a 4-byte integer) into the register, and that 4-byte immediate needs 4 bytes of encoding on top of the one-byte opcode (b8). With a move between registers, there are only a handful of registers in the CPU, so the source and destination can both be encoded in a single extra byte.
There are a lot of other optimizations we can and cannot see here. For instance, zeroing a register is not done with a mov, but with an xor. This is because xor'ing a register with itself is such a common way to zero it that it takes only one byte for the opcode plus one byte for the registers, and the idiom is so common that your CPU almost certainly recognizes it and handles it as a simple zeroing operation. Again, going over instruction decoding and scheduling would take hours.
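You can see the same idiom in isolation. A trivial function of my own that just returns zero:

int zero() {
    return 0;   // typically compiles to: xor eax, eax / ret, not mov eax, 0
}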
Some other things that were optimized away were pushing a stack frame and using the stack for mutable storage.
In the end, you shouldn't worry about things like this. There are times where micro-optimization is important, but usually the compiler knows better than you. The people who write compilers have read the C++ spec, they've read the x86 software developer's manuals, and they work directly with CPU manufacturers to improve their compilers.
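If you want to convince yourself for your own compiler and flags, a simple experiment (my own sketch, not from the question) is to compile both forms side by side and diff the generated assembly; with optimisation on they are likely to come out identical, because inside the branch the compiler already knows k is 5:

int with_variable(int k) {
    int n = 0;
    if (k == 5) n = k;   // the form from the question
    return n;
}

int with_literal(int k) {
    int n = 0;
    if (k == 5) n = 5;   // the "hand-optimized" form
    return n;
}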
I highly recommend Matt Godbolt's talk "What Has My Compiler Done for Me Lately?" (he's the guy who made the Compiler Explorer that lets me easily view the generated assembly) if you are interested in learning some of the nuances a compiler considers when optimizing code.