I have some C code that basically comes down to this:
*p_Bool3 = *p_Bool1 || *p_Bool2;
where p_Bool1, p_Bool2 and p_Bool3 are all of type "bool *".
clang++ gives me this assembly for x86 at -O2 (-O3 makes no difference; I'm using -O2 only because that's what the real project uses):
mov eax, dword ptr [esp + 12]
mov edx, dword ptr [esp + 4]
mov cl, 1
cmp byte ptr [edx], 0
je LBB1_1
mov byte ptr [eax], cl
ret
LBB1_1:
mov ecx, dword ptr [esp + 8]
mov cl, byte ptr [ecx]
mov byte ptr [eax], cl
ret
The actual program is a little more complex, with a bunch of get and set functions and some overloaded operators, but at -O2 it compiles to essentially the same assembly:
001de515 mov al, 0x01
001de517 mov ecx, dword ptr ds:[edi+0x00000628]
001de51d cmp byte ptr ds:[ecx], 0x00000000
001de520 jne 0x001DE52E
001de522 mov eax, dword ptr ds:[edi+0x0000065C]
001de528 cmp byte ptr ds:[eax], 0x00000000
001de52b setne al
001de52e mov ecx, dword ptr ds:[edi+0x000005EC]
001de534 mov byte ptr ds:[ecx], al
I also have some reference code that runs 75% faster on the same hardware and system. It looks like this:
0100121e lea ebx, dword ptr ds:[esi+0x10280005]
01001224 mov al, byte ptr ds:[ebx]
01001226 lea ebx, dword ptr ds:[esi+0x10280006]
0100122c or al, byte ptr ds:[ebx]
0100122e lea ebx, dword ptr ds:[esi+0x10280004]
01001234 mov byte ptr ds:[ebx], al
How can I get clang to emit just the 'or' instruction instead of the compare and branch, so that this code runs faster?
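For comparison, here is a minimal sketch of the two forms side by side. The function names are hypothetical and only there so the generated code can be inspected in isolation. The `||` form may only read `*p_Bool2` when `*p_Bool1` is false, which is why a conditional jump shows up in the assembly above. The bitwise `|` form always reads both operands, so the compiler is free to emit a plain `or`, though that is not guaranteed and should be checked in the actual output. It also assumes both pointers are always valid to dereference.

```cpp
// Hypothetical helper names; the real code only contains the single assignment.

// Short-circuit form from the question: *p_Bool2 is read only when
// *p_Bool1 is false, so the compiler has to keep a conditional branch.
void or_short_circuit(const bool *p_Bool1, const bool *p_Bool2, bool *p_Bool3) {
    *p_Bool3 = *p_Bool1 || *p_Bool2;
}

// Bitwise form: both operands are always read, so no branch is required.
// The int result of | converts back to bool on assignment.
void or_bitwise(const bool *p_Bool1, const bool *p_Bool2, bool *p_Bool3) {
    *p_Bool3 = *p_Bool1 | *p_Bool2;
}
```

Both functions produce the same result for every combination of inputs; they differ only in whether the second load is unconditional.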