
I have some C code that basically comes down to this:

    *p_Bool3 = *p_Bool1 || *p_Bool2;

with p_Bool1, p_Bool2 and p_Bool3 being "bool *".
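A minimal self-contained version of that snippet, for anyone who wants to reproduce the assembly below. The function name `or_bools` and the parameter order are assumptions made for illustration, not taken from the real project, and `check_or_bools` is a hypothetical usage check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical wrapper around the question's one-liner; the name and the
   parameter order are assumptions. */
void or_bools(bool *p_Bool1, bool *p_Bool2, bool *p_Bool3)
{
    /* Logical OR: *p_Bool2 is only read when *p_Bool1 is false. */
    *p_Bool3 = *p_Bool1 || *p_Bool2;
}

/* Usage: every combination of inputs gives the expected result. */
void check_or_bools(void)
{
    bool t = true, f = false, r;
    or_bools(&f, &f, &r);
    assert(!r);
    or_bools(&f, &t, &r);
    assert(r);
    or_bools(&t, &f, &r);
    assert(r);
    or_bools(&t, &t, &r);
    assert(r);
}
```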

clang++ gives me this x86 assembly at -O2 (-O3 makes no difference; I'm only using -O2 because that's what the real project uses):

        mov eax, dword ptr [esp + 12]
        mov edx, dword ptr [esp + 4]
        mov cl, 1
        cmp byte ptr [edx], 0
        je  LBB1_1
        mov byte ptr [eax], cl
        ret
    LBB1_1:
        mov ecx, dword ptr [esp + 8]
        mov cl, byte ptr [ecx]
        mov byte ptr [eax], cl
        ret

The actual program is a little more complex, with a bunch of get and set functions and some overloaded operators, but at -O2 it compiles to basically the same assembly:

    001de515  mov         al, 0x01
    001de517  mov         ecx, dword ptr ds:[edi+0x00000628]
    001de51d  cmp         byte ptr ds:[ecx], 0x00000000
    001de520  jne         0x001DE52E
    001de522  mov         eax, dword ptr ds:[edi+0x0000065C]
    001de528  cmp         byte ptr ds:[eax], 0x00000000
    001de52b  setne       al
    001de52e  mov         ecx, dword ptr ds:[edi+0x000005EC]
    001de534  mov         byte ptr ds:[ecx], al

I also have some reference code that runs 75% faster on the same hardware/system; its assembly looks like this:

    0100121e  lea         ebx, dword ptr ds:[esi+0x10280005]
    01001224  mov         al, byte ptr ds:[ebx]
    01001226  lea         ebx, dword ptr ds:[esi+0x10280006]
    0100122c  or          al, byte ptr ds:[ebx]
    0100122e  lea         ebx, dword ptr ds:[esi+0x10280004]
    01001234  mov         byte ptr ds:[ebx], al

How can I get clang to just use the 'or' instruction and be faster?

Peter

2 Answers


Have you tried using the bitwise OR operator, as in `*p_Bool3 = *p_Bool1 | *p_Bool2`? The compiler cannot generate this on its own, since it is not permitted to dereference `p_Bool2` if `*p_Bool1` is true.
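A sketch of this suggestion as a complete function, assuming both pointers are always valid. Because `|` evaluates both operands, both loads happen unconditionally, which is what permits a branch-free load/`or`/store sequence like the reference code; whether a particular clang version emits exactly that is not verified here. The names `or_bools_bitwise` and `check_or_bools_bitwise` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical name; assumes both pointers are always valid. */
void or_bools_bitwise(bool *p_Bool1, bool *p_Bool2, bool *p_Bool3)
{
    /* Bitwise OR: both operands are always evaluated, so both loads are
       unconditional. The bools are promoted to int (0 or 1), and the int
       result is converted back to bool on assignment. */
    *p_Bool3 = *p_Bool1 | *p_Bool2;
}

/* Usage: same results as the || version for valid pointers. */
void check_or_bools_bitwise(void)
{
    bool t = true, f = false, r;
    or_bools_bitwise(&f, &f, &r);
    assert(!r);
    or_bools_bitwise(&t, &f, &r);
    assert(r);
    or_bools_bitwise(&f, &t, &r);
    assert(r);
}
```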

If you cannot use the `|` operator here, it should also work to first dereference both pointers manually, which tells the compiler that it is permitted to do so:

    bool tmp1 = *p_Bool1, tmp2 = *p_Bool2;

    *p_Bool3 = tmp1 || tmp2;
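The same idea as a complete function, with a hypothetical name (`or_bools_tmp`) and the same assumption that both pointers are always valid. Both loads now happen before the `||`, so the short-circuit no longer guards a memory access; according to the asker's comment, the optimizer removes the temporaries entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical name; assumes both pointers are always valid. */
void or_bools_tmp(bool *p_Bool1, bool *p_Bool2, bool *p_Bool3)
{
    /* Both pointers are dereferenced unconditionally, so the compiler may
       assume both loads are valid and need not branch around either. */
    bool tmp1 = *p_Bool1, tmp2 = *p_Bool2;

    /* Still a logical OR, but on values that are already loaded. */
    *p_Bool3 = tmp1 || tmp2;
}

/* Usage check. */
void check_or_bools_tmp(void)
{
    bool t = true, f = false, r;
    or_bools_tmp(&f, &f, &r);
    assert(!r);
    or_bools_tmp(&f, &t, &r);
    assert(r);
}
```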
fuz
  • Is it necessary to set both `tmp1` and `tmp2`? Would `*p_Bool3 = *p_Bool1 || tmp2` work, or would it have the same risk of side effects as @Jérôme Richard described? – sj95126 Aug 04 '21 at 14:30
  • Wow, I can't believe I hadn't tried the '|' operator; I actually thought I should use the '||' operator for bool so it would be faster. I like your second suggestion even better, since I'm using the "right" operator and still get the fast result. The local variables get completely removed by the optimizer. – Peter Aug 04 '21 at 14:36
  • @sj95126 It would certainly work without `tmp1`, but using the `|` operator may be the better solution anyway. Not only does it get rid of the jump, it may also avoid extra instructions to normalise the result. – fuz Aug 04 '21 at 15:07
  • @sj95126: See [Boolean values as 8 bit in compilers. Are operations on them inefficient?](https://stackoverflow.com/q/47243955) - compilers don't always take full advantage of the ABI guarantee that a `bool` is an integer 0 or 1, and will sometimes waste instructions re-booleanizing. – Peter Cordes Aug 04 '21 at 20:25
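A sketch of the variant asked about in the comments above, where only `*p_Bool2` is hoisted into a temporary (`or_bools_tmp2` is a hypothetical name, and both pointers are again assumed valid). The left operand of `||` is always evaluated anyway, so it does not need a temporary of its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical name; assumes both pointers are always valid. */
void or_bools_tmp2(bool *p_Bool1, bool *p_Bool2, bool *p_Bool3)
{
    /* Only the right operand needs hoisting: reading it here, before the
       ||, makes the load unconditional. The left operand of || is
       evaluated unconditionally in any case. */
    bool tmp2 = *p_Bool2;
    *p_Bool3 = *p_Bool1 || tmp2;
}

/* Usage check. */
void check_or_bools_tmp2(void)
{
    bool t = true, f = false, r;
    or_bools_tmp2(&f, &f, &r);
    assert(!r);
    or_bools_tmp2(&t, &f, &r);
    assert(r);
}
```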

The right-hand operand of the || operator is evaluated lazily: only when the left-hand operand is false. Since dereferencing a pointer can have side effects (e.g. a page fault, or hardware effects for memory-mapped I/O), the compiler needs to use a conditional branch to keep those side effects from occurring when they should not.
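A small illustration of why the branch is needed, using hypothetical helper names: because `||` only evaluates its right operand when the left one is false, a caller may legally pass a null second pointer whenever the first value is true. A compiler that loaded through the second pointer unconditionally would break such a call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: logical OR of two pointed-to bools. */
bool logical_or(const bool *a, const bool *b)
{
    /* If *a is true, b is never dereferenced. */
    return *a || *b;
}

/* Usage: passing NULL for b is well-defined here only because of
   short-circuit evaluation. With `*a | *b` instead, this call would
   dereference a null pointer, which is undefined behavior. */
void check_short_circuit(void)
{
    bool t = true;
    assert(logical_or(&t, NULL));
}
```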

To avoid this problem, you can use the | operator instead.
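Switching to `|` does not change the result for `bool` operands. A quick exhaustive check over all four combinations (the function name is hypothetical), assuming the values are genuine `bool`s holding 0 or 1:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check: | and || agree on every pair of bool values. */
void check_bitwise_matches_logical(void)
{
    const bool values[2] = { false, true };
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            bool a = values[i], b = values[j];
            /* a | b promotes both bools to int (0 or 1); converting the
               result back to bool gives the same value as a || b. */
            bool bitwise = a | b;
            bool logical = a || b;
            assert(bitwise == logical);
        }
    }
}
```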

Jérôme Richard