Consider the following code:
#include <stdio.h>
int main()
{
char A = A ? 0[&A] & !A : A^A;
putchar(A);
}
I'd like to ask whether this code exhibits undefined behaviour.
Edit
Please note: the code intentionally uses 0[&A] & !A and NOT A & !A (see response below).
End edit
Taking the assembly output from g++ 6.3 (https://godbolt.org/g/4db6uO), compiled without optimizations, we get:
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov BYTE PTR [rbp-1], 0
movzx eax, BYTE PTR [rbp-1]
movsx eax, al
mov edi, eax
call putchar
mov eax, 0
leave
ret
However, clang emits a lot more code for the same source (again without optimizations):
main: # @main
push rbp
mov rbp, rsp
sub rsp, 16
mov dword ptr [rbp - 4], 0
cmp byte ptr [rbp - 5], 0
je .LBB0_2
movsx eax, byte ptr [rbp - 5]
cmp byte ptr [rbp - 5], 0
setne cl
xor cl, -1
and cl, 1
movzx edx, cl
and eax, edx
mov dword ptr [rbp - 12], eax # 4-byte Spill
jmp .LBB0_3
.LBB0_2:
movsx eax, byte ptr [rbp - 5]
movsx ecx, byte ptr [rbp - 5]
xor eax, ecx
mov dword ptr [rbp - 12], eax # 4-byte Spill
.LBB0_3:
mov eax, dword ptr [rbp - 12] # 4-byte Reload
mov cl, al
mov byte ptr [rbp - 5], cl
movsx edi, byte ptr [rbp - 5]
call putchar
mov edi, dword ptr [rbp - 4]
mov dword ptr [rbp - 16], eax # 4-byte Spill
mov eax, edi
add rsp, 16
pop rbp
ret
And the Microsoft VC compiler gives:
EXTRN _putchar:PROC
tv76 = -12 ; size = 4
tv69 = -8 ; size = 4
_A$ = -1 ; size = 1
_main PROC
push ebp
mov ebp, esp
sub esp, 12 ; 0000000cH
movsx eax, BYTE PTR _A$[ebp]
test eax, eax
je SHORT $LN5@main
movsx ecx, BYTE PTR _A$[ebp]
test ecx, ecx
jne SHORT $LN3@main
mov DWORD PTR tv69[ebp], 1
jmp SHORT $LN4@main
$LN3@main:
mov DWORD PTR tv69[ebp], 0
$LN4@main:
mov edx, 1
imul eax, edx, 0
movsx ecx, BYTE PTR _A$[ebp+eax]
and ecx, DWORD PTR tv69[ebp]
mov DWORD PTR tv76[ebp], ecx
jmp SHORT $LN6@main
$LN5@main:
movsx edx, BYTE PTR _A$[ebp]
movsx eax, BYTE PTR _A$[ebp]
xor edx, eax
mov DWORD PTR tv76[ebp], edx
$LN6@main:
mov cl, BYTE PTR tv76[ebp]
mov BYTE PTR _A$[ebp], cl
movsx edx, BYTE PTR _A$[ebp]
push edx
call _putchar
add esp, 4
xor eax, eax
mov esp, ebp
pop ebp
ret 0
_main ENDP
But with optimizations we get much cleaner code (gcc and clang):
main: # @main
push rax
mov rsi, qword ptr [rip + stdout]
xor edi, edi
call _IO_putc
xor eax, eax
pop rcx
ret
And somewhat mysterious code from VC (it seems the VC compiler can't take a joke: it simply does not precompute the right-hand side):
EXTRN _putchar:PROC
_A$ = -1 ; size = 1
_main PROC ; COMDAT
push ecx
mov cl, BYTE PTR _A$[esp+4]
test cl, cl
je SHORT $LN3@main
mov al, cl
xor al, 1
and cl, al
jmp SHORT $LN4@main
$LN3@main:
xor cl, cl
$LN4@main:
movsx eax, cl
push eax
call _putchar
xor eax, eax
pop ecx
pop ecx
ret 0
_main ENDP
Some Warnings:
- You should not write code like this. It is definitely bad coding style and should never go into a serious application. It is just for fun.
Some Explanations:
- I am asking about undefined behaviour because the value of A is used in its own initialization. Again: you should not do this.
- However, the way the expression is built, both branches of it yield 0, as the compilers' output suggests.
So my dilemma right now is whether this is UB or not.