Look at the following simple piece of code:
#include <stdio.h>

int main(void)
{
    short x = 0, y = 0;
    scanf("%d", &x);
    scanf("%d", &y);
    printf("%d %d\n", x, y);
    return 0;
}
If you input 4 and 5 to this program, you'd expect to get 4 and 5 in the output. With GCC 4.6.2 on Windows (MinGW), it produces 0 and 5 instead. So I dug a bit deeper. This is the assembly code generated:
movw $0, 30(%esp)
movw $0, 28(%esp)
leal 30(%esp), %eax
movl %eax, 4(%esp)
movl $LC0, (%esp)
call _scanf
leal 28(%esp), %eax
movl %eax, 4(%esp)
movl $LC0, (%esp)
call _scanf
I haven't done much assembly programming, but the code above does not look right. It suggests that x is placed at offset 30 from esp and y at offset 28, and that their addresses are then passed to scanf. If scanf treats each address as pointing to a 4-byte int, the following should happen: the first call sets bytes [30, 34) to 0x00000004, and the second call sets bytes [28, 32) to 0x00000005. Since this is a little-endian machine, the first write stores [0x04 0x00 0x00 0x00] starting at byte 30, and the second stores [0x05 0x00 0x00 0x00] starting at byte 28. The second write therefore overwrites bytes 30 and 31 with zeros, which resets x to 0.
I tried reversing the order of the scanf calls, and it worked (the output was 4 and 5): the smaller offset was now filled first, followed by the larger one.
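To check my reasoning, here is a rough simulation of the overlapping writes. It is only a sketch: the frame array and the helpers store_int, load_short and simulate are names I made up, and it assumes a little-endian machine with a 4-byte int and a 2-byte short. Offsets 28 and 30 are taken from the GCC listing above.

#include <stdio.h>
#include <string.h>

/* Simulated stack frame (hypothetical); offsets 28 and 30 mirror the GCC listing. */
static unsigned char frame[36];

/* Models a callee that stores a full int at the given offset. */
static void store_int(size_t offset, int value)
{
    memcpy(frame + offset, &value, sizeof value);
}

/* Reads back the 2-byte short that lives at the given offset. */
static short load_short(size_t offset)
{
    short s;
    memcpy(&s, frame + offset, sizeof s);
    return s;
}

/* Performs both writes in the requested order, then reads x and y back. */
static void simulate(int x_first, short *x, short *y)
{
    memset(frame, 0, sizeof frame);
    if (x_first) {
        store_int(30, 4);   /* x: fills bytes [30, 34) */
        store_int(28, 5);   /* y: fills bytes [28, 32), zeroing x */
    } else {
        store_int(28, 5);   /* y: fills bytes [28, 32) */
        store_int(30, 4);   /* x: fills bytes [30, 34), y's bytes untouched */
    }
    *x = load_short(30);
    *y = load_short(28);
}

int main(void)
{
    short x, y;
    simulate(1, &x, &y);
    printf("x first: %d %d\n", x, y);
    simulate(0, &x, &y);
    printf("y first: %d %d\n", x, y);
    return 0;
}

Under those assumptions, the first line shows x wiped out and the second shows both values intact, which matches what I observed. Note that in the "working" order the write for x still spills two bytes past the variable, so it only appears to work.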
It seemed preposterous that GCC could have messed this up, so I tried MSVC. The assembly it generated had one marked difference: the variables were placed at offsets -8 and -4 (i.e. each was given a 4-byte slot, even though the comments say the size is 2 bytes). Here's part of the code:
_TEXT SEGMENT
_x$ = -8 ; size = 2
_y$ = -4 ; size = 2
_main PROC
push ebp
mov ebp, esp
sub esp, 8
xor eax, eax
mov WORD PTR _x$[ebp], ax
xor ecx, ecx
mov WORD PTR _y$[ebp], cx
lea edx, DWORD PTR _x$[ebp]
push edx
push OFFSET $SG2470
call _scanf
add esp, 8
lea eax, DWORD PTR _y$[ebp]
push eax
push OFFSET $SG2471
call _scanf
add esp, 8
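For comparison, here is the same kind of simulation for the MSVC layout. Again this is only a sketch under the same assumptions (little-endian, 4-byte int, 2-byte short); padded_layout is a made-up name, and offsets 0 and 4 in the array stand in for ebp-8 and ebp-4.

#include <stdio.h>
#include <string.h>

/* Simulates two 4-byte writes into slots that are 4 bytes apart. */
static void padded_layout(short *x, short *y)
{
    /* Hypothetical 8-byte frame: x at offset 0 (ebp-8), y at offset 4 (ebp-4). */
    unsigned char frame[8] = {0};
    int four = 4, five = 5;

    memcpy(frame + 0, &four, sizeof four);  /* fills bytes [0, 4) */
    memcpy(frame + 4, &five, sizeof five);  /* fills bytes [4, 8), no overlap */

    memcpy(x, frame + 0, sizeof *x);
    memcpy(y, frame + 4, sizeof *y);
}

int main(void)
{
    short x, y;
    padded_layout(&x, &y);
    printf("%d %d\n", x, y);
    return 0;
}

Because the two 4-byte writes never touch the same bytes, the order of the calls should not matter with this layout.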
My question is in two parts:
- I don't have a personal Linux box at my disposal. Is this a GCC issue, or only a MinGW issue?
But, more importantly,
- Is this a bug at all? How would a compiler figure out if it should place "short"s at 2-byte offsets or 4-byte offsets?