Why does reading EOF make scanf return 4294967295?

Question

In my assembly program I want to test whether standard input has reached EOF:

segment .data
    .fmt_read db "%80s", 0 ; up to 80 bytes of actual string + terminating 0 (81 total)

segment .text
    lea rdi, [.fmt_read]
    lea rsi, [buf_str]     ; buffer to fill in
    xor eax, eax           ; no floating-point parameters are passed
    call scanf
    cmp rax, -1            ; did we reach EOF(-1)
    je .done               ; yes? End the program

When I debug it in gdb, I press Ctrl-D to make scanf see EOF, then inspect the return value in rax, hoping to find the EOF indicator (-1).

(gdb) p $rax
$5 = 4294967295
(gdb) p/x $rax
$6 = 0xffffffff

I understand that this is the value -1 in 32-bit two's complement. However, I do not understand why cmp rax, -1 did not set ZF, since the two values appear to be equal.

How can I detect EOF?

Zapping EAX on entry and testing RAX on return - intentional? — Tom Goodfellow, Sep 14 '16 at 06:27
@TomGoodfellow: RAX as an input is totally separate from RAX as an output. On input, the x86-64 SysV ABI requires AL=number of FP args in XMM regs for variadic functions. Also, perhaps you weren't aware that `xor eax,eax` does zero RAX: http://stackoverflow.com/questions/11177137/why-do-most-x64-instructions-zero-the-upper-part-of-a-32-bit-register. — Peter Cordes, Sep 14 '16 at 15:00

score 3 · Accepted Answer · answered Sep 14 '16 at 06:32

There's no CMP r/m64, imm64 (or CMP RAX, imm64). There's CMP RAX, imm32, which sign-extends the immediate operand to 64 bits, i.e. -1 (encoded as 0xffffffff) is sign-extended to 0xffffffffffffffff. Your RAX holds 0x00000000ffffffff, so the two operands differ and ZF is not set.

If you want to compare RAX to 0xffffffff you can use something like:

mov ebx, -1
cmp rax, rbx

Or you could simply use EAX in the comparison rather than RAX:

cmp eax, -1
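
The difference between the two comparisons can also be modelled in C. This is only a minimal sketch: the function names `cmp_rax_minus1` and `cmp_eax_minus1` are hypothetical, and the `uint64_t` argument stands in for the contents of RAX after the call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models `cmp rax, -1`: the imm32 operand -1 is sign-extended to
   0xffffffffffffffff before the 64-bit comparison takes place. */
bool cmp_rax_minus1(uint64_t rax)
{
    uint64_t imm = (uint64_t)(int64_t)(int32_t)-1; /* sign-extended immediate */
    return rax == imm;
}

/* Models `cmp eax, -1`: only the low 32 bits of RAX take part,
   so whatever sits in the upper half is ignored. */
bool cmp_eax_minus1(uint64_t rax)
{
    return (uint32_t)rax == UINT32_C(0xffffffff);
}
```

With the value seen in gdb (0x00000000ffffffff), the first function reports "not equal" and the second reports "equal", which matches the behaviour described in the question.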

Michael

But why did scanf return 0xffffffff (32-bit) in rax rather than 0xffffffffffffffff (64-bit) on an amd64 machine? I read the man page on a 64-bit Linux machine, and it does not mention any limit on the return value or its size. Do you have any clues? – Bulat M. Sep 14 '16 at 06:40
@BulatM.: because [scanf](http://man7.org/linux/man-pages/man3/scanf.3.html) returns a 32-bit `int`, not a 64-bit `long`, on 64-bit Linux. So you need to rely on the data in the lower 32 bits of _RAX_. – Michael Petch Sep 14 '16 at 06:48
@BulatM.: That would depend on the size of `int`, and exactly how `EOF` is defined. There's a function named `feof` that returns non-zero if EOF has been reached for a given file. It might be a better idea to use that rather than checking for EOF directly. – Michael Sep 14 '16 at 06:50
@Michael: Now I understand. One small question: where did you find that scanf returns a 32-bit int? I searched several man pages and found no mention of it. Maybe it is mentioned in the POSIX specification? And why does it depend on int rather than on long (which would seem logical on 64-bit machines)? – Bulat M. Sep 14 '16 at 06:53
@BulatM. It is platform dependent as to what `int` is. But 64-bit Linux defines `int` as a 32-bit value. You can see a chart like the one [here](http://www.makelinux.net/ldd3/chp-11-sect-1) – Michael Petch Sep 14 '16 at 06:57
@BulatM.: Why would you ever want scanf to be able to convert more than 2^31-1 args in one call? Using a less efficient wider integer (`long`) makes no sense. In any case, ISO C (not POSIX) defines scanf as returning an `int`, not a `long`. Both can be as small as 16-bit. If 64-bit registers were faster or led to smaller code-size, x86-64 ABIs might have chosen to make `int` a 64-bit type. In general, they're the same speed, except that 64-bit takes larger machine code (a REX prefix), so 32-bit ops are the most efficient. – Peter Cordes Sep 14 '16 at 15:07
`If you want to compare RAX to 0xffffffff` but don't do that, because the ABI doesn't require 32-bit return values to be zero- or sign-extended to 64 bits. The upper bits of RAX should be assumed to contain random garbage, for any width of return value less than 64 bits. (function arg-passing is similar, but [narrow args are sign or zero extended to 32-bit](http://stackoverflow.com/questions/36706721/is-a-sign-or-zero-extension-required-when-adding-a-32bit-offset-to-a-pointer-for/36760539#36760539) in the x86-64 SysV ABI) – Peter Cordes Sep 14 '16 at 15:13
@Peter, and what is the correct way to do that (compare the lower part of rax to -1)? Should I use `cmp eax, -1`? – Bulat M. Sep 15 '16 at 19:02
@BulatM.: yes, of course. Just look at C compiler output to see what happens when you do the equivalent thing in C. Write a quick function on http://gcc.godbolt.org/ or something. – Peter Cordes Sep 15 '16 at 19:04
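
As a starting point for that experiment, here is a minimal C sketch of the equivalent check. The helper name `read_word` is hypothetical, and it uses `fscanf` on a `FILE *` instead of `scanf` on stdin purely so that it can be fed any stream. Because the result is an `int`, a compiler targeting x86-64 would typically compare it against `EOF` with a 32-bit instruction, though the exact output depends on the compiler and its flags.

```c
#include <stdio.h>

/* Hypothetical helper: reads one whitespace-delimited word of at most
   80 characters from `in` into `buf`, which must hold at least 81 bytes
   (80 characters plus the terminating 0).
   Returns 1 if a word was stored, 0 on end of input or read error. */
int read_word(FILE *in, char *buf)
{
    int rc = fscanf(in, "%80s", buf); /* int result: low 32 bits of RAX */
    if (rc == EOF) {                  /* comparison on an int, not a long */
        return 0;
    }
    return rc == 1;
}
```

After `read_word` returns 0, calling `feof(in)` tells end of input apart from a read error, in line with the `feof` suggestion earlier in the comments.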
