
I write simple programs and then analyze them. Today I wrote this:

#include <stdio.h>
 
int x;
 
int main(void){
    printf("Enter X:\n");
 
    scanf("%d",&x);
 
    printf("You enter %d...\n",x);
 
    return 0;
}

It's compiled into this:

push    rbp
mov     rbp, rsp
lea     rdi, s          ; "Enter X:"
call    _puts
lea     rsi, x
lea     rdi, aD         ; "%d"
mov     eax, 0
call    ___isoc99_scanf
mov     eax, cs:x       ; <- I don't understand this
mov     esi, eax
lea     rdi, format     ; "You enter %d...\n"
mov     eax, 0
call    _printf
mov     eax, 0
pop     rbp
retn

I don't understand what cs:x means.
I use Ubuntu x64, GCC 10.3.0, and IDA Pro 7.6.

Peter Cordes
Inc.ace
  • Which disassembler did you use? `cs:` is a segment-override prefix, but it's pretty useless in this context (also it's probably RO, IIRC). If you check the bytes of the instruction, you can see whether the prefix is actually there (weird) or the disassembler made it up (also weird). – Margaret Bloom Aug 08 '21 at 15:43
  • @MargaretBloom I use IDA Pro 7.6 – Inc.ace Aug 08 '21 at 15:48
  • @MargaretBloom You were right, Ghidra shows me this: MOV EAX,dword ptr [x]. I thought IDA Pro was the best interactive disassembler – Inc.ace Aug 08 '21 at 15:56
  • Ah, OK. It's just IDA's way of telling you it's a RIP-relative address. Despite using it regularly, I've never noticed it. I'm writing a short answer. – Margaret Bloom Aug 08 '21 at 15:56
  • Don't use pastebin when asking a question. Put your code inline, properly formatted. – arrowd Aug 08 '21 at 16:01
  • Examine the machine code as well, not just what some disassembler outputs, especially for x86, where reliable disassembly is difficult to impossible. – old_timer Aug 08 '21 at 17:25
  • @old_timer: Yes examine the machine code, but there's no reason to think anything went wrong when disassembling GCC output. You always post comments about the theoretical impossibility of decoding x86, but 99.9% of the time that's just FUD, and isn't a problem for the kinds of cases you're posting about. – Peter Cordes Aug 08 '21 at 20:18
  • From the machine code, you could just look it up in the Intel docs rather than asking here. – old_timer Aug 09 '21 at 01:37

1 Answer


TL;DR: IDA confusingly uses cs: to indicate a RIP-relative addressing mode in 64-bit code.


In IDA, mov eax, x means mov eax, DWORD [x] in NASM syntax, which in turn means reading a DWORD from the variable x.
For completeness, mov rax, OFFSET x means mov rax, x in NASM (i.e. putting the address of x in rax).
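In C terms, the two IDA forms correspond to reading the variable versus taking its address. A minimal sketch using the question's global x; the helper names load_x and address_of_x are hypothetical, and the exact instruction the compiler emits for each (for example lea with RIP-relative addressing versus mov with OFFSET) depends on the compiler and on whether it builds a PIE:

```c
/* The global from the question. */
int x;

/* IDA "mov eax, x" or "mov eax, cs:x": load the 32-bit value stored in x. */
int load_x(void) {
    return x;
}

/* IDA "mov rax, OFFSET x" (or "lea rax, x"): materialize the address of x. */
int *address_of_x(void) {
    return &x;
}
```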

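In 64-bit mode, displacements are still 32-bit, so in a Position Independent Executable it's not always possible to address a variable by encoding its absolute address (the address is 64-bit and may not fit in a 32-bit field). And in position-independent code it's not desirable anyway, because the absolute address isn't known until load time.
Instead, RIP-relative addressing is used: the operand is encoded as a signed 32-bit offset from the address of the next instruction.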
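The arithmetic can be sketched in C. Two facts this relies on: a 32-bit absolute displacement in 64-bit mode is sign-extended (so only the low and high 2 GiB of the address space are reachable that way), and a RIP-relative displacement is measured from the end of the current instruction. The function names fits_abs_disp32 and rip_rel_disp32 are hypothetical, and the uint64_t to int64_t casts assume the usual two's-complement conversion that GCC and Clang perform:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr can be encoded as a sign-extended 32-bit absolute
   displacement (the [disp32] form with no base register). */
bool fits_abs_disp32(uint64_t addr) {
    int64_t s = (int64_t)addr;
    return s >= INT32_MIN && s <= INT32_MAX;
}

/* Compute the disp32 for a RIP-relative operand. RIP already points past
   the instruction, so target = insn_addr + insn_len + disp32.
   Returns false if the target is more than about +/-2 GiB away. */
bool rip_rel_disp32(uint64_t insn_addr, unsigned insn_len,
                    uint64_t target, int32_t *disp) {
    uint64_t next_ip = insn_addr + insn_len;
    int64_t delta = (int64_t)(target - next_ip);  /* wrapping difference */
    if (delta < INT32_MIN || delta > INT32_MAX)
        return false;
    *disp = (int32_t)delta;
    return true;
}
```

A typical non-PIE data address such as 0x404010 passes the first check, while a typical PIE address such as 0x555555558010 does not, yet it is still reachable RIP-relative from code in the same executable.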

In NASM, RIP-relative addressing takes the form mov eax, [REL x]; in GAS (AT&T syntax) it is mov x(%rip), %eax.
Also, in NASM, if DEFAULT REL is active, the instruction can be shortened to mov eax, [x], which is identical to the 32-bit syntax.
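For comparison, the GAS form can be written by hand with GCC-style inline asm. This is only a sketch: it assumes GCC or Clang targeting x86-64 ELF (e.g. Linux), an executable rather than a shared library, and the question's global x; read_x_rip_relative is a hypothetical name, and other targets fall back to plain C:

```c
/* The global from the question; external linkage lets the asm name it. */
int x;

/* Read x with an explicit RIP-relative load in AT&T syntax, i.e. the
   instruction IDA would list as "mov eax, cs:x". */
int read_x_rip_relative(void) {
#if defined(__GNUC__) && defined(__x86_64__) && defined(__ELF__)
    int v;
    /* The unused "m"(x) input tells the compiler this asm reads x,
       so earlier stores to x are not moved past it. */
    __asm__ ("movl x(%%rip), %0" : "=r"(v) : "m"(x));
    return v;
#else
    return x;  /* portable fallback on other targets */
#endif
}
```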

Each disassembler will disassemble a RIP-relative operand differently. As you commented, Ghidra gives mov eax, DWORD PTR [x].
IDA uses mov eax, cs:x to mean mov eax, [REL x]/mov x(%rip), %eax.
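To see what IDA's cs:x stands for, you can decode the bytes yourself. A sketch that handles only the single encoding 8B 05 disp32 (mov eax, [rip + disp32], 6 bytes, no prefixes); decode_mov_eax_riprel is a hypothetical helper, not part of any disassembler's API, and the int32_t conversion assumes two's complement as on GCC and Clang:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decode "mov eax, [rip + disp32]" (opcode 8B, ModRM 05) located at
   insn_addr and compute the address it reads, i.e. the "x" in cs:x. */
bool decode_mov_eax_riprel(const uint8_t *code, size_t len,
                           uint64_t insn_addr, uint64_t *target) {
    if (len < 6 || code[0] != 0x8B || code[1] != 0x05)
        return false;
    /* disp32 is stored little-endian right after the ModRM byte */
    uint32_t raw = (uint32_t)code[2] | ((uint32_t)code[3] << 8) |
                   ((uint32_t)code[4] << 16) | ((uint32_t)code[5] << 24);
    int32_t disp = (int32_t)raw;
    /* RIP points to the next instruction: insn_addr + 6 */
    *target = insn_addr + 6 + (uint64_t)(int64_t)disp;
    return true;
}
```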

;IDA listing, 64-bit code
mov eax, x                ;This is mov eax, [x] in NASM: a sign-extended 32-bit absolute address, most likely wrong unless your executable is not PIE and always loaded in the low 2GiB
mov eax, cs:x             ;This is mov eax, [REL x] in NASM and idiomatic in 64-bit programs

In short, you can mostly ignore the cs:, because that's just how variables are addressed in 64-bit mode.
Of course, as the listing above shows, the operand form is informative: a 32-bit absolute address means the executable is not position-independent and must be loaded in the low 2GiB, while RIP-relative addressing works wherever the program is loaded.
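The two lines of that listing differ in their ModRM/SIB bytes, not in any prefix. For mov eax, the absolute form needs a SIB byte with no base and no index (8B 04 25 disp32), because in 64-bit mode ModRM rm=101 with mod=00 was repurposed to mean RIP-relative (8B 05 disp32). A hypothetical classifier for mov r32, m32 (opcode 8B), sketched under the assumption that there are no legacy prefixes and no REX byte:

```c
#include <stddef.h>
#include <stdint.h>

enum operand_form { FORM_UNKNOWN, FORM_RIP_RELATIVE, FORM_ABS_DISP32 };

/* Classify the memory operand of "mov r32, m32" (opcode 8B).
   Assumes code[0] is the opcode: no legacy prefixes, no REX. */
enum operand_form classify_mov_r32_m32(const uint8_t *code, size_t len) {
    if (len < 2 || code[0] != 0x8B)
        return FORM_UNKNOWN;
    uint8_t modrm = code[1];
    uint8_t mod = modrm >> 6;  /* addressing mode */
    uint8_t rm = modrm & 7;    /* r/m field; reg is bits 5..3 */

    /* mod=00, rm=101: [rip + disp32] in 64-bit mode (IDA: cs:x) */
    if (mod == 0 && rm == 5)
        return len >= 6 ? FORM_RIP_RELATIVE : FORM_UNKNOWN;

    /* mod=00, rm=100: a SIB byte follows. base=101 with mod=00 means
       "no base, disp32"; index=100 means "no index" (IDA: plain x) */
    if (mod == 0 && rm == 4 && len >= 7) {
        uint8_t sib = code[2];
        uint8_t idx = (sib >> 3) & 7;
        uint8_t base = sib & 7;
        if (base == 5 && idx == 4)
            return FORM_ABS_DISP32;
    }
    return FORM_UNKNOWN;
}
```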


The cs prefix shown by IDA threw me off.

I can see how it might evoke "code" and thus the RIP register, but I don't think RIP-relative addressing implies a CS segment override.

In 32-bit mode, code segments are never writable (at most execute/read), so an instruction like mov [cs:x], eax will fault.
In this scenario, putting a cs: in front of the operand would be wrong.
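The permission check behind that fault can be modeled from the 4-bit type field of a code/data segment descriptor (S flag set): bit 3 marks a code segment, and bit 1 means readable for code segments but writable for data segments. A simplified sketch that ignores limits, privilege levels, and the accessed/conforming/expand-down bits; the macro and function names are made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of the 4-bit type field of a code/data descriptor (S = 1). */
#define SEG_TYPE_CODE 0x8u  /* bit 3: executable, i.e. a code segment */
#define SEG_TYPE_RW   0x2u  /* bit 1: readable (code) or writable (data) */

/* May a data read go through a segment of this type? */
bool seg_allows_read(uint8_t type) {
    if (type & SEG_TYPE_CODE)
        return (type & SEG_TYPE_RW) != 0;  /* execute/read vs execute-only */
    return true;                           /* data segments are readable */
}

/* May a data write go through a segment of this type? */
bool seg_allows_write(uint8_t type) {
    if (type & SEG_TYPE_CODE)
        return false;                      /* code segments: never writable */
    return (type & SEG_TYPE_RW) != 0;      /* read/write vs read-only data */
}
```

With a typical flat-model CS of type 0xA (execute/read), reads through cs: are allowed but writes are not, which is why mov [cs:x], eax faults.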

In 64-bit mode, segment overrides (other than fs/gs) are ignored (and the read-bit of the code segment is ignored anyway), so the presence of a cs: doesn't really matter because ds and cs are effectively indistinguishable. (Even an ss or ds override doesn't change the #GP or #SS exception for a non-canonical address.)
Probably the AGU doesn't even read the segment shadow registers anymore for segment bases other than fs or gs. (Although even in 32-bit mode, there's a lower-latency fast path for the normal case of segment base = 0, so hardware may just let that do its job.)
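A rough model of the 64-bit rule: only FS and GS add a base, every other segment behaves as base 0, and the resulting address must still be canonical. This sketch assumes 48-bit virtual addresses (4-level paging); linear_address_64 and is_canonical48 are hypothetical helpers:

```c
#include <stdbool.h>
#include <stdint.h>

enum seg_reg { SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_FS, SEG_GS };

/* 64-bit mode: only FS and GS contribute a base; ES/CS/SS/DS act as
   base 0, so a cs: (or ds:) override cannot change the address. */
uint64_t linear_address_64(enum seg_reg seg, uint64_t effective,
                           uint64_t fs_base, uint64_t gs_base) {
    switch (seg) {
    case SEG_FS: return effective + fs_base;
    case SEG_GS: return effective + gs_base;
    default:     return effective;
    }
}

/* With 48-bit virtual addresses, an address is canonical when bits
   63..47 are all copies of bit 47. */
bool is_canonical48(uint64_t addr) {
    uint64_t top = addr >> 47;  /* the 17 bits 63..47 */
    return top == 0 || top == 0x1FFFF;
}
```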

Still, cs: is misleading in my opinion: a 2E prefix byte is still possible in machine code as padding. Most tools still call it a CS prefix, although http://ref.x86asm.net/coder64.html calls it a "null prefix" in 64-bit mode. There's no such byte here, and cs: is not an obvious or clear way to indicate RIP-relative addressing.
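Checking whether a real 2E byte is present is easy once you have the raw bytes: walk the legacy prefixes at the start of the instruction and look for 2E among them. A sketch (has_cs_prefix is a hypothetical name) that assumes the buffer starts at an instruction boundary; REX is not handled because it always comes after the legacy prefixes, so the scan stops there:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy prefix bytes (groups 1 to 4). */
static bool is_legacy_prefix(uint8_t b) {
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:             /* lock, repne, rep */
    case 0x2E: case 0x36: case 0x3E: case 0x26:  /* cs, ss, ds, es */
    case 0x64: case 0x65:                        /* fs, gs */
    case 0x66: case 0x67:                        /* operand/address size */
        return true;
    default:
        return false;
    }
}

/* True if a 2E ("cs") byte is actually present among the legacy
   prefixes at the start of the instruction. */
bool has_cs_prefix(const uint8_t *code, size_t len) {
    for (size_t i = 0; i < len && is_legacy_prefix(code[i]); i++)
        if (code[i] == 0x2E)
            return true;
    return false;
}
```

For example, 66 2E 0F 1F 84 00 00 00 00 00 is a multi-byte NOP commonly emitted as padding, and it does contain a genuine 2E byte, whereas the question's load (8B 05 disp32) has no prefix at all.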

fuz
Margaret Bloom
  • IDA seems totally inconsistent, not using `cs:` in `lea rdi, format` which is also certainly using RIP-relative addressing. (This is GCC output, so if it was using 32-bit absolute it would have used `mov edi, OFFSET format`). It seems to be pretty much GNU `.intel_syntax`, although I guess it's actually the same syntax that it uses on Windows, and thus won't use `[RIP + format]` because it's more like actual MASM. Overall seems like bad choices for a disassembler that you'd want to use on obfuscated / hand-crafted machine code. – Peter Cordes Aug 08 '21 at 18:08
  • What would IDA show if your machine code actually did have a CS override prefix? – Peter Cordes Aug 08 '21 at 18:09
  • @PeterCordes A `db` instruction before the actual instruction. I too was curious and tried it. – Margaret Bloom Aug 08 '21 at 22:29