Windows x86 assembly language syntax

Question

(1) What does the following code mean? I cannot find any reference about the ds:[ ] syntax anywhere online. How is it different from without the ds:?

cmp eax,dword ptr ds:[12B656Ch]

(2) In the following instruction,

movsx eax,word ptr [esi+24h]

What is the esi register used for? Is it possible to guess what the original C code is doing from using such a rare register?

*How is it different from without the ds:?*. It isn't, your disassembler is just being verbose. — Peter Cordes, Aug 26 '16 at 20:45
*What is the esi register used for?* The base address in your addressing mode. You could figure out what the original C is doing if you looked at the whole function, not just one instruction. — Peter Cordes, Aug 26 '16 at 20:46
"Is it possible to guess what the original C code is doing from using such a rare register?" I wouldn't exactly define `esi` such a rare register... — Matteo Italia, Aug 26 '16 at 20:49
(1) Thank you for your answer. So I can just look at the address in the debugger as if it is without the ds? — cr001, Aug 26 '16 at 20:49
(1) `ds` makes explicit that the value is in the data segment, so you don't get confused about looking for the value in the stack or any other place. (2) `esi` is being used to point to a variable, probably an array (or a string), extracting the word pointed by `esi` 24h bytes ahead. — Jose Manuel Abarca Rodríguez, Aug 26 '16 at 20:49
(2) I see that it is not really a register for special use. I wanted to see if there is a quick way to guess the original C code if it is a special register but now it seems there isn't. — cr001, Aug 26 '16 at 20:51
@cr001: (1) yes; unless there's a specific segment override, by default addresses in `mov` and most other instructions are relative to the DS segment (notable exceptions: string instructions - which use ES; code addresses, which are fetched relative to CS; but given that CS, DS and ES are set by default to zero in "normal" 32 bit processes, these are details that can be normally ignored). — Matteo Italia, Aug 26 '16 at 20:53
To Jose Rodriguez: Thank you very much. That is the sort of things I wanted to know. I will try to see if the data at that address matches any noticeable array or string data in the program. — cr001, Aug 26 '16 at 20:53
@cr001: (2) ESI is "special" in the fact that can be used in "string instructions" (look for `rep`, `lodsb`, `stosb` and similar instructions), but is generally used also as a normal, general purpose register. It is one of the registers preserved across function calls, so it may be used when the value has to "survive" several function calls without spilling it on the stack (so, it may be used for a long-lived variable in the current function). — Matteo Italia, Aug 26 '16 at 20:55
@PeterCordes It's actually required in MASM syntax. Without the `ds:` it's treated as an immediate, so `cmp eax,dword ptr [12B656Ch]` is the same as `cmp eax, 12B656Ch`. — Ross Ridge, Aug 26 '16 at 21:46
@RossRidge: holy crap, that's a horrible syntax design decision! Thanks for the heads-up. — Peter Cordes, Aug 26 '16 at 21:48
*That is the sort of things I wanted to know.*. **I think http://stackoverflow.com/questions/38843403/which-segment-register-is-used-by-default would have been a better duplicate target, then**. I'll reopen, but someone else will have to re-close. Apparently I can't re-close it. — Peter Cordes, Aug 26 '16 at 21:51
Also relevant: [What are the different segment registers (SS/CS/DS/ES/FS/GS) intended for](http://stackoverflow.com/questions/10810203/what-is-the-fs-gs-register-intended-for) — Peter Cordes, Aug 26 '16 at 22:28
`word ptr [esi+24h]` is also a common pattern for accessing fields in a struct or class, where `esi` points to the object, and the field has an offset of 24h. — IInspectable, Aug 27 '16 at 07:14

Johan · Answer 1 · 2016-08-28T18:52:30.827

DS refers to the Data Segment.
In Win32, the CS, DS, ES and SS segments all have a base address of 0.
That is, these segments do not matter and a flat 32-bit address space is used.

The Data Segment is the default segment when accessing memory. Some disassemblers list it anyway, even though naming the default segment changes nothing about the access. (As Ross Ridge notes in the comments, MASM syntax needs the ds: in front of a bare absolute address, because without it the operand is read as an immediate.)
You can name a different segment by using a segment override.
CS is the Code Segment, which is the default segment for jumps and calls, and SS is the Stack Segment, which is the default for addresses based on ESP or EBP.
ES is the Extra Segment, which is used for the destination of string instructions.

The only segment override that makes sense in Win32 is FS (The F does not stand for anything, but it comes after E).
FS links to the Thread Information Block (TIB) which houses thread specific data and is very useful for Thread Local Storage and multi-threading in general.
There is also a GS which is reserved for future use in Win32 and is used for the TIB in Win64.
In Linux the picture is more or less the same.
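
To make the flat model concrete, here is a minimal sketch in C of how a segment and an offset combine into a linear address. The struct segment type and the linear_address function are hypothetical names invented for this illustration, and limit and permission checks are left out. With a base of 0, the offset and the linear address are the same number, which is why the ds: prefix does not change which byte is read. A segment with a non-zero base, as FS has for the TIB, gives a different result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a segment: only the base matters in this sketch. */
struct segment {
    uint32_t base;   /* linear address where the segment starts */
};

/* Linear address = segment base + offset.
   32-bit wraparound applies; limit and permission checks are omitted. */
static uint32_t linear_address(const struct segment *seg, uint32_t offset)
{
    assert(seg != NULL);
    return seg->base + offset;
}
```

For example, calling linear_address on a segment whose base is 0 with offset 0x12B656C gives 0x12B656C back unchanged.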

What is register X for
You must let go of the notion that registers have special purposes.
In x86 you can use almost any register for almost any purpose.
Only a few complex instructions use specific registers, but the normal instructions can use any register.
The compiler will try and use as many registers as possible to avoid having to use memory.

Having said this the original purposes of the 8 x86 registers are as follows:

EAX : accumulator, some instructions using this register have 'short versions'.  
EDX : overflow for EAX, used to store 64 bit values when multiplying or dividing.
ECX : counter, used as the repeat count in string instructions like rep movs; shifts take a variable count from CL.
EBX : miscellaneous general purpose register.
ESI : Source Index register, used as source pointer for string instructions
EDI : Destination Index register, used as destination pointer
ESP : Stack pointer, used to keep track of the stack
EBP : Base pointer, used in stack frames

You can use any register pretty much as you please, with the exception of ESP. Although ESP will work in many instructions, it is just too awkward to lose track of the stack.
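
As an illustration of one of those few instructions with fixed registers, the following is a rough C model of what rep movsb does with the direction flag clear. The struct regs type and the rep_movsb function are hypothetical names for this sketch only, and segments are ignored. It shows ESI acting as the source pointer, EDI as the destination pointer and ECX as the count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file: only the three registers that
   rep movsb uses implicitly are modelled. */
struct regs {
    const uint8_t *esi;  /* source pointer */
    uint8_t *edi;        /* destination pointer */
    uint32_t ecx;        /* remaining byte count */
};

/* Models rep movsb with the direction flag clear:
   copy one byte, advance both pointers, decrement the count. */
static void rep_movsb(struct regs *r)
{
    assert(r != NULL);
    while (r->ecx != 0) {
        *r->edi++ = *r->esi++;
        r->ecx--;
    }
}
```

Outside of such instructions, ESI is an ordinary register, which is how it is being used in movsx eax,word ptr [esi+24h].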

Is it possible to guess what the original C code is doing from using such a rare register?

My guess:

struct x {
  int a,b,c,d,e,f,g,h,i;    // 9 ints = 36 bytes (24h)
  short s;                  // lands at offset 24h
};
....
int i = p->s;               // p is a struct x *, held in ESI

ESI likely points to some structure or object. At offset 24h (36) a short is present, which is transferred into an int (hence the mov with Sign eXtend).
ESI probably does not point to a local variable, because in that case EBP or ESP would be used as the base.
If you want to know more about the C code, you need more context.
Many C constructs translate into multiple CPU instructions.

The best way to see this is to write C code and inspect the CPU code that gets generated.
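
A small file to try this with might look like the sketch below. The names struct record and read_s are made up for the experiment, and the layout assumes a 4-byte int, which the static_assert checks at compile time. On a 32-bit x86 target, a compiler would typically turn the body of read_s into a single movsx from offset 24h of whatever register holds p; which register that is depends on the compiler and the surrounding code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: nine 4-byte ints put the short at offset 36 (24h). */
struct record {
    int a, b, c, d, e, f, g, h, i;
    short s;
};

/* Compilation fails if the layout assumption does not hold. */
static_assert(offsetof(struct record, s) == 0x24,
              "s is expected at offset 24h");

/* Converting a short to an int sign-extends it,
   which is the job movsx does in the disassembly. */
int read_s(const struct record *p)
{
    return p->s;
}
```

Compiling this with assembly output enabled (for example /FA with MSVC or -S with GCC) and comparing the listing with the disassembly is a quick way to test a guess about the original source.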

This question should have been closed as a duplicate of http://stackoverflow.com/questions/38843403/which-segment-register-is-used-by-default. I probably should have left it closed with the old dup-target, instead of re-opening and not being able to close it. — Peter Cordes, Aug 27 '16 at 21:03
