I am having some trouble understanding how this operation works in assembly. I wrote a program and disassembled it. In it, I access individual characters of a string and compare them to a test string. The part of the code that I am confused about is here:

mov     ecx, [ebp+string_offset]
add     ecx, [ebp+counter_offset]
movsx   edx, byte ptr [ecx]

I am guessing this was generated by some compiler optimization. What the code does makes sense: ECX is loaded with an address, the counter is added to it (to advance to the right location in memory), and then MOVSX loads the byte that ECX points to.
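For readers following along, here is a rough C model of those three instructions. It is only a sketch under stated assumptions: `struct frame` and `load_char` are hypothetical names that stand in for the stack frame and the code sequence, and the variable names merely mirror the registers. Each bracketed operand becomes one read from memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the function's stack frame: two locals that
   live in memory and are addressed relative to EBP in the disassembly. */
struct frame {
    const char *string;  /* the slot at [ebp+string_offset]  */
    int counter;         /* the slot at [ebp+counter_offset] */
};

/* Models the three instructions; returns what EDX would hold. */
int load_char(const struct frame *ebp)
{
    assert(ebp != NULL && ebp->string != NULL);

    /* mov ecx, [ebp+string_offset]
       Reads the local variable, so ECX receives the pointer VALUE. */
    uintptr_t ecx = (uintptr_t)ebp->string;

    /* add ecx, [ebp+counter_offset]
       Reads the counter from its own stack slot and adds it. */
    ecx += (uintptr_t)(intptr_t)ebp->counter;

    /* movsx edx, byte ptr [ecx]
       The only access to the string itself: one byte, sign-extended. */
    int edx = *(const signed char *)ecx;
    return edx;
}
```

With `string` pointing at `"test"` and `counter` set to 2, `load_char` returns `'s'`. The first two memory reads touch only the stack frame; the characters are read once, by the final load.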

Where I am confused is the bracketing. Normally, brackets are dereferencing operations. So, if that were true here, ECX would contain the first actual element of the string. Then we'd dereference the counter, add it to ECX to get the element we want, and finally load that element into EDX for comparison.

This does not make any sense, because if dereferencing the string's address gives me back the string itself, then adding the counter would just produce nonsense. Indeed, during debugging ECX contains the actual address of the character in the string, and then EDX contains the character itself.

Is this a non-standard use of brackets, or is there something special about C strings (`char* xyz = "test"`) as arrays that I am missing?

Peter Cordes
CL40
  • It's dereferencing `ebp` to load your local variables from memory. `string_offset` and `counter_offset` both live in memory (on the stack), they need to be loaded. – Jester Jul 04 '21 at 22:40
  • Oh I see. So the variable itself, pointed to by `EBP+string_offset` contains a memory address. This gets the value at this place in actual memory, and dereferences it. This causes ECX to contain an address. – CL40 Jul 04 '21 at 23:44
  • Looks pretty un-optimized to me, otherwise the pointer and offset would probably both be in registers already, instead of on the stack. It's exactly what I'd expect for evaluating `p[i]` as part of an expression in a debug build ([Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](https://stackoverflow.com/q/53366394)), with default integer promotion to `int` (hence `movsx` instead of just `movzx` to avoid a false dependency writing `dl`.) – Peter Cordes Jul 05 '21 at 03:25
  • You have a `char*` *pointer* to a string, not an *array* like `char xyz[] = "test"` that is itself on the stack. Looks like that's the root of your confusion, so it seems to me it's a duplicate of [What is the difference between char s\[\] and char \*s?](https://stackoverflow.com/q/1704407) – Peter Cordes Jul 05 '21 at 03:27
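The pointer-versus-array distinction raised in the comments can be checked directly in C. The following is a minimal sketch, not taken from the asker's program; all function names are hypothetical and chosen for illustration. It compares what a `char*` local and a `char[]` local actually hold.

```c
#include <assert.h>
#include <stddef.h>

/* A pointer local stores only an address, so its size is that of a pointer. */
size_t pointer_local_size(void)
{
    const char *p = "test";
    return sizeof p;
}

/* An array local stores the characters themselves: 4 letters plus '\0'. */
size_t array_local_size(void)
{
    char a[] = "test";
    return sizeof a;
}

/* Returns 1 if the characters sit at the address of the variable itself. */
int chars_live_in_pointer_variable(void)
{
    const char *p = "test";
    return (const void *)&p == (const void *)p;
}

int chars_live_in_array_variable(void)
{
    char a[] = "test";
    return (const void *)&a == (const void *)a;
}

/* Collects the observations above as assertions. */
void check_pointer_vs_array(void)
{
    assert(pointer_local_size() == sizeof(char *));
    assert(array_local_size() == 5);
    assert(chars_live_in_pointer_variable() == 0);
    assert(chars_live_in_array_variable() == 1);
}
```

For the pointer, the stack slot holds an address and the characters live elsewhere, so indexing has to load the slot first and then load the byte. For the array, the stack slot is the characters, so the byte's address can be computed from `ebp` and the index without loading a pointer first.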
