Design a function in C from assembly code

Question

I need to design a function in C language to achieve what is written in the machine code. I get the assembly operations in steps, but my function is said to be implemented wrong. I am confused.

This is the disassembled code of the function.

(Hand transcribed from an image, typos are possible
 especially in the machine-code.  See revision history for the image)
0000000000000000 <ex3>:
   0:   b9 00 00 00 00       mov    0x0,%ecx
   5:   eb 1b                jmp    L2  // 22 <ex3+0x22>
   7:   48 63 c1     L1:     movslq %ecx,%rax
   a:   4c 8d 04 07          lea    (%rdi,%rax,1),%r8
   e:   45 0f b6 08          movzbl (%r8),%r9d
  12:   48 01 f0             add    %rsi,%rax
  15:   44 0f b6 10          movzbl (%rax),%r10d
  19:   45 88 10             mov    %r10b,(%r8)
  1c:   44 88 08             mov    %r9b,(%rax)
  1f:   83 c1 01             add    $0x1,%ecx
  22:   39 d1        L2:     cmp    %edx,%ecx
  24:   7c e1                jl     L1   // 7 <ex3+0x7>
  26:   f3 c3                repz retq

My code(the signature of the function is not given or settled):

#include <assert.h>

int
ex3(int rdi, int rsi,int edx, int r8,int r9 ) {
    int ecx = 0;
    int rax;
    if(ecx>edx){
         rax = ecx;
        r8 =rdi+rax;
        r9 =r8;
        rax =rsi;
        int r10=rax;
        r8=r10;
        rax =r9;
        ecx+=1;
    }
    return rax;
}

Please explain what cause the bugs if you recognize any.

`r` registers are 64-bit. `int` is a 32-bit type in x86-64 C implementations. You'd use `int edi` if it was actually `int`, like you did for `edx`. More importantly, the asm contains a backwards conditional branch but your C is missing a loop. Also, the asm has a bunch of zero-extending byte loads from memory but your C doesn't have any pointer variables. — Peter Cordes, Nov 10 '19 at 22:18
Please edit your question to include the asm code. Don't post a link to a picture and don't replace that link with a picture itself. Take your time and type it in the question. — gstukelj, Nov 10 '19 at 22:19
Another hint: what is the next instruction after `add $0x1,%ecx`? — aschepler, Nov 10 '19 at 22:21
*"but my function is said to be implemented wrong"* - By whom and on what grounds? — klutt, Nov 10 '19 at 22:23
I've downvoted your question because you have posted a picture of code. WIll remove my downvote once you replace it with code as text. — fuz, Nov 10 '19 at 22:23
@fuz: Steve Friedl made the effort to translate the picture to text. I removed the picture, so now, the question should be ok. — zx485, Nov 10 '19 at 23:07

klutt · Answer 1 · 2019-11-11T16:09:59.683

(Editor's note: this is a partial answer that only addresses the loop structure. It doesn't cover the movzbl byte loads, or the fact that some of these variables are pointers, or type widths. There's room for other answers to cover other parts of the question.)

C supports goto and even though the usage of them is often frowned upon, they are very useful here. Use them to make it as similar to the assembly as possible. This allows you to make sure that the code works before you start introducing more proper control flow mechanisms, like while loops. So I would do something like this:

    goto L2;
L1:
    rax = ecx;
    r8 =rdi+rax;
    r9 =r8;
    rax =rsi;
    int r10=rax;
    r8=r10;
    rax =r9;
    ecx+=1;
L2:
    if(edx<ecx) 
        goto L1;

You can easily transform the above code to:

while(edx<ecx) {
    rax = ecx;
    r8 =rdi+rax;
    r9 =r8;
    rax =rsi;
    int r10=rax;
    r8=r10;
    rax =r9;
    ecx+=1;
}

Note that I have not checked if the code within the L1-block and then later the while block is correct or not. (Editor's note: it's missing all the memory accesses). But your jumping was wrong and is now corrected.

What you can do from here (again, assuming that this is correct) is to start trying to see patterns. It seems like ecx is used as some kind of index variable. And the variable rax can be replaced in the beginning. We can do a few other similar changes.This gives us:

int i=0;
while(edx<i) {
                    // rax = ecx;
                    // r8 =rdi+i; // r8=rdi+i
                    // r9 = rdi + i; // r9 = r8
                    // rax =rsi;
    int r10 = rsi;  // int r10=rax;
    r8 = r10;
    rax = r9 = rdi+i;
    i++;
}

Here it clearly seems like something is a bit iffy. The while condition is edx<i but i is incremented and not decremented each iteration. That's a good indication that something is wrong. I'm not skilled enough in assembly to figure it out, but at least this is a method you can use. Just take it step by step.

add $0x1,%ecx is AT&T syntax for incrementing ecx by 1. According to this site using Intel syntax, the result is stored in the first operand. In AT&T syntax, that's the last operand.

One interesting thing to notice is that if we removed the goto L2 statement, this would instead be equivalent to

do {
    // Your code
} while(edx<ecx);

A while-loop can be compiled to a do-while-loop with an additional goto. (See Why are loops always compiled into "do...while" style (tail jump)?). It's pretty easy to understand.

In assembly, loops are made with gotos that jump backward in the code. You test and then decide if you want to jump back. So in order to test before the first iteration, you need to jump to the test first. (Compilers also sometimes compile while loops with an if()break at the top and a jmp at the bottom. But only with optimization disabled. See While, Do While, For loops in Assembly Language (emu8086))

Forward jumping is often the result of compiling if statements.

I also just realized that I now have three good ways to use goto. The first two is breaking out of nested loops and releasing resources in opposite order of allocation. And now the third is this, when you reverse engineer assembly.

Three good ways to use actually use `goto` in practice in C? Or do you just mean as part of reverse-engineering? The asm in the question is how GCC compiles a `while(){}` or `for` loop in C at `-O0` and `-O1`. (In C++ mode, `g++ -O0` likes to make a stupid loop with an `if()break` at the top and and unconditional `jmp` at the bottom.) See also [Why are loops always compiled into "do...while" style (tail jump)?](//stackoverflow.com/q/47783926) - with optimization enabled, compilers peel the first test and then make a normal `do{}while()` loop with the condition at the bottom like here. — Peter Cordes, Nov 11 '19 at 00:44
BTW, this is only a very partial answer. You covered loop structure nicely, but types and addressing modes are still a huge problem. The OP is not even close to getting the `movzbl` byte loads correct. — Peter Cordes, Nov 11 '19 at 00:45
I now realized about the jumping and loop. But I am more concerned about how to show different length of bit register in C. And how to show operations like _movzbl_ in C. Do I use any convertor? — chloezql, Nov 11 '19 at 01:13
@PeterCordes I meant as a part of reverse engineering. And yes, I know it's a partial answer. I did not feel comfortable doing the rest. — klutt, Nov 11 '19 at 07:47
I upvoted for the part you did address. That's totally fine, as long as we avoid any hint that this was the only problem. Otherwise the OP would have wasted their time hoping your change was sufficient. — Peter Cordes, Nov 11 '19 at 13:32
Sorry, stepped on an edit while fixing your mixup about operand order in AT&T syntax vs. Intel syntax. See https://stackoverflow.com/tags/att/info. The loop is doing `while(i — Peter Cordes, Nov 11 '19 at 16:04

score 1 · Answer 2 · answered Nov 11 '19 at 05:21

I am pretty sure it is this: swap two areas of memory:

void memswap(unsigned char *rdi, unsigned char *rsi, int edx) {
        int ecx;
        for (ecx = 0; ecx < edx; ecx++) {
                unsigned char r9 = rdi[ecx];
                unsigned char r10 = rsi[ecx];
                rdi[ecx] = r10;
                rsi[ecx] = r9;
        }
}

Brandon · Answer 3 · 2019-11-11T16:47:45.197

-1

For those that prefer a .S format for GCC, I used:

ex3:
  mov $0x0, %ecx
  jmp lpe
  lps:
      movslq %ecx, %rax
      lea (%rdi, %rax, 1), %r8
      movzbl (%r8), %r9d
      add %rsi, %rax
      movzbl (%rax), %r10d
      mov %r10b, (%r8)
      mov %r9b, (%rax)
      add $0x1, %ecx
  lpe:
  cmp %edx, %ecx
  jl lps
repz retq


.data
.text
.global _main
_main:
    mov $0x111111111111, %rdi
    mov $0x222222222222, %rsi
    mov $0x5, %rdx
    mov $0x333333333333, %r8
    mov $0x444444444444, %r9
    call ex3
    xor %eax, %eax
    ret

you can then compile it with gcc main.S -o main and run objdump -x86-asm-syntax=intel -d main to see it in intel format OR run the resulting main executable in a decompiler.. but meh.. Let's do some manual work..

First I would convert the AT&T syntax to the more commonly known Intel syntax.. so:

ex3:
  mov ecx, 0
  jmp lpe
  lps:
      movsxd rax, ecx
      lea r8, [rdi + rax]
      movzx r9d, byte ptr [r8]
      add rax, rsi
      movzx r10d, byte ptr [rax]
      mov byte ptr [r8], r10b
      mov byte ptr [rax], r9b
      add ecx, 0x1
  lpe:
  cmp ecx, edx
  jl lps
rep ret

Now I can see clearly that from lps (loop start) to lpe (loop end), is a for-loop.

How? Because first it sets the counter register (ecx) to 0. Then it checks if ecx < edx by doing a cmp ecx, edx followed by a jl (jump if less than).. If it is, it runs the code and increments ecx by 1 (add ecx, 1).. if not, it exists the block..

Thus it looks like: for (int32_t ecx = 0; ecx < edx; ++ecx).. (note that edx is the lower 32-bits of rdx).

So now we translate the rest with the knowledge that:

r10 is a 64-bit register. r10d is the upper 32 bits, r10b is the lower 8 bits. r9 is a 64-bit register. Same logic as r10 applies.

So we can represent a register as I have below:

typedef union Register
{
    uint64_t reg;
    struct
    {
        uint32_t upper32;
        uint32_t lower32;
    };

    struct
    {
        uint16_t uupper16;
        uint16_t ulower16;
        uint16_t lupper16;
        uint16_t llower16;
    };

    struct
    {
        uint8_t uuupper8;
        uint8_t uulower8;

        uint8_t ulupper8;
        uint8_t ullower8;

        uint8_t luupper8;
        uint8_t lulower8;

        uint8_t llupper8;
        uint8_t lllower8;
    };
} Register;

Whichever is better.. you can choose for yourself.. Now we can start looking at the instructions themselves.. movsxd or movslq moves a 32-bit register into a 64-bit register with a sign extension.

Now we can write the code:

uint8_t* ex3(uint8_t* rdi, uint64_t rsi, int32_t edx)
{
    uintptr_t rax = 0;
    for (int32_t ecx = 0; ecx < edx; ++ecx)
    {
        rax = ecx;
        uint8_t* r8 = rdi + rax;
        Register r9 = { .reg = *r8 }; //zero extend into the upper half of the register
        rax += rsi;

        Register r10 = { .reg = *(uint8_t*)rax }; //zero extend into the upper half of the register
        *r8 = r10.lllower8;
        *(uint8_t*)rax = r9.lllower8;
    }

    return rax;
}

Hopefully I didn't screw anything up..

edited Nov 11 '19 at 16:47

answered Nov 11 '19 at 02:01

Brandon

22,723
11
93
186

Thank you very much. I am very new to these, do you know how to write the `reinterpret_cast` or just `int8_t int64_t` etc. in C? I don't know how to show the different bit registers . What does the `union Register` part do in the beginning? – chloezql Nov 11 '19 at 02:23
@chloezql Added C translated code & removed C++ code. `int64_t` is declared in `` – Brandon Nov 11 '19 at 02:42
1

IMO your register union should be simpler and only include members that are directly accessible as x86 partial registers. So the only struct member you need is a pair of 8-bit AL/AH halves. – Peter Cordes Nov 11 '19 at 02:52
1

How are you getting `int64_t* r8` out of the OP's asm? I only see byte pointers (`uint8_t*`). Even the pointer-math is unscaled, so again `char*` or `unsigned char*`. You're seriously over-complicating this with casting. – Peter Cordes Nov 11 '19 at 02:56
Oh, I see you mistranslated `movzbl` into `mov r9d, dword ptr [r8]`. It's actually `movzx r9d, byte ptr [r8]` – Peter Cordes Nov 11 '19 at 02:57
@PeterCordes; Weird.. My `objdump` shows the instruction as `mov` and not `movzbl`: https://imgur.com/trD8pu3 – Brandon Nov 11 '19 at 03:04
@Brandon: You have different machine code from the question. Of course if you assemble the wrong AT&T instructions, you'll get matching Intel-syntax instructions. (The question's machine code was manually transcribed from an image, double-check the original image for any inconsistencies: https://i.stack.imgur.com/gWVN6.png. But the `movzx` instruction is transcribed correctly: `45 0f b6 08`. `0f b6` is the right opcode for [`movzx`](https://www.felixcloutier.com/x86/movzx)) – Peter Cordes Nov 11 '19 at 03:09
@PeterCordes; Ahh shit.. you are right. One sec I'll make an edit :l – Brandon Nov 11 '19 at 03:12
Also, the OP's asm doesn't zero a bunch of registers before the loop. It doesn't write any partial registers and you can and should represent that in C with explicit or implicit conversion from narrow unsigned (for zero-extension) to the destination type (`uint64_t`). Writing `r10.lower32` in asm zero-extends into the full register; that's why compilers use `movzx` for byte loads. [Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?](//stackoverflow.com/q/11177137) – Peter Cordes Nov 11 '19 at 03:13
Ok, that's an improvement. I might write `rax = ecx` as `intptr_t rax = ecx` (sign extension); it's certainly not zeroed outside the loop. But it's used once as an integer index and then later as `uint8_t*` ; I might represent that with a 2nd variable name for the same register (SSA style, giving a new C name to each instruction's result). Or to avoid a mess, only doing that when convenient, and only for registers that are rewritten each loop iteration. – Peter Cordes Nov 11 '19 at 03:24
Also, you incorrectly wrote RDX instead of EDX; a compare in asm needs both operands the same size. And `jl` is a signed compare so `edx` (and `ecx`) have signed types; that's why not-full-optimization leaves a sign-extension inside the loop. Try compiling your function with `gcc -O1` or `-Og` on Godbolt and see if you get the same asm, modulo minor instruction-order changes. – Peter Cordes Nov 11 '19 at 03:26
Also, x86 is little-endian, and earlier struct member have lower addresses. So your structs (inside the union) are all backwards. If the code was doing any partial-register merging (via writing 8 or 16-bit partial registers then reading wider), your union idea would make a lot of sense. But C's type rules are already built around widening to `int` even if you *don't* assign to a variable, just like this asm is doing, so this is really just mental / visual noise. Assigning a `uint32_t` to `*byte_ptr` is well-defined as taking the least-significant bits for `uint8_t*` – Peter Cordes Nov 11 '19 at 03:48
`ecx < (rdx & 0xFFFFFFFF)` doesn't even get the correct result. In C, the right-hand size has type `int64_t`, forcing promotion of the left-hand side to the same type. And right RHS is now signed non-negative in the 0 .. 2^32-1 range. You'd need to cast the RHS back to `(int32_t)` after 64-bit masking. But seriously, just declare the function arg as `int32_t edx`. x86 doesn't care about garbage in the high half of RDX when doing `cmp ecx, edx`; that's why calling conventions *allow* high garbage. This function is not explicitly truncating RDX to EDX anywhere. – Peter Cordes Nov 11 '19 at 13:44
Also, I tried your function on Godbolt. https://godbolt.org/z/g7syYa. Because of writing your structs backwards, zero-extend -> reading the high byte (instead of low) optimizes to always storing `0`. Also, `.reg = *(uint32_t*)rax` should be `uint8_t*`. But at least the `jmp` into the loop looks somewhat similar to the original. – Peter Cordes Nov 11 '19 at 13:45
Your last change is an improvement, but you still haven't fixed your structs inside your union. You're zero-extending and then reading the *high* byte, which on a little-endian machine like x86 is always zero. – Peter Cordes Nov 11 '19 at 17:57
Hmm I don't understand what is wrong with the structs. I looked at it many times I just can't see it :( – Brandon Nov 11 '19 at 19:08

Design a function in C from assembly code

3 Answers3