
While learning assembly language, I came across some strange behavior involving the mov instruction.

The unintended behavior is that the address I intend to load is replaced by a different address that holds an identical instruction.

Thanks to the tricks suggested by fellow StackOverflow users (Calling a function through its address in memory in c / c++ | Using __builtin_extract_return_addr() function to find the RSP value of ret instruction), I was able to write a simple piece of code (this is pseudocode) to test loading and comparing addresses:

typedef void function(void);

uint64_t *sp;
asm ("movq %%rsp, %0\n"
     : "=r" (sp));
uint64_t *ret_addr;
ret_addr = __builtin_extract_return_addr((void *) *((long *)sp) + 1);

if (ret_addr == 0x40ac45)
{
    printf("WHY:\n");
}

function* test_addr = (function*)0x40ac6b;

asm goto (
    "cmp %0, %1\n"
    "jne %l[L2]\n"
    : // output operands
    : "r"(ret_addr), "r"(test_addr) // input operands
    :
    : L2);
L1: int3();
L2: no_problem();

To summarize, I obtain the return address (ret_addr). If that address is 0x40ac45, the program prints "WHY:". It then compares ret_addr with test_addr (0x40ac6b). If the two addresses are not equal, execution jumps to the no_problem function; otherwise it falls through to the int3 function, which raises a trap.
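The comparison step can be sketched as a self-contained function. This is a minimal sketch, assuming x86-64 and a compiler that supports GNU `asm goto` (recent GCC or Clang); `addresses_equal` is a hypothetical helper name, not part of the code above, and other targets fall back to a plain C comparison.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 when the two addresses are equal, 0 otherwise.
   On x86-64 the comparison is done with asm goto, as in the pseudocode above. */
static int addresses_equal(const void *a, const void *b)
{
#if defined(__x86_64__)
    __asm__ goto (
        "cmp %0, %1\n\t"          /* sets the flags from b - a */
        "jne %l[differ]"          /* taken only when a != b */
        :                         /* no output operands */
        : "r"(a), "r"(b)          /* both inputs live in registers */
        : "cc"                    /* the flags register is modified */
        : differ);
    return 1;                     /* fall-through path: addresses are equal */
differ:
    return 0;
#else
    return a == b;                /* portable fallback for other targets */
#endif
}
```

With the values from this question, `addresses_equal((void *)0x40ac45, (void *)0x40ac6b)` should take the `jne` branch, so reaching the fall-through path would suggest that the two operands really do hold the same value at that point.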

As shown here in more detail, the bug is that although ret_addr (0x40ac45) != test_addr (0x40ac6b), the program hits the trace trap, which should only happen when the two are equal.

To debug this, I added the following code to load ret_addr (which should be 0x40ac45) into an otherwise unused register (%r14):

        asm volatile (
                "mov %0, %%r14\n\t" : : "r"(ret_addr)
            );

When I ran the program under GDB, I found that %r14 holds 0x40ac6b instead of the intended address 0x40ac45, as you can see here.

Although all of the previous sanity checks show that the return address is 0x40ac45, for some reason the mov instruction loads 0x40ac6b instead.

I then checked whether these two addresses have anything in common, and the disassembly shows the following:

000000000040ac30 <close_stdout>:
    ----------------
  40ac45:       85 c0                   test   %eax,%eax
    ----------------
  40ac6b:       85 c0                   test   %eax,%eax

They are the same instruction, just located at different addresses.

This finally leads me to these questions:

  1. What could be the reason for this? Is it because these two addresses hold equivalent instructions?
  2. Is the mov instruction appropriate for this kind of use? I also tried the lea instruction ("lea (%%r14), %0"), but that did not work: the r14 register did not receive the address value.
  3. What other sanity checks can I do to verify that I'm loading the correct address value? My code seems to work for all of the other instructions; only this particular one is giving me trouble.
  4. Is there any way to "force" loading the absolute address without resorting to hard-coding the address value (e.g., function* ret_addr = (function*)0x40ac45;)?

I apologize for the lengthy explanation (I tried to make it as short as possible), and thank you for any suggestions.

Kind regards,

Jay
  • Probably not your problem, but your `asm` statement is missing a clobber on `"r14"` - it's not safe to write registers without telling the compiler about it. And BTW, you used LEA backwards. It would be `lea (%0), %%r14` to copy a register. And no, mov/lea aren't the only instructions you can use. You could just use an empty asm statement to force the compiler to have a value in a register, e.g. `asm("int3" :: "a"(ret_addr))` to force it to have `ret_addr` in RAX at that point. The asm statement can even be empty if you can follow the compiler's asm well enough to see what it's doing. – Peter Cordes Jun 20 '20 at 03:38
  • This doesn't look like a [mcve] so it would be inconvenient to try to reproduce your results. Basically I don't see any reason to assume that the return address will be at `8(%rsp)`. Especially in a function with local vars that calls other functions. (If you are going to assume that, you might as well just load with inline asm directly instead of `mov`ing RSP into another register.) Look at the compiler-generated asm and see how far it moves RSP before your asm statement. – Peter Cordes Jun 20 '20 at 03:41
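As a point of comparison with reading the stack by hand, GCC and Clang provide `__builtin_return_address(0)`, which lets the compiler work out where the return address lives regardless of how far RSP has moved. A minimal sketch, assuming GCC or Clang; `caller_return_address` is a hypothetical helper name and this is not the method used in the question.

```c
#include <stdint.h>

/* Hypothetical helper: reports the address at which its caller will resume.
   noinline keeps it a real call, so there is a return address to report. */
__attribute__((noinline))
static uintptr_t caller_return_address(void)
{
    /* Level 0 means the return address of the current function. */
    void *raw = __builtin_return_address(0);

    /* Undo any target-specific encoding of the return address;
       on x86-64 this is expected to leave the value unchanged. */
    return (uintptr_t)__builtin_extract_return_addr(raw);
}
```

The returned value can then be compared against an expected address with an ordinary `==` in C, or printed and checked against the disassembly, without assuming any particular stack offset.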
