This is my C program ... I was trying to print out ESP, EBP and EIP.

#include <stdio.h>
int main() {

    register int i asm("esp");
    printf("%#010x <= $ESP\n", i);

    int a = 1;
    int b = 2;
    char c[] = "A";
    char d[] = "B";

    printf("%p d = %s \n", &d, d);
    printf("%p c = %s \n", &c, c);
    printf("%p b = %d \n", &b, b);
    printf("%p a = %d \n", &a, a);

    register int j asm("ebp");
    printf("%#010x <= $EBP\n", j);

    //register int k asm("eip");
    //printf("%#010x <= $EIP\n", k);

    return 0;
}

I don't have a problem with ESP and EBP.

user@linux:~# ./memoryAddress 
0xbffff650 <= $ESP
0xbffff654 d = B 
0xbffff656 c = A 
0xbffff658 b = 2 
0xbffff65c a = 1 
0xbffff668 <= $EBP
user@linux:~# 

But when I add the EIP code, I get the following error when compiling it.

user@linux:~# gcc memoryAddress.c -o memoryAddress -g
memoryAddress.c: In function ‘main’:
memoryAddress.c:20:15: error: invalid register name for ‘k’
  register int k asm("eip");
               ^
user@linux:~#

What's wrong with this code?

register int k asm("eip");
printf("%#010x <= $EIP\n", k);

Is it possible to print out EIP value via C programming?

If yes, please let me know how to do it.

Update

I've tested the code here ...

user@linux:~/c$ lscpu
Architecture:        i686
CPU op-mode(s):      32-bit
Byte Order:          Little Endian

Thanks @Antti Haapala and others for your help. The code works ... However, when I load it into GDB, the EIP value is different.

(gdb) b 31
Breakpoint 1 at 0x68f: file eip.c, line 31.
(gdb) i r $eip $esp $ebp
The program has no registers now.
(gdb) r
Starting program: /home/user/c/a.out 
0x00000000 <= Low Memory Address
0x40055d   <= main() function
0x4005a5   <= $EIP 72 bytes from main() function (start)
0xbffff600 <= $ESP (Top of the Stack)
0xbffff600 d = B 
0xbffff602 c = A 
0xbffff604 b = 2 
0xbffff608 a = 1 
0xbffff618 <= $EBP (Bottom of the Stack)
0xffffffff <= High Memory Address

Breakpoint 1, main () at eip.c:31
31              return 0;
(gdb) i r $eip $esp $ebp
eip            0x40068f 0x40068f <main+306>
esp            0xbffff600       0xbffff600
ebp            0xbffff618       0xbffff618
(gdb) 

Here is the new code

#include <stdio.h>
#include <inttypes.h>

int main() {

    register int i asm("esp");
    printf("0x00000000 <= Low Memory Address\n");
    printf("%p   <= main() function\n", &main);

    uint32_t eip;
    asm volatile("1: lea 1b, %0;": "=a"(eip));
    printf("0x%" PRIx32 "   <= $EIP %" PRIu32 " bytes from main() function (start)\n",
    eip, eip - (uint32_t)main);

    int a = 1;
    int b = 2;
    char c[] = "A";
    char d[] = "B";

    printf("%#010x <= $ESP (Top of the Stack)\n", i);

    printf("%p d = %s \n", &d, d);
    printf("%p c = %s \n", &c, c);
    printf("%p b = %d \n", &b, b);
    printf("%p a = %d \n", &a, a);

    register int j asm("ebp");
    printf("%#010x <= $EBP (Bottom of the Stack)\n", j);
    printf("0xffffffff <= High Memory Address\n");

    return 0;
}
  • 1
    You cannot print `eip` portably, because it is x86 specific – Basile Starynkevitch May 22 '18 at 15:23
  • 2
    The error message seems to indicate that your assembler does not recognize "eip" as the name of an accessible register. Register names are architecture dependent, but the GNU assembler does not list "eip" among the recognized register names for i386. – John Bollinger May 22 '18 at 15:26
  • 1
    @BasileStarynkevitch eip isn't x86 specific as you can access it in 64bits binaries. – Nark May 22 '18 at 15:32
  • 3
    But it does not exist on ARM processors, and is not mentioned in the C standard [n1570](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf) – Basile Starynkevitch May 22 '18 at 15:33
  • 1
    Shouldn't your `int` be `int*`? As `EIP` is a pointer ? (also, try `asm("pc")` , or `register int* k volatile asm("eip")`) – Nark May 22 '18 at 15:34
  • 1
    mind you, those values have zero relevance to the C abstract machine, so if you think you need them for something more than being curious, you have some kind of XY problem, and you are heading for some problematic solution (err... more like problematic problem, because it will probably not solve anything, so it's not solution). If you are curious, use debugger, switch to assembly level upon entering the C function, and you can check actual register values inside debugger, plus watching their state per instruction. – Ped7g May 22 '18 at 15:42
  • 4
    And to get rough `eip/rip` of your current code in at least somewhat C-way, you can print address of current function (beginning of it), like `printf("%p\n", &main);` – Ped7g May 22 '18 at 15:45
  • 2
    https://stackoverflow.com/a/24593592/4832634 – Paweł Łukasik May 22 '18 at 16:07
  • 3
    I'm going to make the observation that `register int j asm("ebp");` doesn't actually guarantee that the variable `j` will have the value of EBP in it UNLESS it is used as an input (or input/output) constraint to an extended inline assembly template. The fact it works here is more or less by luck. The GCC documentation makes this a particular point – Michael Petch May 22 '18 at 16:57
  • 1
    Oops i really meant input, output, input/output operands. The documentation says _The only supported use for this feature is to specify registers for input and output operands when calling Extended asm (see Extended Asm)_ https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html – Michael Petch May 22 '18 at 17:40
  • 2
    Maybe we should have asked right up front. What value of EIP are you trying to get, and how do you intend to use it. Ped7g alluded to it when he said this might be an XY problem. – Michael Petch May 23 '18 at 03:15
  • 1
    *However, when I load it into GDB, the EIP value is different.* Looks like Antti built a position-independent executable with ASLR enabled. GDB disables ASLR by default, but a PIE executable's default code address will be different from a non-PIE position-*de*pendent executable. `0x40055d` looks normal for a non-PIE executable. – Peter Cordes May 23 '18 at 03:19
  • Thanks @MichaelPetch for your feedback. I'm just trying to figure out how stack works by writing this program. Btw, the new code has been updated. –  May 23 '18 at 03:24
  • 1
    Is your concern that the value `0x40068f` is not `0x4005a5` or is it that the EIP when run outside the debugger has a very different looking address as compared to what is being displayed in the debugger? If so Peter's comment above would apply. – Michael Petch May 23 '18 at 03:26
  • 3
    I'd suggest that if you are trying to learn how the stack works then writing pure assembly code (forget inline assembly and _C_) and then using GDB to step through it and reviewing the stack value and contents as you step through the instruction would be more useful. The problem is that a C compiler may generate a bunch of instructions that alter what is on the stack that is hard to track just by printing values (at least that is my opinion). Different optimizing levels may also alter the type of code generated. Value of EIP isn't very useful if studying the stack – Michael Petch May 23 '18 at 03:30
  • Hi @LưuVĩnhPhúc, I've tried to compile your code but didn't work. https://pastebin.com/UcamNkgQ –  May 23 '18 at 04:15
  • The numbers you've got look very right. – Antti Haapala -- Слава Україні May 23 '18 at 07:08

3 Answers

10

Please first read the Q&A "Reading program counter directly". From there we can see that there is no mov instruction that accesses EIP/RIP directly, so you cannot use register asm to get at it. Instead, you can use one of the tricks described there at any point in your code. It is easiest in 64-bit mode; use

uint64_t rip;
asm volatile("1: lea 1b(%%rip), %0;": "=a"(rip));

to get the 64-bit instruction pointer (thanks to Michael Petch for pointing out that a label works with lea here).

Demonstration:

#include <stdio.h>
#include <inttypes.h>

int main(void) {
    uint64_t rip;
    asm volatile("1: lea 1b(%%rip), %0;": "=a"(rip));
    printf("%" PRIx64 "; %" PRIu64 " bytes from main start\n",
           rip, rip - (uint64_t)main);
}

Then

% gcc -m64 rip.c -o rip; ./rip
55b7bf9e8659; 8 bytes from main start

Proof that it is correct:

% gdb -batch -ex 'file ./rip' -ex 'disassemble main'
Dump of assembler code for function main:
   0x000000000000064a <+0>:     push   %rbp
   0x000000000000064b <+1>:     mov    %rsp,%rbp
   0x000000000000064e <+4>:     sub    $0x10,%rsp
   0x0000000000000652 <+8>:     lea    -0x7(%rip),%rax        # 0x652 <main+8>

For 32-bit code, it seems you can use lea with a bare label (an absolute address); this didn't work for 64-bit code, though.

#include <stdio.h>
#include <inttypes.h>

int main(void) {
    uint32_t eip;
    asm volatile("1: lea 1b, %0;": "=a"(eip));
    printf("%" PRIx32 "; %" PRIu32 " bytes from main start\n",
           eip, eip - (uint32_t)main);
}

Then

% gcc -m32 eip.c -o eip; ./eip
5663754a; 29 bytes from main start

Proof that it is correct:

% gdb -batch -ex 'file ./eip' -ex 'disassemble main'  
Dump of assembler code for function main:
   0x0000052d <+0>:     lea    0x4(%esp),%ecx
   0x00000531 <+4>:     and    $0xfffffff0,%esp
   0x00000534 <+7>:     pushl  -0x4(%ecx)
   0x00000537 <+10>:    push   %ebp
   0x00000538 <+11>:    mov    %esp,%ebp
   0x0000053a <+13>:    push   %ebx
   0x0000053b <+14>:    push   %ecx
   0x0000053c <+15>:    sub    $0x10,%esp
   0x0000053f <+18>:    call   0x529 <__x86.get_pc_thunk.dx>
   0x00000544 <+23>:    add    $0x1a94,%edx
   0x0000054a <+29>:    lea    0x54a,%eax

(In the 32-bit version there are many more lea instructions, but this one is the "load my constant address here" instruction, whose address is then corrected by the dynamic linker when it loads the executable.)

rturrado
  • 1
    A way without the -7 would be `asm volatile("1: lea 1b(%%rip), %0;": "=a"(rip));` – Michael Petch May 22 '18 at 21:03
  • 1
    @MichaelPetch ah thanks, fixed. This was the first time I've ever used RIP-relative addressing. – Antti Haapala -- Слава Україні May 22 '18 at 21:29
  • You don't need inline asm at all for this; [the GNU C labels-as-values extension](https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html) does the trick. @MichaelPetch: And if you do want inline asm to read RIP directly, why not `lea 0(%rip)` instead of calculating the address of some nearby label? Then you'll see how RIP really works, i.e. that you get the address of the start of the next instruction. – Peter Cordes May 22 '18 at 22:56
  • @PeterCordes I never said you needed to do it with inline, I am saying what is wrong with THIS inline. Did you see me present an **answer** saying as such? In fact my biggest issue is with all the inline including the problems in the OPs inline with regards to even things like EBP and other registers. As I said under the other answer _Yep, inline may seem simple at times but it is a complex beast that should only be used as a last resort IMHO_ . – Michael Petch May 22 '18 at 22:57
  • @MichaelPetch: Sorry, my punctuation was unclear. I was pinging you re: the 2nd sentence in my comment, since I'm suggesting that `0(%rip)` is more "interesting" than `1b(%rip)`. The first sentence was a separate topic (which I posted as an answer), and directed mostly at Antti and/or the OP. – Peter Cordes May 22 '18 at 23:00
  • @PeterCordes : I don't know why it is more interesting in the context of this question - you end up with an EIP/RIP with an address that is offset by the length of the code to get the instruction pointer where likely you are interested in where the IP was just prior to the inline assembly. Of course I personally find using EIP/RIP this way mostly useless. I can make a case if you are writing an interrupt handler where EIP/RIP was pushed on the stack by the processor at the point of some failure (or it being raised). – Michael Petch May 22 '18 at 23:04
  • @MichaelPetch: Because you're reading RIP with no offset, rather than with an assembler-calculated offset to get the address of a label. The OP seems to want to know how RIP "really" works, and hiding its behaviour under some assembler magic doesn't help. (That's my guess about what might help the OP. Totally agreed that this is useless because compilers already know how to make PIC code and do relative addressing.) – Peter Cordes May 22 '18 at 23:10
  • @PeterCordes: Think you are reading more into the question given that the ultimate question the OP had was _Is it possible to print out EIP value via C programming?_ His question says nothing about how EIP works but how to print its value (which is the theme reiterated a few times in the question AND the title). If the goal is to print out EIP/RIP likely he wants the value not including the instructions to get the IP. This in itself seems rather pointless. – Michael Petch May 22 '18 at 23:12
  • @MichaelPetch: I was reading it as "how do I get meaningless raw register values to peek under the hood of the C compiler". Your interpretation is valid, too. – Peter Cordes May 22 '18 at 23:17
  • Thanks @AnttiHaapala ... The code works ... However, when I load it into GDB, the EIP value is different. I've updated the question above. –  May 23 '18 at 03:11
1

EIP can't be read directly. RIP can, with lea 0(%rip), %rax, but it's not a general-purpose register.

Instead of reading an address from a register you can just use a code address directly.

#include <stdio.h>

void print_own_address(void) {
    printf("%p\n", (void *)print_own_address);  // cast: %p expects a void *
}

If you compile this as PIC (position-independent code), the compiler will get the run-time address of the function by reading EIP or RIP for you. You don't need inline asm for this.


Or for addresses other than functions, GNU C allows labels as values.

void print_label_address() {
    for (int i=0 ; i<1000; i++) {
        volatile int sink = i;
    }
  mylabel:
    for (int i=0 ; i<1000; i++) {
        volatile int sink2 = i;
    }
    printf("%p\n", &&mylabel);   // Take the label address with && GNU C syntax.

}

Compiled on the Godbolt compiler explorer with and without -fPIE to generate position-independent code, we get:

  # PIE version:
    xor     eax, eax                   # i=0
.L4:                                   # do {
    mov     DWORD PTR -16[rsp], eax    #  sink=i
    add     eax, 1
    cmp     eax, 1000
    jne     .L4                        # } while(i!=1000);

    xor     eax, eax                   # i=0
.L5:                                   # do {
    mov     DWORD PTR -12[rsp], eax    # sink2 = i
    add     eax, 1
    cmp     eax, 1000
    jne     .L5                        # }while(i != 1000);

    lea     rsi, .L5[rip]           # address of .L5 = mylabel
    lea     rdi, .LC0[rip]          # format string
    xor     eax, eax                # 0 FP args in XMM regs for a variadic function
    jmp     printf@PLT              # tailcall printf

Without -fPIE, the addresses are link-time constants (and fit in a 32-bit constant), so we get

    mov     esi, OFFSET FLAT:.L5
    mov     edi, OFFSET FLAT:.LC0
    xor     eax, eax
    jmp     printf

Whether you get a meaningful address from your label depends on how aggressively the compiler optimized the code where you put it. Putting a label somewhere may inhibit optimization (like auto-vectorization) even if you only take the label's address, but I don't know. Maybe it would only hurt if you actually had a goto to it.

Peter Cordes
0

You can read rip with another small hack, if you are interested. Here is your full code, which reads rip too:

#include <stdio.h>
#include <inttypes.h>
int main()
{
    register uint64_t i asm("rsp");
    printf("%" PRIx64 " <= $RSP\n", i);

    int a = 1;
    int b = 2;
    char c[] = "A";
    char d[] = "B";

    printf("%p d = %s \n", &d, d);
    printf("%p c = %s \n", &c, c);
    printf("%p b = %d \n", &b, b);
    printf("%p a = %d \n", &a, a);

    register uint64_t j asm("rbp");
    printf("%" PRIx64 " <= $RBP\n", j);

    uint64_t rip = 0;

    asm volatile ("call here2\n\t"
                  "here2:\n\t"
                  "pop %0"
                  : "=m" (rip));
    printf("%" PRIx64 " <= $RIP\n", rip);

    return 0;
}

The hack here is a fun one: you just call the next assembly line. Because call pushes the return address (the rip value of the next instruction) onto the stack, you can retrieve it from the stack with a pop instruction. :)

Update:

The main reason for this approach is data injection. See the following code:

#include <stdio.h>
#include <inttypes.h>
int main()
{
    uint64_t rip = 0;

    asm volatile ("call here2\n\t"
                  ".byte 0x41\n\t" // A
                  ".byte 0x42\n\t" // B
                  ".byte 0x43\n\t" // C
                  ".byte 0x0\n\t"  // \0
                  "here2:\n\t"
                  "pop %0"
                  : "=m" (rip));
    printf("%" PRIx64 " <= $RIP\n", rip);
    printf("injected data:%s\n", (char*)rip);

    return 0;
}

This approach can inject data inside the code segment (which can be useful for code injection). If you compile and run it, you see the following output:

400542 <= $RIP
injected data:ABC

You have used rip as a pointer to your data: the popped return address is the address of the first injected byte. I personally like this approach, but it can have efficiency impacts, as mentioned in the comments.

I have tested both programs in 64-bit Ubuntu Bash on Windows (Windows Subsystem for Linux), and both work.

Update 2:

Please make sure to read the comments about the red zone. Thanks a lot to Michael for mentioning this problem and providing an example. :) If you need to use this code without the red-zone problem, you need to write it as follows (from Michael's sample):

asm volatile ("sub $128, %%rsp\n\t"
              "call 1f\n\t"
              ".byte 0x41\n\t" // A
              ".byte 0x42\n\t" // B
              ".byte 0x43\n\t" // C
              ".byte 0x0\n\t"  // \0
              "1:\n\t"
              "pop %0\n\t"
              "add $128, %%rsp"

              : "=r" (rip));
Afshin
  • 1
    I think the other answer by Antti Haapala is better since it simply relies on using RIP relative addressing and use of the LEA instruction to get the instruction pointer. Although your method works it will cause problems with return address predictor and impose a performance penalty. – Michael Petch May 22 '18 at 17:56
  • The other problem is that `here2` is not a unique localized label. If you were to copy that around your code you may find that you'd get duplicate labels. – Michael Petch May 22 '18 at 18:00
  • @MichaelPetch you can use `call +9(%%rip)` rather than label I guess (I'm not gcc expert, so I have written this code by reading its manual). In addition, this approach is something special that you can use for data injection. I didn't mention it because it was not related to question. – Afshin May 22 '18 at 18:03
  • 3
    There of course is one other subtle issue **if this is 64-bit code on Linux** (I noticed in the users output he seems to be on Linux). If this assembly template is included in a leaf function (doesn't call other functions) and optimizations are on, you run the risk of clobbering data in the [Red Zone](https://en.wikipedia.org/wiki/Red_zone_(computing)). You'd have to really consider subtracting 128 from RSP at the beginning of the inline assembly and then restore RSP at the end (add 128 to RSP). – Michael Petch May 22 '18 at 18:19
  • @MichaelPetch I used `asm volatile` to prevent optimization for this small part of code. As I told, I'm not expert in gcc assembler but I ran that code in 64-bit linux and it works fine(to be more accurate, I tested it on Ubuntu bash on Windows which emulate complete linux subsystem). I updated reply with small data injection too. – Afshin May 22 '18 at 18:24
  • 1
    `asm volatile` doesn't prevent all optimizations, but even if it did - it doesn't prevent the compiler generating code for the red zone. Just because it gives the output in your case that is correct, doesn't mean it is right. Inline assembly and its nuances are one reason using it sparingly is preferred and then you have to know what you are doing. I am pretty sure I could generate a test case where this would fail – Michael Petch May 22 '18 at 18:28
  • @MichaelPetch I will be happy if I see that code. :) I love assembly and reverse engineering, so I will be happy if you could create a case that this approach fails. :) learning new thing is always great and I try to learn more all the time. :) – Afshin May 22 '18 at 18:30
  • @MichaelPetch ah good point about the red zone, I had totally forgotten its existence (too little inline assembly I guess). – Antti Haapala -- Слава Україні May 22 '18 at 18:34
  • 2
    I have WSL (Windows Subsystem for Linux). It's up to date and is using `gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)` . The test program is at http://www.capp-sysware.com/misc/stackoverflow/redzone.c . Except for the fact that I put the `rip` variable at global scope (will break on more environments this way with the red zone on) it is your code. I also created a version that accounts for the red zone. The program should print all 0's for the 3 tests. With WSL using `gcc test.c -O0 -mred-zone` your test prints 1 (not 0) . If I use `gcc test.c -O0 -mno-red-zone` all tests come back 0. – Michael Petch May 22 '18 at 19:44
  • The test function simply has a variable that is initialized at the start and is compared to its original value. If they are different function returns 1 (fail), otherwise it returns 0 (success). The reason you can get failures with red zone on with your code is if the compiler places the `sval` variable on the stack under the return address. When you do your `CALL` in inline assembly the return value you have pushed on the stack can clobber the area where the `sval` variable lives. – Michael Petch May 22 '18 at 19:52
  • 1
    @AnttiHaapala : Yep, inline may seem simple at times but it is a complex beast that should only be used as a last resort IMHO. – Michael Petch May 22 '18 at 20:04
  • 1
    @MichaelPetch Thanks a lot for providing that sample. it was a great new learning topic for me :) – Afshin May 23 '18 at 08:00