
I am working with format string vulnerabilities in C, and I am trying to print the value of the integer argc through a printf call, using a format string given in the terminal.

My current code is:

#include <stdio.h>
#include <string.h>

int main (int argc, char **argv) {

    char buffer[32];

    /* More variables */

    strncpy(buffer, argv[1], sizeof(buffer));
    printf(buffer);   /* user input is used as the format string */

    /* More printf's */

}

I may need to use format specifiers to print the contents of the integer argc to the terminal, but I can't seem to find a solution. All of my guesses only print the values in the argument registers (%rsi, %rdx, %rcx, %r8d, %r9d).

The format string should be given in the terminal, like the example below:

./format-string %d_%s

Is it possible to get the argc value? If yes, how can I do it?

Peter Cordes
Miguel Santana
  • I don't understand what you are trying to do. Are you trying to write a program that is vulnerable to a string formatting attack? What is your goal? – Cheatah Oct 18 '20 at 11:35
  • @Cheatah No, I want to hack this program and get the integer value of the argc variable with format specifiers given on the execution of the program – Miguel Santana Oct 18 '20 at 11:38
  • Are you sure the omitted code or omitted environment is not relevant? As given, on x86-64 I don't think there's a way to get `argc`, since `rdi` (or `rcx` on Windows) gets overwritten many times. But in the SYS V x86-64 ABI `argc` is on the stack when `_start` is called, so I think you can reach it with a specifier that includes a suitable integer index `<n>`. – Margaret Bloom Oct 18 '20 at 11:51
  • I was missing a `$` in the format above. For example, this works in my environment: `./format-string '%40$p'` – Margaret Bloom Oct 18 '20 at 11:59
  • Again, sorry, I'm a bit sloppy right now. `%40$d` is fine. No need to use `p`. Adjust the integer to your environment (you can check `rsp` at `_start` and right before the call to `printf`, then subtract the values, divide by 8 and add 4). – Margaret Bloom Oct 18 '20 at 12:06
  • Don't guess. Look at the assembly code – stark Oct 18 '20 at 12:23

3 Answers


I tried to request a few clarifications in the comments, but you didn't answer, so I'm assuming you are working in an environment that conforms to the SYS V x86-64 ABI.

When main is called, argc is in rdi, but it is soon overwritten when main sets up the arguments for strncpy and then for printf:

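main:
    sub     rsp, 40
    mov     rsi, QWORD PTR [rsi+8]   ; rsi = argv[1]
    mov     edx, 32                  ; rdx = sizeof(buffer)
    mov     rdi, rsp                 ; rdi = buffer: argc (in edi) is lost here
    call    strncpy

    mov     rdi, rsp                 ; rdi = buffer, used as the format string
    xor     eax, eax                 ; al = 0 vector registers for the variadic call
    call    printf
    
    xor     eax, eax
    add     rsp, 40
    ret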

The code above is the optimized compiler output for your sample program, after cleaning it up.

However, glibc on the SYS V x86-64 ABI doesn't synthesize argc itself (unlike its Windows counterpart, which has to; see GetCommandLine and similar). Instead, the value is passed as the first value on the stack when the process is created (see figure 3.9 of the ABI specification).

[Figure: Initial program stack (SYS V x86-64 ABI, figure 3.9)]

So you can reach it with printf by using a %d format that skips the first k - 1 arguments, that is with %k$d where k is the number to be found.

To find k you just have to find the offset between rsp when printf is called and the address of argc.
Since rsp points at argc when the process is created, this amounts to finding the offset between rsp at the call site for printf and the initial value of rsp.

So using gdb:

gdb --args format-string test
   b _start
   r
   i r rsp
     0x7fffffffdfa0   The initial value of RSP
   b printf
   c
   i r rsp
     0x7fffffffd9d8 The value AFTER printf is called. Add 8 to find it BEFORE the call
   q

Now 0x7fffffffdfa0 - (0x7fffffffd9d8 + 8) = 0x110

0x110 bytes are 34 stack slots (0x110/8 = 0x22). printf receives its first six arguments in registers. rdi holds the format string itself, so %1$ to %5$ refer to rsi, rdx, rcx, r8 and r9, and %6$ is the first stack slot, at the address rsp held just before the call. So we add 6 to the slot count. The final value is, for my example environment, 34 + 6 = 40, leading to the command:

./format-string '%40$d'
Margaret Bloom
  • Hello, Margaret! Thank you so much for your answer, and sorry for the delay in replying to your comment! I was trying to replicate what you did with gdb, but when I type "i r rsp" for the second time, I get the same address. With trial and error, I managed to find that the argument is '%44$d'. Why am I getting the same address? Maybe I am doing something wrong... – Miguel Santana Oct 18 '20 at 12:59
  • Thanks, Margaret. One last question. I've applied your logic and the value I get is 42, but it should be 44, as '%44$d' is the argument that gives the argc value. My values: 0x7fffffffdf10 - (0x7fffffffddd8 + 8) = 0x130 and 0x130/8 = 0x26 and 38 (decimal) + 4 = 42. – Miguel Santana Oct 18 '20 at 13:39
  • You are right. Somehow, I reported the wrong count. Two more must be added because the counting is one based (which account for one) and the difference is inclusive (which account for the other one). I'm editing, thanks – Margaret Bloom Oct 18 '20 at 14:40
  • Good idea to look beyond main's stack frame all the way back to `_start`. I was thinking the exploit was only going to be possible in a debug build (where the asm for `main` spills its register args into its own stack frame). Apparently gcc/glibc's `_start` doesn't `pop rdi` to get argc (because that would misalign RSP), so it's still there regardless of what compiler options you use. – Peter Cordes Oct 20 '20 at 09:08
  • @PeterCordes It would probably end up on the stack anyway. There are quite a few calls before `main`, so `argc` is not likely to stay in the registers. And it seems unlikely (or it's just sheer luck) that glibc reserves a static location for it. – Margaret Bloom Oct 20 '20 at 13:59
  • Yes, it seems there is a 2nd copy on the stack somewhere, when single-stepping to __libc_start_main. (Run with several args, `x /40gx $rsp` and look for that num). I wondered if maybe _start would just pass a copy of the initial `rsp`, but it seems something must have created separate argc, argv, envp args, and passed them to a function that has to spill them to call other init functions before main. The first user-space instruction (in /lib64/ld-linux-x86-64.so.2's `_start`, not CRT entry point in a dynamically linked executable) *does* do `mov rdi, rsp` before a `call`, which is interesting – Peter Cordes Oct 20 '20 at 14:13
  • @PeterCordes I forgot about the dynamic linker but I think in the end it has to recreate (or reuse) the same initial stack because you can use a statically linked CRT and thus avoid the dynamic linker altogether and probably the CRT code is the same for both cases. – Margaret Bloom Oct 20 '20 at 14:29
  • Yes, once the dynamic linker is done, it has to jump to the `_start` entry point in the real executable with the initial process state as specified in the ABI. (Including RDX = a pointer to pass to atexit, if non-NULL; this is what that part of the ABI is for.) Now that you mention it, *this* is probably why it saves RSP on entry: so it can restore it before jumping, not for access to argc / argv. The dynamic linker doesn't leave anything above the stack that `_start` sees, but it does dirty some space below. – Peter Cordes Oct 20 '20 at 14:33
  • I mentioned ld.so because I wondered if its `mov rdi,rsp` was for access to argc / argv / envp. (Actually it might *also* be that; it probably does check for env vars like LD_PRELOAD.) I wondered if CRT might just pass a copy of the initial RSP to later functions, instead of passing 3 args separately that all have to be spilled, and then reloaded for the call to main. If CRT had been designed that way, there wouldn't be an extra copy of argc below the initial one. (I mostly noticed the ld.so `_start` because `gdb /bin/ls` / `starti` lands there, not the CRT _start I wanted to see...) – Peter Cordes Oct 20 '20 at 14:37

printf follows the System V x86-64 ABI. The ABI passes a function's first six integer or pointer arguments in the registers rdi, rsi, rdx, rcx, r8 and r9, and any further arguments on the stack, pushed in reverse order. So in your case you will need to pass multiple %p, depending on how much data is already on the stack. We use %p because we want to print the data as 64-bit values.

In short, passing multiple %p to printf will first print the register values and then the values stored on the stack, reading upward in memory. So:

%p%p%p%p%p%p%p%p%p

Since rdi holds the format string itself, the first five %p print rsi, rdx, rcx, r8 and r9. Each extra %p reads the next 8-byte slot up the stack. Feel free to add as many as you want, but keep in mind that the reads will eventually crash with a segmentation fault once they run past the top of the mapped stack.
KMG
  • Thanks @Khaled! That's a good starting point. Are you sure you want "%p"? With that, I only get pointers, which are memory addresses. How can I confirm that a given memory address is the one holding the "argc" argument? – Miguel Santana Oct 18 '20 at 11:58
  • @MiguelSantana: In x86-64 System V, `%p` is basically a synonym for `%#lx`. With enough 64-bit integer->hex conversions, argc will be there somewhere *if* you built in anti-optimized debug mode. [Otherwise `main` wouldn't spill it from RDI to memory in the first place](https://stackoverflow.com/questions/53366394/why-does-clang-produce-inefficient-asm-with-o0-for-this-simple-floating-point). (Since it's unused by the program, no need to save it.) Anyway, argc may only be the high or low half of a 64-bit integer with a non-zero other half. If it's the low half, a `%x` would ignore high. – Peter Cordes Oct 20 '20 at 08:55
  • @MiguelSantana: This answer left out the fact that a debug build of main will spill its args to the stack. Anyway, you can find out which stack position holds argc by looking at the compiler-generated asm for your build of it. – Peter Cordes Oct 20 '20 at 08:56

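Use %d for integers and %s for strings:

#include <stdio.h>
#include <string.h>

int main (int argc, char **argv) 
{
    char buffer[32] = {0};

    if (argc < 2)    /* argv[1] is NULL when no argument is given */
        return 1;

    /* copy at most 31 bytes so buffer stays NUL-terminated */
    strncpy(buffer, argv[1], sizeof(buffer) - 1);
    printf("argc = %d and argv[1] = %s\n", argc, buffer);
    return 0;
}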
IrAM
  • Hello, IrAM. Thanks for your answer, but that's not what I was looking for. I need to supply what goes inside the printf through argv, which comes from the terminal when the program is executed. – Miguel Santana Oct 18 '20 at 11:21
  • do you mean to find out which register holds the value of `argc` ? – IrAM Oct 18 '20 at 11:31
  • Yes, I want to be able to print that by giving the format specifier in the command line. Check the question again for more details – Miguel Santana Oct 18 '20 at 11:32
  • Yes, for example the register of the stack. – Miguel Santana Oct 18 '20 at 11:34
  • Have you tried `%edi`? There is a post here: https://stackoverflow.com/questions/4196201/where-are-c-c-main-functions-parameters – IrAM Oct 18 '20 at 11:42
  • That looks good, but where is the %edi located in the stack to be printed with format specifiers? – Miguel Santana Oct 18 '20 at 11:46
  • @IrAM: This is an exploit question: you can't modify the source code to simply print `argc`. The point is to look at the asm and pass in a format string in `argv[1]` that will get the program to read the memory where main happened to dump `argc` (which *only* happens in a debug build; otherwise `argc` would be optimized away and lost when putting `&buffer` into RDI for the call to `strncpy`, or to printf if strncpy was optimized away, too.) – Peter Cordes Oct 20 '20 at 09:01
  • To be fair, the question confusingly says "my current code is ...", as if modifying it was allowed or even desired. – Peter Cordes Oct 20 '20 at 09:05
  • @PeterCordes, I am not one who asked this question – IrAM Oct 21 '20 at 02:29
  • I know, you're the one who posted this answer that misinterprets the question. I was trying to explain why this answer *doesn't* answer the question, and what it's actually about. (Or just look at Margaret Bloom's answer.) – Peter Cordes Oct 21 '20 at 02:33