-1

I've seen a way to run a buffer overflow in a Computerphile video (https://youtu.be/1S0aBV-Waeo), and I wanted to try it out. I have written a piece of code that is identical to the one shown in the video except for the size of "buffer", but if I give it an input string bigger than the size of "buffer", I am not getting a segmentation fault, as it was shown in the video; can someone explain why?

#include <stdio.h>
#include <string.h>

int main(int argc, char** argv){
    char buffer[50];
    strcpy(buffer, argv[1]);
    
    return 0;
}

Edit: By the way, since I've seen in the comments that this is a determining factor, I am using the GCC compiler.

  • Undefined behavior may cause a segmentation fault. Or may not. The behavior of undefined behavior is *undefined*. It may appear to work as expected, if you are unlucky. – Eljay Jun 06 '22 at 16:10
  • Welcome to undefined behavior land, where all outcomes are correct. – NathanOliver Jun 06 '22 at 16:10
  • @user4581301 - no question there are dodgy resources on the internet, but Computerphile is a reliable and useful resource. – Steve Friedl Jun 06 '22 at 16:52
  • Saying that if you overflow a buffer, you *will* get a segmentation fault is just like saying, "If you jaywalk through the intersection when the light is red, you *will* get hit by a car." You might get hit by a car, or you might get across the intersection safely, or something else might happen. – Steve Summit Jun 06 '22 at 16:57
  • Undefined behavior can be tricky to think about at first. Although it talks about a different issue, you might find [this question](https://stackoverflow.com/questions/37087286/c-program-crashes-when-adding-an-extra-int) and its answer useful. – Steve Summit Jun 06 '22 at 17:01

2 Answers

3

I am not getting a segmentation fault, as it was shown in the video; can someone explain why?

The program has undefined behavior because you're passing a string bigger than the size of buffer. From the strcpy documentation:

To avoid overflows, the size of the array pointed by destination shall be long enough to contain the same C string as source (including the terminating null character), and should not overlap in memory with source.

(emphasis mine)


Undefined behavior means anything[1] can happen, including but not limited to the program giving your expected output. But never rely on (or draw conclusions from) the output of a program that has undefined behavior. The program may just crash.

So the output that you're seeing (or may be seeing) is a result of undefined behavior. As I said, don't rely on the output of a program that has UB.

So the first step to make the program correct would be to remove UB. Then and only then you can start reasoning about the output of the program.


[1] For a more technically accurate definition of undefined behavior, see this, where it is mentioned that there are no restrictions on the behavior of the program.

Gerhard
Jason
  • Recommend sourcing the second quote. – user4581301 Jun 06 '22 at 16:15
  • @user4581301 It is mine. – Jason Jun 06 '22 at 16:15
  • Re “Then and only then you can start reasoning about the output of the program”: This is false. When the C standard says something is “undefined,” that only means that the C standard does not impose any requirements. It does not, and cannot, prevent other things from imposing requirements or having causal effects. Compiler design affects the results. System design affects the results. Laws affect the results (consumer products should never start an undesired fire even if a program has “undefined behavior”). Physics affects the results… – Eric Postpischil Jun 06 '22 at 16:27
  • … Further, programmers and others cannot afford to ignore these effects. Malicious entities, from immature students to scammers to governments, will seek to use so-called “undefined behavior.” They will study it, reason about the causes and what effects can be obtained, and use it to annoy, steal from, or even kill people. Teaching students that “undefined behavior” cannot be reasoned about is malpractice and is harmful to society. Responsible programmers must learn about symptoms of using things with undefined behavior, how to diagnose in the face of them, and how to mitigate consequences. – Eric Postpischil Jun 06 '22 at 16:28
  • @EricPostpischil Don't forget this is a C++ question and not specific to a particular compiler. I never said that the C/C++ standard prevents other things from imposing requirements. In this context, as far as the C++ standard is concerned, anything can indeed happen. This means the compilers are not required to produce a given effect. I can even cite standard go-to C++ books that say that *"the standard indicates that doing this leads to “undefined behavior,” which allows anything to happen"* – Jason Jun 06 '22 at 16:38
  • @EricPostpischil **You're deliberately trying to take my quote out of context.** And if you do that you're on your own. The OP is expecting the program to behave a certain way which it will not as it has UB and as far as C++ is concerned you cannot expect the program to behave a certain way. **The scope of this question is limited to the tags it has been tagged with. And there is no tag related to compiler implementation.** – Jason Jun 06 '22 at 16:39
  • @AnoopRana: No, I am not taking it out of context. OP’s question is specifically in the context of studying buffer overflows, not learning “regular” C programming. Telling them “undefined behavior” cannot be reasoned about is contrary to the intent of their question. They are seeking to learn specifically about what does happen when a buffer overflow occurs, and that is a question that can be answered and that is important. – Eric Postpischil Jun 06 '22 at 16:42
  • @EricPostpischil No, the scope of this question is limited to the tags it has been tagged with. And there is no tag related to compiler implementation. Anyway my point is that i am answering the question from C++ point of view. And you're trying to apply that to other areas which i never claimed will work. – Jason Jun 06 '22 at 16:44
  • @AnoopRana: Re “No, the scope of this question is limited to the tags it has been tagged with.”: There is no such rule. Tags set context, not limits. Someone asking about implementing a red-black tree tagged with C++ should not be answered that there is no red-black tree specified in the C++ standard. They would be asking about implementing a red-black tree using C++, not limited to only what C++ provides. – Eric Postpischil Jun 06 '22 at 16:45
2

If I am correct that you wanted to understand what happened in your specific case, you could improve your question by providing the version of the compiler, the arguments you passed to the compiler, the arguments you passed to your program, and the output of your program. That way, you would have a Minimal Reproducible Example and we would understand better what your specific case is.

For example, I use GCC 9.4.0:

$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Here is what happened when I compiled without optimization and passed a string with 55 characters as an argument to the program:

$ gcc -o bufferoverflow bufferoverflow.c
$ ./bufferoverflow 1234567890123456789012345678901234567890123456789012345
$

So, even though the number of bytes copied into the buffer, 56 including the terminator, should cause a write past the end of the buffer, the program ran without any error that is visible by simply looking at standard error or standard output.

Here is what happened when I ran the same executable but passed a 57 character string in the command line.

$ ./bufferoverflow 123456789012345678901234567890123456789012345678901234567
*** stack smashing detected ***: terminated
Aborted (core dumped)
$

One way to understand what happened in the case with the 55 character string is to run it again using gdb, which can be started as shown:

$ gdb bufferoverflow
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from bufferoverflow...
(No debugging symbols found in bufferoverflow)
(gdb)

Now let's see why passing a 55 character string as the first argument didn't result in an obvious failure:

(gdb) break main
Breakpoint 1 at 0x1169
(gdb) r 1234567890123456789012345678901234567890123456789012345
Starting program: /home/tim/bufferoverflow 1234567890123456789012345678901234567890123456789012345

Breakpoint 1, 0x0000555555555169 in main ()
(gdb) x/23i main
=> 0x555555555169 <main>:   endbr64 
   0x55555555516d <main+4>: push   %rbp
   0x55555555516e <main+5>: mov    %rsp,%rbp
   0x555555555171 <main+8>: sub    $0x50,%rsp
   0x555555555175 <main+12>:    mov    %edi,-0x44(%rbp)
   0x555555555178 <main+15>:    mov    %rsi,-0x50(%rbp)
   0x55555555517c <main+19>:    mov    %fs:0x28,%rax
   0x555555555185 <main+28>:    mov    %rax,-0x8(%rbp)
   0x555555555189 <main+32>:    xor    %eax,%eax
   0x55555555518b <main+34>:    mov    -0x50(%rbp),%rax
   0x55555555518f <main+38>:    add    $0x8,%rax
   0x555555555193 <main+42>:    mov    (%rax),%rdx
   0x555555555196 <main+45>:    lea    -0x40(%rbp),%rax
   0x55555555519a <main+49>:    mov    %rdx,%rsi
   0x55555555519d <main+52>:    mov    %rax,%rdi
   0x5555555551a0 <main+55>:    callq  0x555555555060 <strcpy@plt>
   0x5555555551a5 <main+60>:    mov    $0x0,%eax
   0x5555555551aa <main+65>:    mov    -0x8(%rbp),%rcx
   0x5555555551ae <main+69>:    xor    %fs:0x28,%rcx
   0x5555555551b7 <main+78>:    je     0x5555555551be <main+85>
   0x5555555551b9 <main+80>:    callq  0x555555555070 <__stack_chk_fail@plt>
   0x5555555551be <main+85>:    leaveq 
   0x5555555551bf <main+86>:    retq   

From the above disassembly we can see that main+60 is just after the call to strcpy. We can also see, by looking at main+45 and main+52 that the buffer is at %rbp-0x40. We can continue to that point and look at what happened to the buffer:

(gdb) b *(main+60)
Breakpoint 2 at 0x5555555551a5
(gdb) c
Continuing.

Breakpoint 2, 0x00005555555551a5 in main ()
(gdb) x/56bx $rbp-0x40
0x7fffffffdf90: 0x31    0x32    0x33    0x34    0x35    0x36    0x37    0x38
0x7fffffffdf98: 0x39    0x30    0x31    0x32    0x33    0x34    0x35    0x36
0x7fffffffdfa0: 0x37    0x38    0x39    0x30    0x31    0x32    0x33    0x34
0x7fffffffdfa8: 0x35    0x36    0x37    0x38    0x39    0x30    0x31    0x32
0x7fffffffdfb0: 0x33    0x34    0x35    0x36    0x37    0x38    0x39    0x30
0x7fffffffdfb8: 0x31    0x32    0x33    0x34    0x35    0x36    0x37    0x38
0x7fffffffdfc0: 0x39    0x30    0x31    0x32    0x33    0x34    0x35    0x00

So we can see that, although we noticed no obvious error when we ran with this string earlier without gdb, the buffer overflow did in fact occur; we simply didn't notice it. To understand why, look at the disassembly: the next used address on the stack is at %rbp-8, which is 56 bytes after %rbp-0x40. The 56 bytes written therefore ended just short of it, so the overflow only went into memory that was not in use.

The same disassembly shows why we get the stack smashing detected message when we run the program with the 57 character string. In that case, we clobber part of the 8-byte value at %rbp-8, which is used (at main+19, main+28, main+65, main+69 and main+78) to check whether the stack got corrupted during the call to main. That 8-byte value was the only part of the stack we clobbered that was actually used afterwards, and the message was the result of the check noticing that those 8 bytes had changed.

Even if you did not compile your program exactly the way I did, and even if you did not use exactly the same input, I hope I have given you some solid ideas about how to understand the behavior in your case.

Tim Boddy