2

I am both confused and excited about this behavior I am getting from my C code. I do not understand how on earth this is happening! Before anything further, let's see the code-

#include <stdio.h>

int main(){

    char string[2];

    printf("Enter your string here: ");
    gets(string);
    printf("%s \n", string);

    return 0;
}

It is very clear that- nothing is special here. Actually, this was an assignment of my Computer, Data & Network Security course, where I was supposed to demonstrate BufferOverflow.

It works just fine till 13 characters; 15 or more charters causes desired BufferOverflow. The disaster happens when I enter exactly 14 characters: the code starts behaving like a loop! It's like the main function is being called again and again-

screen shot of console

I am using CodeBlocks-16.01 and GNU GCC Compiler. I have also executed the code on TutorialsPoint but did not get this problem there.

Minhas Kamal
  • 20,752
  • 7
  • 62
  • 64
  • 9
    Undefined behavior is undefined. – Retired Ninja Jul 25 '16 at 04:53
  • 1
    @MinhasKamal your program violates the rules of the C language. So you should not expect any particular behaviour (such as looping or not-looping) – M.M Jul 25 '16 at 04:58
  • 2
    You're writing data into memory that you shouldn't. What happens when you do that is undefined. – Retired Ninja Jul 25 '16 at 04:59
  • 3
    https://stackoverflow.com/questions/2397984/undefined-unspecified-and-implementation-defined-behavior – user3386109 Jul 25 '16 at 05:00
  • 4
    You can only validly enter a single character plus newline; anything more means that `gets()` is scribbling outside the array, and anything can happen. See also [Why `gets()` is too dangerous to be used, ever?](http://stackoverflow.com/questions/1694036/why-is-the-gets-function-dangerous-why-should-it-not-be-used) – Jonathan Leffler Jul 25 '16 at 05:02
  • 1
    The looping behaviour is a perfectly good demonstration of buffer overflow. It shows that when undefined behaviour is triggered, a program can do absolutely anything. Sometimes it can work as expected, sometimes it can crash, sometimes it can start looping indefinitely, and sometimes it can play Prelude and Fugue in C minor arranged for eight beer bottles and a chainsaw. Your program already can do three things out of these four, and I'm pretty sure it can do the fourth thing too if the stars align just right. – n. m. could be an AI Jul 25 '16 at 05:58

1 Answers1

7

Your code generates a buffer overflow--a real one. Going past the end of string can overwrite the return address [on the stack] that main should return to when done.

If it is chosen correctly, it can loop back to main or jump just about anywhere in memory. What it actually does depends upon the compiler, the linker, the loader, the address the program was loaded at.

And, the value of the string entered (i.e.) some strings will crash, others might loop, some might produce goofy results, but not loop. One string might do X behavior in a given environment and Y behavior in another. A different string might reverse these results.

What you really want to do is demonstrate (i.e. simulate) buffer overflow, without doing anything that will crash your program.

Here is a safe way to do this:

#include <stdio.h>

#define SEED     0xFF                   // sentinel value

// NOTE: using a struct guarantees that over will appear directly after string
// (i.e.) over is higher in memory than string
struct buffer {
    char string[4];                     // buffer that can overflow
    unsigned char over[80];             // safe place for the overflow
};

int
main(void)
{
    struct buffer buf;
    int idx;
    int over;

    // prefill the "overflow detection buffer" with a sentinel value (e.g. one
    // that can't be input via fgets [under normal circumstances])
    for (idx = 0;  idx < sizeof(buf.over);  ++idx)
        buf.over[idx] = SEED;

    printf("Enter your string here: ");
    fflush(stdout);

    // NOTE: this fgets will never _really_ cause any harm -- the "10" slop
    // factor guarantees this
    fgets(buf.string,sizeof(buf) - 10,stdin);

    // overflow is anything that ran past string into over
    over = 0;
    for (idx = 0;  idx < sizeof(buf.over);  ++idx) {
        if (buf.over[idx] != SEED) {
            over = 1;
            break;
        }
    }

    if (over)
        printf("buffer overflowed\n");
    else
        printf("buffer did not overflow\n");

    return 0;
}

UPDATE:

you should specifically admonish against the use of gets

Ordinarily, I would have. Because of the special nature of this question, I was on the fence about this.

(even though he wants a buffer overflow, you should add why he may very well get one he doesn't want using the removed gets function)

IMO, this was implied by the use of fgets in my example code, but may not have been specifically inferred. So, fair enough ...

In my example code, using gets(buf.string) instead of the fgets could/would produce the same [desired] effect. However, this would still be unsafe because there is still no limit on the length read. It could run past the total struct length sizeof(string) + sizeof(over) and produce a real buffer overflow, just as before.

Since you were trying to cause a buffer overflow, it's easier to code with gets, but you get the undesired behavior.

[As others have pointed out] gets is deprecated for that very reason. If you just wanted a normal usage, replace gets(string) with fgets(string,sizeof(string),stdin) So, never use gets and always use fgets.

Craig Estey
  • 30,627
  • 4
  • 24
  • 48
  • 2
    Good answer, but you should specifically admonish against the use of `gets` (even though he wants a buffer overflow, you should add why he may very well get one he doesn't want using the removed `gets` function). (I see Lefler covered it in his comment -- never mind) – David C. Rankin Jul 25 '16 at 06:28
  • @DavidC.Rankin I had debated that point when writing my original answer. But, for the sake of completeness, I've updated to include such a caution. – Craig Estey Jul 25 '16 at 18:46
  • Good deal, you can never impress enough on those attempting to use `gets` (even if it is their clueless professors or TAs suggesting the use) to NEVER use it. I know you can't include everything, but there are some issues it is always worth picking up the proverbial bat and knocking them over the head with it `;)` – David C. Rankin Jul 26 '16 at 01:39
  • 1
    @DavidC.Rankin We didn't get them on this question, but you've been around the C tag questions long enough to know there are "the usual suspects" [_neither_ you nor Jonathan as you both give explanations] that are the "_extreme_ language lawyers" and quote the ISO spec relentlessly. They _will_ point such things out--usually without any mercy to newbies. One of the catchphrases is UB [undefined behavior]. Due to the semi-abusive way it's been used on SO, I've come to dislike it. That's why I said "crash the program" instead [which is what the old folks say]. – Craig Estey Jul 26 '16 at 02:26
  • @DavidC.Rankin Under linux and `gcc`, `gets` is undefined, even with `stdio.h`. That is, its function declaration has been removed from `stdio.h` (i.e. it gets `-Wimplicit-function-definition` warning, even _without_ any `-W` options on the `cc` command). And, during linking, it produces `warning: the 'gets' function is dangerous and should not be used.` – Craig Estey Jul 26 '16 at 02:39
  • Yes, I particularly like that warning `:)` – David C. Rankin Jul 26 '16 at 03:37
  • @DavidC.Rankin If they added "Use `fgets` instead" to the warning it'd be perfect. Or, "See `man 3 gets`" as the `glibc` manpage gives a succinct explanation of the problem and solution. I don't think I've ever used `gets` [or `getchar`], because they didn't have an input stream argument. If I _had_ used `getchar` and then, later, decided to do, `fin = fopen(...)`, I'd have to edit _all_ the code that used the `getchar` – Craig Estey Jul 26 '16 at 04:05