
I am learning about buffer overrun with this source code:

#include <stdio.h>
int main()
{
    char buf[16];
    gets(buf);
    printf("buf @ %8p\n", (void*)&buf);
    return 0;
}

I am trying to write a null character ('\0') into buf.

First, in gdb, I set a breakpoint at line 6, after the gets() call, and run the program with r <<< $(python -c 'print "\0"*11 + "AAAA"')

When I examine the stack, I see that only "AAAA" was written to buf. What is happening?

(gdb) x/16xw &buf
0xffffcf80: 0x41414141  0xffffd000  0xffffd04c  0x080484a1
0xffffcf90: 0xf7fb43dc  0xffffcfb0  0x00000000  0xf7e1a637
0xffffcfa0: 0xf7fb4000  0xf7fb4000  0x00000000  0xf7e1a637
0xffffcfb0: 0x00000001  0xffffd044  0xffffd04c  0x00000000

But when I run the program with r <<< $(python -c 'print "\1"*11 + "AAAA"'), buf contains:

(gdb) x/16xw &buf
0xffffcf80: 0x01010101  0x01010101  0x41010101  0x00414141
0xffffcf90: 0xf7fb43dc  0xffffcfb0  0x00000000  0xf7e1a637
0xffffcfa0: 0xf7fb4000  0xf7fb4000  0x00000000  0xf7e1a637
0xffffcfb0: 0x00000001  0xffffd044  0xffffd04c  0x00000000

So does gets() not receive the null character, or does stdin ignore it?

P/S: I built it with gcc -m32 -fno-stack-protector -g stack.c -o stack on gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609.


Update: following some suggestions, I tried this:

#include <stdio.h>
int main()
{
    char buf[16];
    gets(buf);
    printf("buf @ %8p\n", (void*)&buf);
    for (int i = 0; i < 16; ++i) // print every byte of buf in hex
    {
        printf("%02x ", buf[i]);
    }
    return 0;
}

It works with '\0':

$ gcc -g j_stack.c -o j_stack
$ python -c 'print "AAAA" + "\0"*6 + "AAAA"'| ./j_stack 
buf @ 0xffffcfbc
41 41 41 41 00 00 00 00 00 00 41 41 41 41 00 ffffffff

But how do I provide input containing '\0' to the program when I run it under gdb?

lzutao
  • Minor tip: it would be easier to read this output if you'd initialised `buf` to all 0xFFs or something like that – Lightness Races in Orbit Dec 22 '16 at 15:55
  • The `gets` and `fgets` functions do not treat `'\0'` specially. They can read null characters just fine. I used your program to read the string `"ab\0c\n"` and it worked as I expected. – Steve Summit Dec 22 '16 at 15:57
  • I think this is due to your use of "here strings" to provide input (the `<<<`). It's a very convoluted way of passing input to your program, don't you think? Take gdb out of the equation and just use a bog-standard pipe from shell; you should then see it working. Let us know if this is true and therefore answer-worthy. – Lightness Races in Orbit Dec 22 '16 at 16:04
  • Sorry about that... My memory was substituting `\0` for `\n`. Answer deleted. – David Hoelzer Dec 22 '16 at 16:05
  • Did you try my suggestion? – Lightness Races in Orbit Dec 22 '16 at 16:21
  • 1
    Run: `od -c <<< $(python -c 'print "\0"*11 + "AAAA"')`; the output I get is four A's and a newline. As diagnosed, the problem is in the Bash heredoc processing, not your program or `gets()`. Of course, except for testing overflows, you should know that [`gets()` is too dangerous to be used — ever!](http://stackoverflow.com/questions/1694036/). (You could still use `fgets()`; you'd simply write `fgets(buf, 4096, stdin);`, lying through your teeth about the size of `buf`.) – Jonathan Leffler Dec 22 '16 at 18:21

2 Answers


No, it doesn't.

This behaviour has nothing to do with gets(), or with Python strings; it's due to the way you're providing input to your program. The command substitution $(...) captures Python's output in a shell string, and Bash strings cannot hold null bytes, so they are dropped before the Bash "here string" (<<<) passes anything to your program:

# python -c 'print "\0"*11 + "AAAA"' | wc -c
16
# python -c 'print "\0"*11 + "AAAA"' | hexdump
0000000 0000 0000 0000 0000 0000 4100 4141 0a41
0000010

# cat <<< $(python -c 'print "\0"*11 + "AAAA"') | wc -c
5
# hexdump <<< $(python -c 'print "\0"*11 + "AAAA"')
0000000 4141 4141 000a
0000005

# echo $(python -c 'print "\0"*11 + "AAAA"') | wc -c
5
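
To check which step discards the bytes, the sketch below counts bytes with and without `$(...)`. It uses `printf` in place of the Python one-liner (an assumption, only to keep the example free of dependencies) and assumes Bash or a shell that behaves the same way; POSIX leaves the result of capturing null bytes unspecified.

```shell
# 3 NUL bytes + "AAAA" + newline sent straight down a pipe:
# wc reports all 8 bytes.
printf '\0\0\0AAAA\n' | wc -c

# The same bytes captured by $(...) first: the shell string cannot hold
# NULs, so only "AAAA" survives (newer Bash versions also print a warning
# on stderr). printf '%s\n' adds back the trailing newline that $(...)
# stripped, so wc reports 5 bytes.
printf '%s\n' "$(printf '\0\0\0AAAA\n')" | wc -c
```

No here string is involved in the second command, which suggests the loss happens in the command substitution itself.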

If you run your program with a simple pipe, you should see the results you expect:

python -c 'print "\0"*11 + "AAAA"' | ./myProgram
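
For gdb, one approach that avoids shell strings entirely is to write the bytes to a file and redirect from it in the `run` command; gdb starts the program through a shell, so `<` redirection is available there. A sketch, where `payload.bin` is a hypothetical file name and `printf` stands in for the Python one-liner:

```shell
# Write 11 NUL bytes, "AAAA", and a newline to a file (16 bytes in total).
printf '\0\0\0\0\0\0\0\0\0\0\0AAAA\n' > payload.bin

# Inspect the file before using it; the NUL bytes should show up as 00.
od -An -tx1 payload.bin

# Then, inside gdb, with the breakpoint set as before
# (these are gdb commands, not shell commands):
#   (gdb) run < payload.bin
#   (gdb) x/4xw &buf
```

Because the data never passes through `$(...)`, the null bytes reach `gets()` unchanged.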
Lightness Races in Orbit

No, gets does not ignore '\0'.

I changed your program to include

for(i = 0; i < 16; i++) printf("%02x", buf[i]);
printf("\n");

after calling gets. I ran the program on the input

abc\n

and saw

61626300000000000000000000000000

as I expected. I then ran the program on the input

ab\0c\n

and saw

61620063000000000000000000000000

which was also what I expected.


P.S. I'm not sure why you saw the behavior you did, but I confess I'm not sure what you're doing with <<< and those python fragments. Me, I used

echo abc | a.out

and

echo 616200630a | unhex | a.out

where unhex is a little program I have in my bin directory for, well, doing the obvious.
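
The unhex program itself is not shown here. As a rough stand-in, a shell function along the following lines would do a similar job; this is a hypothetical sketch, not the actual program, and it assumes the input is a single line holding an even number of hex digits.

```shell
# Hypothetical stand-in for "unhex": reads hex digits on stdin and
# writes the corresponding raw bytes to stdout.
unhex() {
    # fold splits the line into two-digit pairs, one pair per line.
    fold -w 2 | while read -r pair; do
        # Skip empty lines so printf never sees a bare "0x".
        [ -n "$pair" ] || continue
        # Turn the pair into a 3-digit octal escape (e.g. 61 -> \141)
        # and let printf emit that single byte, NUL included.
        printf "\\$(printf '%03o' "0x$pair")"
    done
}

# Dump the result in hex; the bytes should be 61 62 00 63 0a.
echo 616200630a | unhex | od -An -tx1
```

The bytes go through a pipe rather than a shell variable, so the null byte is preserved all the way to the reading program.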

Steve Summit
  • It would be more helpful if you mentioned which compiler and OS you used. The behavior noted in the question might be specific to one setup. – Mark Ransom Dec 22 '16 at 16:02
  • I used clang under MacOS. But I don't believe the behavior is system-specific. `gets` and `fgets` are not documented as treating `'\0'` specially, and I have never encountered an implementation that did. – Steve Summit Dec 22 '16 at 16:05
  • I don't think it *should* be system specific, but I don't see anything obvious in the question that would invalidate their observation. – Mark Ransom Dec 22 '16 at 16:32