4

I was doing a little bit of research about a topic when I came across this situation. Assume the following C code:

#include <stdio.h>

int main() {
    char name[1];
    scanf("%s", name);
    printf("Hi %s", name);
    return 0;
}

I compiled it with -fno-stack-protector and tested it with input longer than one character, such as John, and to my surprise, it works!
Shouldn't it throw a segmentation fault when the input is longer than 1?
It eventually broke with Alexander as input (9 characters), but it works with anything shorter than 9.
Why does it work with inputs longer than the name array?
P.S.: I'm using Ubuntu (64-bit), gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04), and CLion as my IDE.

Sam

2 Answers

7

This is undefined behavior. Your program has a buffer overrun, because it allocates exactly one character, which is only enough to store an empty null-terminated string.

However, there is memory adjacent to your buffer that does not belong to the name array. scanf places your input into that memory, because it does not know how long your string buffer is. This is a big danger and the source of countless attacks, in which a pre-determined sequence of bytes is placed into your string in the hope of overwriting some vital data and eventually gaining control of the program.
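To see the spill without relying on undefined behavior, here is a minimal sketch that imitates the situation inside a single struct. The names fake_frame, neighbour and spill_into_frame are made up for illustration, and the assumption that neighbour sits directly after name (no padding) holds on common compilers but is not guaranteed by the standard.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout standing in for a stack frame: `name` is the
   one-byte buffer, `neighbour` plays the role of whatever sits next to it. */
struct fake_frame {
    char name[1];
    char neighbour[15];
};

/* Copies `input` to the start of the frame the way an unbounded %s would,
   except that the copy is capped at the size of the whole struct so the
   demonstration itself stays inside one object. */
void spill_into_frame(struct fake_frame *frame, const char *input) {
    char *bytes = (char *)frame;      /* byte view of the whole struct */
    size_t cap = sizeof *frame - 1;   /* leave room for the terminator */
    size_t len = strlen(input);

    if (len > cap) {
        len = cap;
    }
    memset(bytes, 0, sizeof *frame);
    memcpy(bytes, input, len);        /* bytes past name[0] land in neighbour */
}
```

With John as input, only the J fits in name; the remaining characters end up in the bytes that follow it, which is what happens to the real stack in the question, just without the cap.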

That is why using %s without specifying a size is dangerous. You should always add a proper size limit to %s; otherwise your program is at risk of a buffer overrun.

char name[120];
scanf("%119s",name);

This version is safe: even if a malicious user types 120 characters or more, scanf stores at most 119 of them plus the terminating null character, as specified by the %119s format. Any remaining characters stay in the input stream for the next read.
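The truncation is easy to check with sscanf, which follows the same format rules as scanf but reads from a string. This is only a sketch: NAME_CAP and parse_name are hypothetical names, and the buffer is kept deliberately small so the cut-off is visible.

```c
#include <stdio.h>

#define NAME_CAP 8   /* deliberately small so truncation is easy to see */

/* Reads one whitespace-delimited word from `line` into `out`.
   The width 7 is NAME_CAP - 1, leaving one byte for the terminating '\0'.
   Returns 1 if a word was stored, 0 otherwise. */
int parse_name(const char *line, char out[NAME_CAP]) {
    return sscanf(line, "%7s", out) == 1;
}
```

Calling parse_name with Alexander stores only the first seven characters followed by the terminator, so the write never goes past the eight bytes of out.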

Sergey Kalinichenko
  • +1. Basically, C doesn't care if you want to access an array's thousandth element even though you only initialized one. It causes undefined behavior indeed, depending on the content. – Link Aug 01 '15 at 10:01
  • Amazing answer. Thanks. – Sam Aug 01 '15 at 10:13
  • So, If I understood it well, It just writes the rest of the buffer to adjacent memory? Is there a condition like (If the adjacent memory is free) ? – Sam Aug 01 '15 at 10:16
  • @Sam There is no condition to check if adjacent memory is free. In fact, there is no way to tell in C if that place is free or is allocated to some other data structure. You could experiment with UB, and see what happens to adjacent variables when you overrun the buffer. Chances are, your excess data would end up in variables following `name`. – Sergey Kalinichenko Aug 01 '15 at 10:21
  • @dasblinkenlight Another point: why does it always break on input longer than 8 characters? Shouldn't it vary on each execution of the binary due to changes in the memory map and adjacent memory? – Sam Aug 01 '15 at 10:23
  • @Sam This is somewhat complicated: each run gets its own memory map in terms of physical allocation, but from the point of view of your program its memory space stays the same. That is why consecutive runs may look the same. Some of the unallocated space may be used by `printf`'s local variables, too, because `name` is on the stack. – Sergey Kalinichenko Aug 01 '15 at 10:34
  • @dasblinkenlight You're very knowledgeable in this area. Thanks for your enlightening answers. Would you please guide me on how and where I should look to learn this in depth, just like you did? It would be appreciated. – Sam Aug 01 '15 at 10:51
  • @Sam A great deal of in-depth details comes from using a combination of C code and assembly programming, which I did some 25 years ago. This is much easier to try on small-scale systems: I used 68HC11 and 68000-based embedded designs. I learned their assembly languages, and watched how cross-compilers translated my C code into assembly. Lots of C internals become very clear after doing this for a couple of years. – Sergey Kalinichenko Aug 01 '15 at 11:02
  • @dasblinkenlight Thank you so much for your time, You've shed so much light on the way, more than my CS instructor ever did. Best. – Sam Aug 01 '15 at 11:05
1

scanf knows nothing about the size or type of the variable in which you store the input.

scanf is only passed an address (pointer) where to deposit the input it gets from the user.
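A small sketch of why the size is lost: once the array is passed to a function, the callee only sees a pointer. The two function names below are made up for illustration.

```c
#include <stddef.h>

/* Inside the callee only a pointer is visible, so sizeof reports the
   pointer's size, not the size of the caller's array. */
size_t size_seen_by_callee(char *buffer) {
    return sizeof buffer;
}

/* In the caller the array type is still known, so sizeof reports 1. */
size_t size_seen_by_caller(void) {
    char name[1];
    return sizeof name;
}
```

scanf is in the same position as size_seen_by_callee: it gets an address and nothing else, so the only way it can learn the limit is from the width in the format string.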

Clever compilers now warn you if the format string passed to scanf does not match the type of the parameters, but in principle you could even declare name as an integer:

int name;

and it would hold the input string quite well, up to three characters (the fourth byte is for the end-of-string terminator, i.e. zero), assuming int is 32 bits, i.e. 4 bytes.
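As a rough sketch of that claim, the following uses sscanf with a width of 3 so that the write stays inside the int. The helper name store_word_in_int is hypothetical, and the code assumes a 4-byte int, refusing to write otherwise.

```c
#include <stdio.h>

/* Stores up to three characters plus '\0' in the bytes of an int.
   Assumes sizeof(int) >= 4; returns 0 without writing otherwise.
   Returns 1 if a word was stored. */
int store_word_in_int(const char *line, int *slot) {
    if (sizeof *slot < 4) {
        return 0;
    }
    /* The cast silences the type mismatch; scanf-family functions
       only ever see the address. */
    return sscanf(line, "%3s", (char *)slot) == 1;
}
```

Reading the int back through a char pointer gives the stored characters; reading it as a number gives a value that depends on the machine's byte order.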

The fact that it works is pure bad luck, since the input data, when stored by scanf, runs past the end of the allocated buffer for it (name).

Note: allocating only one character for a string would never work, even for input strings of one character only. You always need to account for the EOS that is used to terminate them. So name should be declared as char name[2]; at the very least.

Pynchia
  • Thanks for your answer. GCC has warned me about type mismatch, as you have stated. – Sam Aug 01 '15 at 10:20