
The following piece of C code appears to accept up to 8 characters of input and then segfaults on longer input.

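#include <stdio.h>
#include <string.h>

int main(void)
{
  char a[1];                  /* room for a single byte */
  printf("Input:\n");
  scanf("%s",a);              /* reads one word, with no length limit */
  printf("%s\n",a);
  printf("%zu\n",strlen(a));  /* size_t is printed with %zu */
  printf("%zu\n",sizeof(a));

  return 0;
}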

Outputs

Case 1:

Input:
aaaaaaaa
aaaaaaaa
8
1

Case 2:

Input:
aaaaaaaaa
aaaaaaaaa
9
1
[1]    15688 segmentation fault (core dumped) 

My machine is a 64-bit Intel Linux

Compiler is gcc version 6.1.1 20160802 (GCC)

Sequence of commands performed:

gcc -c -g test.c - creates output file test.o

gcc -o test test.o

./test

I am a beginner at C programming. Any insight is much appreciated.

On the surface I would expect it to give some error or warning on input of 2 or more characters.

Also, `objdump -d test` shows `sub $0x10,%rsp`, which suggests that 16 bytes of stack are reserved for `main()`. So maybe it should accept 16 characters rather than 8.

stWrong
  • What do you expect to happen? – Amit Sep 08 '16 at 05:42
  • This is undefined behavior. Anything could happen. In your case you are writing past the end of `main`'s stack frame. – Ari0nhh Sep 08 '16 at 05:42
  • @Amit I expect a segFault on input of 2 or more characters as each character input should be 1 byte in size – stWrong Sep 08 '16 at 05:43
  • There are a whole lot of things here that cause undefined behaviour. – ameyCU Sep 08 '16 at 05:43
  • @stWrong When it is UB you cannot expect the result always to be a segmentation fault. It might work, or it might not. – ameyCU Sep 08 '16 at 05:44
  • The stack allocates `sub $0x10,%rsp` or 16 bytes to the program - I think – stWrong Sep 08 '16 at 05:45
  • @stWrong Accessing invalid array indices doesn't necessarily cause a segmentation fault. – melpomene Sep 08 '16 at 05:45
  • @stWrong Undefined behavior does not equal segfault. And you would be overflowing the buffer with even a single-char input, since `scanf` stores the ending '\0' too for `%s`. – dxiv Sep 08 '16 at 05:45
  • @dxiv But can be avoided if treated as character. – ameyCU Sep 08 '16 at 05:46
  • @M.SChaudhari Nope, it's not related in the least. – dxiv Sep 08 '16 at 05:46
  • Editing the question: I wish to know why it does work for anything more than 1 character. – stWrong Sep 08 '16 at 05:47
  • If you could predict with certainty what would be the result of a UB code, it would not be UB. Since you *know* this is UB, this question is pointless. – Amit Sep 08 '16 at 05:47
  • @ameyCU Right, but the OP uses `%s` not `%c`. – dxiv Sep 08 '16 at 05:47
  • On my machine it fails when more than one character is read – smac89 Sep 08 '16 at 05:50
  • @Amit I do not know if it is UB code. Are there components other than the stack that need to be checked for this behavior? – stWrong Sep 08 '16 at 05:51
  • @smac89 how much stack space is allocated? – stWrong Sep 08 '16 at 05:52
  • @stWrong `I would expect it to give some error or warning on input of 2 or more characters` The compiler doesn't know how many characters will be entered at runtime, so it can *not* possibly issue any warning. It is the responsibility of *your* code to allocate a large enough buffer to hold whatever input you expect. – dxiv Sep 08 '16 at 05:54
  • I don't think stack space has anything to do with its success. The fact remains that this is undefined behaviour, as shown by the result I get. Anyway, the output I get for that is `sub $0x8,%rsp` – smac89 Sep 08 '16 at 05:57
  • @dxiv By error I mean the code should break with something like a segFault or Bus error or Invalid instruction (as extra characters will overwrite entries in the program stack, since C is not memory safe). – stWrong Sep 08 '16 at 05:59
  • @stWrong The `C` language itself has no notion of `a segFault or Bus error or Invalid instruction`. The code you posted is UB = "undefined behavior" and what you see is one among infinitely many possible manifestations of UB. – dxiv Sep 08 '16 at 06:03

1 Answer


When you have an array declared as:

char a[1];

it can hold only one character in a well-behaved program. A string read with `%s` also needs room for the terminating '\0', so even a one-character input writes two bytes. Once you write past the end of the array, the program is subject to undefined behavior. You cannot make sense of how such a program behaves when its behavior is, by definition, undefined.

Don't do it.
It's pointless to make sense of the behavior of such a program.
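
A bounded read avoids the overflow altogether. The following is only a minimal sketch, not code from the question: `read_line` is a hypothetical helper name, and its return convention (1, 0, -1) is an assumption chosen for illustration. It relies on the standard `fgets`, which never stores more than the given number of bytes, terminator included.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the question): read one line from `in`
 * into `buf` without ever writing more than `size` bytes, '\0' included.
 * Returns 1 if the whole line fit, 0 if it was truncated,
 * and -1 on end of file or read error. */
int read_line(FILE *in, char *buf, size_t size)
{
    /* Need room for at least one character plus the terminating '\0'. */
    assert(buf != NULL && size >= 2);

    if (fgets(buf, (int)size, in) == NULL) {
        buf[0] = '\0';
        return -1;
    }

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';      /* whole line fit; drop the newline */
        return 1;
    }

    /* No newline was stored: discard the rest of the line, if any. */
    int truncated = 0;
    int c;
    while ((c = getc(in)) != '\n' && c != EOF)
        truncated = 1;

    return truncated ? 0 : 1;
}
```

With a buffer declared as `char a[16];`, a call such as `read_line(stdin, a, sizeof a)` stores at most 15 characters plus the terminator, whatever the user types. If you prefer to stay with `scanf`, a field width one less than the buffer size (for example `"%15s"` for a 16-byte buffer) bounds the write in the same way.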

R Sahu
  • should the question be deleted? – stWrong Sep 08 '16 at 05:55
  • No, that's not necessary. This is part of learning the language -- creating buggy code and figuring out why it doesn't behave the way you expect it to. – R Sahu Sep 08 '16 at 05:59
  • @stWrong the system autodeletes questions with negative score and duplicate link, after half an hour or so. Don't take it personally, it just means that the site already has this info so it doesn't need it repeated. – M.M Sep 08 '16 at 06:00
  • @M.M - are you [*sure*](http://stackoverflow.com/questions/280801/ruby-scripting-on-windows)? Is there any evidence to what you claim? – Amit Sep 08 '16 at 06:40
  • @Amit this policy started later than 2008. Look on meta for discussion. – M.M Sep 08 '16 at 07:24