I realized str2 can store more characters than its size?
No. What's happening is that excess characters are being written past the end of one array, and that's overwriting the contents of the other array (or other objects). C doesn't mandate bounds checking on array accesses - if you write past the end of an array, you won't get an "IndexOutOfBounds" exception or anything like that.
Based on your output, here's what's happening - str2 is allocated at a lower address than str1, like so (address values are for illustration only):
+---+
0x1000 str2: | | str2[0]
+---+
0x1001 | | str2[1]
+---+
0x1002 | | str2[2]
+---+
...
+---+
0x1013 | | str2[19]
+---+
0x1014 str1: | | str1[0]
+---+
0x1015 | | str1[1]
+---+
0x1016 | | str1[2]
+---+
...
+---+
0x1027 | | str1[19]
+---+
So the first thing you do is
gets( str1 );
and enter the string "Why can str2 store more characters than str1?", which is 45 characters long. Unfortunately, gets only receives the starting address of the buffer - it has no way of knowing how long the buffer is. So it happily stores the "ore characters than str1?" portion of the string to the memory immediately following the end of str1:
+---+
0x1000 str2: | | str2[0]
+---+
0x1001 | | str2[1]
+---+
0x1002 | | str2[2]
+---+
...
+---+
0x1013 | | str2[19]
+---+
0x1014 str1: |'W'| str1[0]
+---+
0x1015 |'h'| str1[1]
+---+
0x1016 |'y'| str1[2]
+---+
...
+---+
0x1027 |'m'| str1[19]
+---+
0x1028 |'o'| ???
+---+
0x1029 |'r'| ???
+---+
0x102a |'e'| ???
+---+
...
+---+
0x103f |'1'| ???
+---+
0x1040 |'?'| ???
+---+
0x1041 | 0 | ???
+---+
gets also writes a 0 terminator to mark the end of the string.
The next thing you do is call strcpy to copy the contents of str1 to str2. Like gets, strcpy only gets the starting addresses of the source and target buffers - it doesn't know how long either buffer is. It relies on the presence of the 0 terminator in the source string to tell it when to stop copying. Thus, the first 20 characters of the string get copied to str2, and the remaining characters "spill" back over into str1, overwriting what was there originally. (Strictly, strcpy's behavior is undefined when the source and destination overlap, as they do here; the picture below assumes a simple front-to-back copy.) After the strcpy call, you get the following:
+---+
0x1000 str2: |'W'| str2[0]
+---+
0x1001 |'h'| str2[1]
+---+
0x1002 |'y'| str2[2]
+---+
...
+---+
0x1013 |'m'| str2[19]
+---+
0x1014 str1: |'o'| str1[0]
+---+
0x1015 |'r'| str1[1]
+---+
0x1016 |'e'| str1[2]
+---+
0x1017 |' '| str1[3]
+---+
...
+---+
0x1027 |' '| str1[19]
+---+
0x1028 |'s'| ???
+---+
0x1029 |'t'| ???
+---+
0x102a |'r'| ???
+---+
0x102b |'1'| ???
+---+
0x102c |'?'| ???
+---+
0x102d | 0 | ???
+---+
...
+---+
0x103f |'1'| ???
+---+
0x1040 |'?'| ???
+---+
0x1041 | 0 | ???
+---+
The behavior on reading or writing past the end of an array is undefined - the language standard places no requirements on either the compiler or the runtime environment to handle the situation in any particular way. An implementation may add bounds checking code on array access, but I'm not aware of any that do so by default (opt-in tools such as AddressSanitizer in GCC and Clang can catch many such accesses at run time).
As long as you don't overwrite anything "important" or attempt to access protected memory, your code will appear to function correctly. However, appearing to function correctly is not the same as actually functioning correctly. As it is, you are clobbering other objects in your program. You could also overwrite important sections of the stack frame, which is why buffer overflows like this are a common malware exploit.
Specific issues: