-1

I declared two char arrays of the same size (str1 and str2). I then read a string into str1 with the gets() function and copy str1 to str2. When I print them, it looks as if str2 can store more characters than its size?

This is my code:

#include<stdio.h>
#include<string.h>
void main()
{
    char str1[20], str2[20];
    printf("Enter the first string:");
    gets(str1);
    strcpy(str2,str1);
    printf("First string is:%s\tSecond string is:%s\n",str1,str2);
}

The output here:

Enter the first string: Why can str2 store more characters than str1?
First string is:ore characters than str1?       Second string is:Why can str2 store more characters than str1?

Thanks, everyone, in advance.

John Tran
  • 11
  • 2
  • 7
    It can’t. You are overwriting memory after the end of str2 and invoking undefined behavior. The fact that it appears to work is one of the more entertaining aspects of undefined behavior; anything can happen, including cases where no obvious faults are observable (e.g. because nothing “important” happened to be located in the bytes that got overwritten after the end of str2). You can’t rely on it working, though – Jeremy Friesner May 09 '21 at 19:08
  • 2
    This is a good example of why you should *never* under any circumstances use `gets`. Your compiler should be screaming at you. The `gets` function allows writing arbitrary information to memory. – Cheatah May 09 '21 at 19:08
  • Consider looking up buffer overflows, undefined behavior and why `gets` is dangerous. – mediocrevegetable1 May 09 '21 at 19:11
  • "I have declared the same size of two char strings (str1 and str2)." Please show a [mre] of that here as text, not as picture of text. – Yunnosch May 09 '21 at 19:13
  • @Yunnosch I have already done it. thank you for your guidance. – John Tran May 09 '21 at 19:34
  • 1
    `scanf(str1);` - It would be a good idea to read the manual page for `scanf`. You are looking for `scanf("%19s", str1);` – Ed Heal May 09 '21 at 19:36
  • Somebody please make an answer. – Yunnosch May 09 '21 at 19:49
  • @Yunnosch - Done - see below – Ed Heal May 09 '21 at 19:57
  • I am voting to reopen the question, as the most recent edit made the error reproducible again. – Andreas Wenzel May 09 '21 at 20:52
  • @AndreasWenzel An undefined behavior is _reproducible_ with some unknown probability only... – CiaPan May 09 '21 at 21:28
  • 1
    I suggest replacing the title with something meaningful, e.g. "How can a string appear longer than its declared length?" – CiaPan May 09 '21 at 21:31

4 Answers

2

The updated code below, with comments, ensures that something is actually stored in str1 and that the input cannot overrun the buffer:

#include <stdio.h>
#include <string.h>
// For EXIT_...
#include <stdlib.h>
int main() // Should be returning int
{
    char str1[20], str2[20];
    printf("Enter the first string:");
    // Incorrect - see manual page - scanf(str1);
    if (scanf("%19s", str1) == 1) { // Please read the manual page - this prevents buffer over runs and checks that something is stored in str1  
    
      strcpy(str2,str1);
      printf("First string is:%s\tSecond string is:%s\n",str1,str2);
      return EXIT_SUCCESS;
    } else {
      fprintf(stderr, "Unable to read string\n"); // fprintf needs a stream as its first argument
      return EXIT_FAILURE;
    }  
}
Ed Heal
  • 59,252
  • 17
  • 87
  • 127
  • `scanf("%19s", str1)` means that it reads the first 19 characters and the return value is always 1, why do we need else section? – John Tran May 09 '21 at 21:05
  • 1
    @JohnTran: The `else` block will be executed if `scanf` returns a value that is not `1`. For example, it will return `EOF` (normally defined as `-1`) on input failure (unlikely to occur, but possible). – Andreas Wenzel May 09 '21 at 21:12
2

First of all, as already pointed out in the comments section, you should never use gets in modern C code. That function is so dangerous that it has been removed from the ISO C standard. A safer alternative is fgets.

When you print str2 using the %s format specifier, printf will not just print the contents of the str2 array. It will print everything it finds in memory, until it finds a null terminating character.

Since the array str2 does not contain such a null character, it will continue printing everything it finds in memory, past the boundary of str2, until it finds a null character (unless it crashes beforehand). Since you seem to have previously written the string past the boundary of str2 (which is a buffer overflow), it will print that string, unless the memory was meanwhile overwritten by something else.

Andreas Wenzel
  • 22,760
  • 4
  • 24
  • 39
  • 1
    @EdHeal: Thanks for pointing it out. I have updated my answer to point out the incorrect usage of `scanf` and am referring to your answer for the correct usage. – Andreas Wenzel May 09 '21 at 20:13
  • @EdHeal you're right. actually, I used `gets` instead of `scanf` then I got the output as I posted above. – John Tran May 09 '21 at 20:34
  • @JohnTran: If the problem is not reproducible with `scanf`, but only with `gets`, then you should restore your question to use `gets`. In general, you should ensure that your posted code actually reproduces the problem. – Andreas Wenzel May 09 '21 at 20:43
  • @AndreasWenzel thank you. I replaced `scanf()` with `gets()`. This is my first time, so I made a lot of mistakes. – John Tran May 09 '21 at 20:51
  • @JohnTran: Yes, it is normal to make mistakes at the start. Don't worry about them. As long as you learn from your mistakes, then making mistakes is actually a good thing. – Andreas Wenzel May 09 '21 at 20:58
  • @EdHeal: Meanwhile, I noticed that OP's original code used `gets` instead of `scanf` and that OP later changed the code to use `scanf` instead in an improper way, so that the problem is no longer reproducible. This is probably why the question was closed. Therefore, OP has now changed the code back to `gets`. Since this partially invalidated my answer, I have updated my answer accordingly (thereby also removing my link to your answer). Since your answer also got invalidated by the updated question, you may also want to update your answer. – Andreas Wenzel May 09 '21 at 21:07
  • @AndreasWenzel - Thanks for the heads up - I do not think I will bother as the question is now closed – Ed Heal May 09 '21 at 21:17
  • @AndreasWenzel thank both of you so much. I got so many valuable things from you guys today. – John Tran May 09 '21 at 21:24
  • @EdHeal: `"Thanks for the heads up - I do not think I will bother as the question is now closed"` -- Note that the question has now been reopened. – Andreas Wenzel May 09 '21 at 21:32
  • @AndreasWenzel - Thanks - will fix tomorrow as it is rather late here and boyfriend wants to go to bed – Ed Heal May 09 '21 at 22:11
1

I realized str2 can store more characters than its size?

No. What's happening is that excess characters are being written past the end of one array, and that's overwriting the contents of the other array (or other objects). C doesn't mandate bounds checking on array accesses - if you write past the end of an array, you won't get an "IndexOutOfBounds" exception or anything like that.

Based on your output, here's what's happening - str2 is allocated at a lower address than str1, like so (address values are for illustration only):

              +---+
0x1000  str2: |   | str2[0]
              +---+ 
0x1001        |   | str2[1]
              +---+
0x1002        |   | str2[2]
              +---+
               ...
              +---+
0x1013        |   | str2[19]
              +---+
0x1014  str1: |   | str1[0]
              +---+ 
0x1015        |   | str1[1]
              +---+
0x1016        |   | str1[2]
              +---+
               ...
              +---+
0x1027        |   | str1[19]
              +---+

So the first thing you do is

gets( str1 );

and enter the string "Why can str2 store more characters than str1?", which is 45 characters long. Unfortunately, gets only receives the starting address of the buffer - it has no way of knowing how long the buffer is. So it happily stores the "ore characters than str1?" portion of the string to the memory immediately following the end of str1:

              +---+
0x1000  str2: |   | str2[0]
              +---+ 
0x1001        |   | str2[1]
              +---+
0x1002        |   | str2[2]
              +---+
               ...
              +---+
0x1013        |   | str2[19]
              +---+
0x1014  str1: |'W'| str1[0]
              +---+ 
0x1015        |'h'| str1[1]
              +---+
0x1016        |'y'| str1[2]
              +---+
               ...
              +---+
0x1027        |'m'| str1[19]
              +---+
0x1028        |'o'| ???
              +---+
0x1029        |'r'| ???
              +---+
0x102a        |'e'| ???
              +---+
               ...
              +---+
0x103f        |'1'| ???
              +---+
0x1040        |'?'| ???
              +---+
0x1041        | 0 | ???
              +---+

gets also writes a 0 terminator to mark the end of the string.

The next thing you do is call strcpy to copy the contents of str1 to str2. Like gets, strcpy only gets the starting addresses of the source and target buffers - it doesn't know how long either buffer is. It relies on the presence of the 0 terminator in the source string to tell it when to stop copying. Thus, the first 20 characters of str1 get copied to str2, and the remaining characters "spill" back over into str1, overwriting what was there originally. After the strcpy call, you get the following:

              +---+
0x1000  str2: |'W'| str2[0]
              +---+ 
0x1001        |'h'| str2[1]
              +---+
0x1002        |'y'| str2[2]
              +---+
               ...
              +---+
0x1013        |'m'| str2[19]
              +---+
0x1014  str1: |'o'| str1[0]
              +---+
0x1015        |'r'| str1[1]
              +---+
0x1016        |'e'| str1[2]
              +---+
0x1017        |' '| str1[3]
              +---+
               ...
              +---+
0x1027        |' '| str1[19]
              +---+
0x1028        |'s'| ???
              +---+
0x1029        |'t'| ???
              +---+
0x102a        |'r'| ???
              +---+
0x102b        |'1'| ???
              +---+
0x102c        |'?'| ???
              +---+
0x102d        | 0 | ???
              +---+
               ...
              +---+
0x103f        |'1'| ???
              +---+
0x1040        |'?'| ???
              +---+
0x1041        | 0 | ???
              +---+

The behavior on reading or writing past the end of an array is undefined - the language standard places no requirements on either the compiler or the runtime environment to handle the situation in any particular way. An implementation may add bounds checking code on array access, but I'm not aware of any that do. As long as you don't overwrite anything "important" or attempt to access protected memory, your code will appear to function correctly. However, appearing to function correctly is not the same as actually functioning correctly. As it is, you are clobbering other objects in your program. You could also overwrite important sections of the stack frame, which is why buffer overflows like this are a common malware exploit.

Specific issues:

  • NEVER NEVER NEVER use gets, for any reason - it will introduce a point of failure in your code as shown above. It was deprecated after the C99 standard and removed from the standard library as of the 2011 standard. Use fgets instead:
    if ( fgets(str1, sizeof str1, stdin) )
    {
      // do stuff with str1
    }
  • The standard signatures for main are
    • int main( void )
    • int main( int argc, char **argv ) // or equivalent
    Unless your implementation explicitly lists void main() as a valid signature, use one of the two above (in your case, the first one would be appropriate).
John Bode
  • 119,563
  • 19
  • 122
  • 198
0

You can also use strncpy, which takes a length as its third parameter. This can help avoid writing out of bounds. Example:

 strncpy (str2, str1, (size_t) 20); //fixed size 20
  • You do not need the cast – Ed Heal May 09 '21 at 20:06
  • right - ignore that cast. strncpy (str2, str1, 20); is fine –  May 09 '21 at 20:08
  • Note that even `strncpy` does not put the NUL terminator at the end, so it is still potentially unsafe. Either put it manually or instead do something like `snprintf(str1, sizeof str1, "%s", str2)` as `snprintf` puts the NUL terminator unlike `strncpy`. There is likely more overhead from using `snprintf` however. – mediocrevegetable1 May 09 '21 at 20:30