1

I'm trying to create a program that reads some strings, but when I test it with a very long string, an overflow occurs, and none of the solutions I've seen so far work. The code is as follows:

#include <stdio.h>

int main()
{
    char nome[201] = {0};
    char cpf[15] = {0};
    char senha[101] = {0};
    scanf("%200s", nome);
    scanf("%14s", cpf);
    scanf("%100s", senha);
    printf("nome: %s\n", nome);
    printf("cpf: %s\n", cpf);
    printf("senha: %s\n", senha);
    return 0;
}

This code is supposed to prevent the overflow, but consider the following string:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaassssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss

When I enter this string at every input, the program finishes right after the second input, and the characters that did not fit in the second variable end up in the third.

Eduardo Mosca
  • `scanf("%200s", nome);` reads up to 200 non-white-space characters, leaving the rest in `stdin` for the next input function. Code is acting as it should. Your expectations are amiss. If you want to consume and toss characters past the 200, you need other code. – chux - Reinstate Monica Nov 29 '22 at 18:39
  • Eduardo Mosca, _buffer_ overflow is prevented. What output do you want with the 3 "aaa...sss" input? – chux - Reinstate Monica Nov 29 '22 at 18:51
  • The output I want is the variables with the string limited and not skipping the third variable with this input. For example, in the second variable, I want only the 14 first characters of this input and the same for the last input – Eduardo Mosca Nov 29 '22 at 18:57
  • Eduardo Mosca, OK. If input was `"aaa bbb ccc\n" "ddd eee fff\n" "ggg hhh iii\n"`, what output would you like? (If the line of input contained spaces?) – chux - Reinstate Monica Nov 29 '22 at 19:01
  • The output needs to be the same since none of these outputs has a length larger than the memory allocated to the variables. – Eduardo Mosca Nov 29 '22 at 19:09
  • Exactly, and if the input has a length larger than the memory allocated I want to store until the space required – Eduardo Mosca Nov 29 '22 at 19:13
  • Eduardo Mosca, Note that the accepted answer does not meet the "output needs to be the same" with input like [here](https://stackoverflow.com/questions/74618714/string-with-buffer-overflow?noredirect=1#comment131712103_74618714) as it contains spaces. If you want to read a _line_, begin with `fgets()` – chux - Reinstate Monica Nov 29 '22 at 19:18
  • It solves the problem of the input I asked for, but you're right since I still can not have the input with spaces, I'm trying some things, but nothing seems to work – Eduardo Mosca Nov 29 '22 at 19:21
  • "but nothing seems to work" --> take time to clearly identify your goal. Consider inputs with/without spaces, overly long, short and as small as `"\n"`. Should extra text be silently discarded or get noted? Is the length limit fixed or driven by a variable? Should the `'\n'` get saved? It _sounds_ like you want to read a _line_ into a `n`-sized buffer and not save the `'\n'`, but save spaces. Unclear on how you want an overly long line to be reported. – chux - Reinstate Monica Nov 29 '22 at 19:31
  • When I use `fgets()` I have the same problem as above. I can read the line, but the result is the same problem I had before – Eduardo Mosca Nov 29 '22 at 19:31
  • Eduardo Mosca, as commented [above](https://stackoverflow.com/questions/74618714/string-with-buffer-overflow?noredirect=1#comment131712390_74618714), "_begin_ with `fgets()`". It is not the only thing to do. – chux - Reinstate Monica Nov 29 '22 at 19:32
  • I understood your point, but how can I mix fgets to get the result? – Eduardo Mosca Nov 29 '22 at 19:39
  • @EduardoMosca: I have now added a solution that uses `fgets` to my answer. – Andreas Wenzel Nov 29 '22 at 20:15

3 Answers

3

You asked for inputs in order. The first one has a maximum length of 200 characters, the second 14, and the third 100. You entered a string of 160 characters.

Ignoring the first variable for now (since there's no overflow), C takes the first 14 characters from the input buffer and puts them in the second variable. It terminates this with a null terminator. No overflow has occurred.

Now we need to get data for the third variable. Specifically, we need to get the next 100 characters, or all of the characters up to the next whitespace, whichever is shorter. We put 160 characters into the input buffer (your keyboard smash) and took 14 out. Therefore, there are still 146 characters in the input buffer. No need to interactively wait for input anymore; C takes the first 100 of those characters and puts them into the third variable, terminated with a null terminator. Now all of our inputs have been completed, so the program continues.

There is no buffer overflow vulnerability here. The program is well-defined and does what you asked it to. You asked it to read from the input buffer three times. You never said "from three different lines". If you want to do that, then you need to handle delimiters yourself. In C++, there's a function called std::getline that will do it for you, but in C, you'll need to manually read (and discard) the rest of the line yourself. Something like this would suffice.

scanf("%200s%*[^\n]", nome);

The * indicates that the newly-read value should not be stored anywhere, and the [^\n] indicates that one or more non-newline characters should be read, until the pattern doesn't match anymore (i.e. until the next character is a newline or we hit the end-of-file).

Silvio Mayolo
  • In order to discard the remainder of the line, I recommend `scanf( "%*[^\n]" );` – Andreas Wenzel Nov 29 '22 at 18:56
  • @AndreasWenzel `scanf( "%*[^\n]" );` almost discards the remainder of the line. It does not discard the line's `'\n'`. `scanf( "%*[^\n]" ); scanf( "%*1[\n]" );` does. – chux - Reinstate Monica Nov 29 '22 at 19:16
  • Note that `scanf("%200s%*[^\n]", nome);` will not read an empty line (`"\n"`) well, but instead wait for another non-`"\n"` line of input. [@Andreas Wenzel](https://stackoverflow.com/questions/74618714/string-with-buffer-overflow?noredirect=1#comment131712007_74618812) idea is a good start as a separate call. – chux - Reinstate Monica Nov 29 '22 at 19:22
1

Your posted code does not have a buffer overflow, but you are right that the input from one input prompt "overflows" into the next input prompt.

What is happening is the following:

Since your input string consists of 160 characters (161 characters including the null terminating character), when you first enter that input, it will fit entirely inside the array nome, so the line

scanf("%200s", nome);

will read this input entirely.

However, when you enter that input a second time, this time at the second input prompt, the line

scanf("%14s", cpf);

will only read the first 14 characters of that input and leave the remaining 146 characters on the input stream.

Therefore, the line

scanf("%100s", senha);

will read 100 of the remaining 146 characters of the input stream and write them into senha. So you are correct in saying that the second input prompt "overflows" into the third input prompt.

If you want to prevent this "overflow" from happening, you will have to discard all remaining characters on the line before the next input prompt, for example by calling:

scanf( "%*[^\n]" );

However, I generally do not recommend using the function scanf for user input, as that is not what it is designed to be used for.

Also, judging from what you wrote in the comments section, you want to be able to read entire lines that may contain spaces, instead of reading single words. However, the %s scanf format specifier will only read a single word.

For this reason, it is probably better for you to use the function fgets. This function will always attempt to read an entire line at once, including the newline character, instead of only a single word.

However, when using fgets, you will probably want to remove the newline character from the input. See the following question on how to do that:

Removing trailing newline character from fgets() input.

In the program below, I have created a function get_line_from_user which will read a single line from the user using fgets, discard the newline character, and if the line does not fit into the buffer, it will also discard the rest of the line:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void get_line_from_user( char *buffer, int buffer_size );

int main()
{
    char nome[201] = {0};
    char cpf[15] = {0};
    char senha[101] = {0};

    printf( "Input Phase: \n\n" );

    //read inputs
    printf( "Nome: " );
    get_line_from_user( nome, sizeof nome );
    printf( "Cpf: " );
    get_line_from_user( cpf, sizeof cpf );
    printf( "Senha: " );
    get_line_from_user( senha, sizeof senha );

    printf( "\n\nOutput Phase: \n\n" );

    //output the results
    printf("nome: %s\n", nome);
    printf("cpf: %s\n", cpf);
    printf("senha: %s\n", senha);

    return 0;
}

//This function will read exactly one line of input from the
//user and discard the newline character. If the line does
//not fit into the buffer, it will also discard the rest of
//the line from the input stream.
void get_line_from_user( char *buffer, int buffer_size )
{
    char *p;

    //attempt to read one line of input
    if ( fgets( buffer, buffer_size, stdin ) == NULL )
    {
        printf( "Error reading from input\n" );
        exit( EXIT_FAILURE );
    }

    //attempt to find newline character
    p = strchr( buffer, '\n' );

    //determine whether entire line was read in (i.e. whether
    //the buffer was too small to store the entire line)
    if ( p == NULL )
    {
        int c;

        //discard remainder of line from input stream
        do
        {
            c = getchar();
        
        } while ( c != EOF && c != '\n' );
    }
    else
    {
        //remove newline character by overwriting it with
        //null character
        *p = '\0';
    }
}

This program has the following behavior:

Input Phase: 

Nome: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaassssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
Cpf: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaassssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
Senha: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaassssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss


Output Phase: 

nome: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaassssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
cpf: aaaaaaaaaaaaaa
senha: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaassssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss

As you can see, the input from one input prompt no longer overflows into the input of another input prompt.

And it now also works with spaces in the input:

Input Phase: 

Nome: This is a test 
Cpf: Another test
Senha: Yet another test


Output Phase: 

nome: This is a test
cpf: Another test
senha: Yet another test
Andreas Wenzel
  • It worked perfectly. Thank you a lot for your time and effort. – Eduardo Mosca Nov 29 '22 at 20:39
  • @EduardoMosca: I am pleased that I was able to help. Do you also understand how the function `get_line_from_user` works? – Andreas Wenzel Nov 29 '22 at 20:42
  • Actually, I could understand it up to the `p = strchr( buffer, '\n' );`. I am having a bit of difficulty understanding the if and else after it. – Eduardo Mosca Nov 29 '22 at 20:58
  • @EduardoMosca: I suggest that you read the documentation of [`fgets`](https://en.cppreference.com/w/c/io/fgets) and [`strchr`](https://en.cppreference.com/w/c/string/byte/strchr) to see exactly what these two functions do. I use the function `strchr` to determine whether it can find a newline character in the input string. That way, I can determine whether the entire line was read in or not. If this is the case, then I remove the newline character from the input. If not, then I discard the remainder of the line from the input stream. – Andreas Wenzel Nov 30 '22 at 00:24
1

Other answers have well addressed why OP's code is performing as it is.


Unfortunately, robustly reading a line in C is not easy, and a really good way is beyond a beginner's needs.

One modest approach using fgets():

#include <stdbool.h>  // bool, true, false
#include <stdio.h>    // FILE, EOF, fgets, fgetc, feof
#include <string.h>   // strlen

// Return 1 on success.
// Return EOF on input error or end-of-file with no input.
// Return 0 when input exceeds buffer space.
// A line's \n is read, but not saved.
int read1line(size_t n, char * restrict s, FILE * restrict stream) {
  if (fgets(s, n, stream) == NULL) {
    return EOF;
  }
  size_t len = strlen(s);
  // Was a \n read?
  if (len > 0 && s[len-1] ==  '\n') {
    s[--len] = '\0';
  }
  // Potentially more?
  if (len + 1 == n) {
    int ch;
    bool more_read = false;
    while ((ch = fgetc(stream)) != '\n' && ch != EOF) {
      more_read = true;
    }
    if (ch == EOF && !feof(stream)) {
      return EOF;
    }
    if (more_read) {
      return 0;
    }
  } 
  return 1;
}

The above still has corner weaknesses:

  1. If the input contains a null character, strlen() determines len incorrectly.
  2. s == NULL, n <= 0 or n > INT_MAX remain unhandled pathological cases.
  3. Odd systems where CHAR_MAX > INT_MAX need special handling.
  4. It would be useful to indicate the length stored in the buffer, once #1 is solved.
Eduardo Mosca
chux - Reinstate Monica