98

I use this code:

while ( scanf("%s", buf) == 1 ){

What would be the best way to prevent possible buffer overflow so that it can be passed strings of random lengths?

I know I can limit the input string by calling for example:

while ( scanf("%20s", buf) == 1 ){

But I'd prefer to be able to process whatever the user inputs. Or can't this be done safely using scanf and I should use fgets?

ggorlen
  • 44,755
  • 7
  • 76
  • 106
goe
  • 5,207
  • 14
  • 45
  • 49

6 Answers6

80

In their book The Practice of Programming (which is well worth reading), Kernighan and Pike discuss this problem, and they solve it by using snprintf() to create the string with the correct buffer size for passing to the scanf() family of functions. In effect:

int scanner(const char *data, char *buffer, size_t buflen)
{
    char format[32];
    if (buflen == 0)
        return 0;
    snprintf(format, sizeof(format), "%%%ds", (int)(buflen-1));
    return sscanf(data, format, buffer);
}

Note, this still limits the input to the size provided as 'buffer'. If you need more space, then you have to do memory allocation, or use a non-standard library function that does the memory allocation for you.


Note that the POSIX 2008 (2013) version of the scanf() family of functions supports a format modifier m (an assignment-allocation character) for string inputs (%s, %c, %[). Instead of taking a char * argument, it takes a char ** argument, and it allocates the necessary space for the value it reads:

char *buffer = 0;
if (sscanf(data, "%ms", &buffer) == 1)
{
    printf("String is: <<%s>>\n", buffer);
    free(buffer);
}

If the sscanf() function fails to satisfy all the conversion specifications, then all the memory it allocated for %ms-like conversions is freed before the function returns.

Filipe Gonçalves
  • 20,783
  • 6
  • 53
  • 70
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • @Sam: Yes, it should be `buflen-1` — Thank You. You then have to worry about unsigned underflow (wrapping to a rather large number), hence the `if` test. I'd be sorely tempted to replace that with an `assert()`, or back it up with an `assert()` before the `if` that fires during development if anyone is careless enough to pass 0 as the size. I've not carefully reviewed the documentation for what `%0s` means to `sscanf()` — the test might be better as `if (buflen < 2)`. – Jonathan Leffler Sep 11 '13 at 16:35
  • So `snprintf` writes some data to a string buffer, and `sscanf` reads from that created string. Where exactly does this replace `scanf` in that it reads from stdin? – krb686 Apr 26 '15 at 14:18
  • It's also quite confusing that you use the word "format" for your result string and thus pass in "format" as the first argument to `snprintf` yet it is not the actual format parameter. – krb686 Apr 26 '15 at 14:24
  • @krb686: This code is written so that the data to be scanned is in the parameter `data` and hence `sscanf()` is appropriate. If you want to read from standard input instead, drop the `data` parameter and call `scanf()` instead. As to the choice of the name `format` for the variable that becomes the format string in the call to `sscanf()`, you are entitled to rename it if you wish, but its name is not inaccurate. I am not sure what alternative makes sense; would `in_format` make it any clearer? I am not planning to change it in this code; you may if you use this idea in your own code. – Jonathan Leffler Apr 26 '15 at 15:28
  • @krb686: Incidentally, the `snprintf()` takes the number in `buflen` (say 256), and creates a string in the variable `format` that is equivalent to `"%255s"`. Then `sscanf()` reads from the string `data` using `format` as the format string (conversion specification) to control how the data is interpreted, with the result being written into the variable `buffer`. Thus the leading white space in `data` is ignored and the first 'word' is copied to `buffer` without risk of buffer overflow as long as the area pointed at by `buffer` has at least as much space as `buflen` says it has. – Jonathan Leffler Apr 26 '15 at 15:42
  • Note also that MSVC at least up to 2013 does not conform to POSIX standards, so the format modifier for "%ms" is not available – mabraham Jun 24 '15 at 08:47
  • 1
    @mabraham: It is still true under macOS Sierra 10.12.5 (up to 2017-06-06) — the `scanf()` on macOS is not documented as supporting `%ms`, useful though it would be. – Jonathan Leffler Jun 06 '17 at 19:36
35

If you are using gcc, you can use the GNU-extension a specifier to have scanf() allocate memory for you to hold the input:

int main()
{
  char *str = NULL;

  scanf ("%as", &str);
  if (str) {
      printf("\"%s\"\n", str);
      free(str);
  }
  return 0;
}

Edit: As Jonathan pointed out, you should consult the scanf man pages as the specifier might be different (%m) and you might need to enable certain defines when compiling.

John Ledbetter
  • 13,557
  • 1
  • 61
  • 80
  • 8
    That's more an issue of using glibc (the GNU C Library) than of using the GNU C Compiler. – Jonathan Leffler Oct 25 '09 at 20:24
  • 4
    And note that the POSIX 2008 standard provides the `m` modifier to do the same job. See [`scanf()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/scanf.html). You'll need to check whether the systems you use do support this modifier. – Jonathan Leffler Feb 22 '14 at 21:15
  • 4
    GNU (as found on Ubuntu 13.10, at any rate) supports `%ms`. The notation `%a` is a synonym for `%f` (on output, it requests hexadecimal floating point data). The GNU man page for `scanf()` says: _ It is not available if the program is compiled with `gcc -std=c99` or gcc -D_ISOC99_SOURCE (unless `_GNU_SOURCE` is also specified), in which case the `a` is interpreted as a specifier for floating-point numbers (see above)._ – Jonathan Leffler Feb 22 '14 at 21:16
8

Most of the time a combination of fgets and sscanf does the job. The other thing would be to write your own parser, if the input is well formatted. Also note your second example needs a bit of modification to be used safely:

#define LENGTH          42
#define str(x)          # x
#define xstr(x)         str(x)

/* ... */ 
int nc = scanf("%"xstr(LENGTH)"[^\n]%*[^\n]", array); 

The above discards the input stream upto but not including the newline (\n) character. You will need to add a getchar() to consume this.

Roland Illig
  • 40,703
  • 10
  • 88
  • 121
dirkgently
  • 108,024
  • 16
  • 131
  • 187
4

Directly using scanf(3) and its variants poses a number of problems. Typically, users and non-interactive use cases are defined in terms of lines of input. It's rare to see a case where, if enough objects are not found, more lines will solve the problem, yet that's the default mode for scanf. (If a user didn't know to enter a number on the first line, a second and third line will probably not help.)

At least if you fgets(3) you know how many input lines your program will need, and you won't have any buffer overflows...

DigitalRoss
  • 143,651
  • 25
  • 248
  • 329
1

Limiting the length of the input is definitely easier. You could accept an arbitrarily-long input by using a loop, reading in a bit at a time, re-allocating space for the string as necessary...

But that's a lot of work, so most C programmers just chop off the input at some arbitrary length. I suppose you know this already, but using fgets() isn't going to allow you to accept arbitrary amounts of text - you're still going to need to set a limit.

Mark Bessey
  • 19,598
  • 4
  • 47
  • 69
0

It's not that much work to make a function that's allocating the needed memory for your string. That's a little c-function i wrote some time ago, i always use it to read in strings.

It will return the read string or if a memory error occurs NULL. But be aware that you have to free() your string and always check for its return value.

#define BUFFER 32

char *readString()
{
    char *str = malloc(sizeof(char) * BUFFER), *err;
    int pos;
    for(pos = 0; str != NULL && (str[pos] = getchar()) != '\n'; pos++)
    {
        if(pos % BUFFER == BUFFER - 1)
        {
            if((err = realloc(str, sizeof(char) * (BUFFER + pos + 1))) == NULL)
                free(str);
            str = err;
        }
    }
    if(str != NULL)
        str[pos] = '\0';
    return str;
}
Antony Hatchkins
  • 31,947
  • 10
  • 111
  • 111
  • 1
    `sizeof (char)` is by definition `1`. You don't need it here. – RastaJedi Sep 14 '16 at 17:27
  • 1
    Usually it's good practise to keep pointer allocation/freeing at the same level, meaning that your function should not allocate memory on its own, since the caller then has to free it. Most of the standard library / posix functions adhere to this principle by either returning a static string (like `strerror(3)`) or expect a pre-allocated string passed in (like (`strerror_r(3)` - or `scanf(3)` ) ... – Michael Beer Jun 12 '18 at 08:56
  • 1
    This code is not correct by any means. `str[pos] = getchar()` cannot check for the special value EOF being returned from the function. – Antti Haapala -- Слава Україні Feb 08 '21 at 08:06