21

I can specify the maximum number of characters for scanf to read into a buffer using this technique:

char buffer[64];

/* Read one line of text to buffer. */
scanf("%63[^\n]", buffer);

But what if the buffer length is not known when the code is written? What if it is a parameter of a function?

void function(FILE *file, size_t n, char buffer[n])
{
    /* ... */
    fscanf(file, "%[^\n]", buffer); /* WHAT NOW? */
}

This code is vulnerable to buffer overflows as fscanf does not know how big the buffer is.

I remembered seeing something like this before and thought it might be the solution to the problem:

fscanf(file, "%*[^\n]", n, buffer);

My first thought was that the * in "%*[^\n]" meant that the maximum string size is passed as an argument (in this case n). That is what the * means in printf.

When I checked the documentation for scanf I found out that it means that scanf should discard the result of [^\n].
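To make the difference concrete, here is a small sketch of what the * does in the scanf family. The helper name second_line_word is made up for illustration, and sscanf is used instead of fscanf only so the input can be a string:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: skips the first line of input and stores the
 * first word of the second line in word (at least 64 bytes).
 * Returns the number of assigned items, as reported by sscanf(). */
int second_line_word(const char *input, char word[64])
{
    assert(input != NULL && word != NULL);

    /* %*[^\n] matches the rest of the first line but assigns nothing:
     * it consumes no argument and is not counted in the return value.
     * The space skips the newline, then %63s reads one word. */
    return sscanf(input, "%*[^\n] %63s", word);
}
```

Called with "first line\nsecond line\n", this should return 1 and store "second", since the suppressed conversion is not counted.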

This left me somewhat disappointed as I think that it would be a very useful feature to be able to pass the buffer size dynamically for scanf.

Is there any way I can pass the buffer size to scanf dynamically?

Jonathan Leffler
wefwefa3
  • `size_t n` doesn't it give the buffer size? – Gopi Feb 11 '15 at 16:58
  • possible duplicate of [How to limit scanf function in C to print error when input is too long?](http://stackoverflow.com/questions/10886594/how-to-limit-scanf-function-in-c-to-print-error-when-input-is-too-long) – Eugene Sh. Feb 11 '15 at 17:00
  • 1
    IMO, the possible duplicate isn't very accurately a duplicate. – Jonathan Leffler Feb 11 '15 at 17:03
  • Possible duplicate of [How to prevent scanf causing a buffer overflow in C?](http://stackoverflow.com/questions/1621394/how-to-prevent-scanf-causing-a-buffer-overflow-in-c) – jamesdlin Oct 20 '16 at 17:07

3 Answers

19

Basic answer

There isn't an analog to the printf() format specifier * in scanf().

In The Practice of Programming, Kernighan and Pike recommend using snprintf() to create the format string:

size_t sz = 64;   /* maximum characters to store; buffer must hold sz + 1 bytes */
char format[32];
snprintf(format, sizeof(format), "%%%zus", sz);
if (scanf(format, buffer) != 1) { …oops… }

Extra information

Upgrading the example to a complete function:

int read_name(FILE *fp, char *buffer, size_t bufsiz)
{
    char format[16];
    snprintf(format, sizeof(format), "%%%zus", bufsiz - 1);
    return fscanf(fp, format, buffer);
}

This emphasizes that the size in the format specification is one less than the size of the buffer (it is the number of non-null characters that can be stored without counting the terminating null). Note that this is in contrast to fgets() where the size (an int, incidentally; not a size_t) is the size of the buffer, not one less. There are multiple ways of improving the function, but it shows the point. (You can replace the s in the format with [^\n] if that's what you want.)
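One possible way of tightening the function is sketched below. It uses [^\n] in place of s, sizes the format array for the largest 64-bit size_t, and refuses to scan if the buffer is too small or the format was truncated. The name read_line_prefix and the choice to return 0 in those cases are assumptions made for this sketch, not part of the original answer:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical variant of read_name(): reads up to bufsiz - 1 characters
 * of the current line into buffer and leaves the newline unread.
 * Returns what fscanf() returns (1 on success, 0 for an empty line,
 * EOF at end of input), or 0 if no usable format could be built. */
int read_line_prefix(FILE *fp, char *buffer, size_t bufsiz)
{
    char format[32];   /* '%', up to 20 digits, "[^\n]" and the null */

    assert(fp != NULL && buffer != NULL);
    if (bufsiz < 2)
        return 0;      /* a field width of 0 is not valid */

    /* Build e.g. "%63[^\n]" for a 64-byte buffer. */
    int len = snprintf(format, sizeof(format), "%%%zu[^\n]", bufsiz - 1);
    if (len < 0 || (size_t)len >= sizeof(format))
        return 0;      /* format was truncated; do not scan with it */

    return fscanf(fp, format, buffer);
}
```

With a 4-byte buffer and the input line "abcdefgh", the first call should store "abc" and the next call "def", because the unread remainder of the line stays in the stream.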

Also, as Tim Čas noted in the comments, if you want (the rest of) a line of input, you're usually better off using fgets() to read the line, but remember that it includes the newline in its output (whereas %63[^\n] leaves the newline to be read by the next I/O operation). For more general scanning (for example, 2 or 3 strings), this technique may be better — especially if used with fgets() or getline() and then sscanf() to parse the input.
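A minimal sketch of the fgets() plus sscanf() combination follows. The helper name read_two_words, the 128-byte line buffer and the fixed 15-character widths are assumptions chosen to keep the example short; the newline is stripped with the strcspn() idiom from the comments:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line with fgets(), strips the newline,
 * then parses up to two whitespace-separated words of at most 15
 * characters each. Returns the number of words parsed (1 or 2), or EOF
 * at end of input or when the line contains no words. */
int read_two_words(FILE *fp, char first[16], char second[16])
{
    char line[128];   /* assumed maximum line length for this sketch */

    assert(fp != NULL && first != NULL && second != NULL);
    if (fgets(line, sizeof(line), fp) == NULL)
        return EOF;

    /* fgets() keeps the newline; strcspn() finds it, or the end of the
     * string if the line was too long to include one. */
    line[strcspn(line, "\n")] = '\0';

    return sscanf(line, "%15s %15s", first, second);
}
```

Because the whole line is consumed before parsing, leftover words on a line (a third word, say) do not leak into the next read, which is the main attraction of this approach over calling fscanf() directly.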

Also, the TR 24731-1 'safe' functions, implemented by Microsoft (more or less) and standardized in the optional Annex K of ISO/IEC 9899:2011 (the C11 standard), require a length explicitly:

if (scanf_s("%[^\n]", buffer, sizeof(buffer)) != 1)
    ...oops...

This avoids buffer overflows, but probably generates an error if the input is too long. The size could/should be specified in the format string as before:

if (scanf_s("%63[^\n]", buffer, sizeof(buffer)) != 1)
    ...oops...

if (scanf_s(format, buffer, sizeof(buffer)) != 1)
    ...oops...

Note that the warning (from some compilers under some sets of flags) about 'non-constant format string' has to be ignored or suppressed for code using the generated format string.

jethro
Jonathan Leffler
  • 1
    A much better solution for the specific case of `"%64[^\n]"` is `fgets(buf, 64, stdin)` --- it does *exactly that*, except that you can provide the length as a "proper" argument. – Tim Čas Feb 11 '15 at 17:08
  • Ahh, I remember now that what I saw was `"%*[^\n]"` but apparently the `*` means "discard the result" instead of "value in argument list" for `scanf`. Nice solution for the problem! – wefwefa3 Feb 11 '15 at 17:24
  • 3
    @Tim `fgets(buf, 64, stdin)` is not _exactly_ the same. `fgets` also gives you the newline character. – wefwefa3 Feb 11 '15 at 17:52
  • 2
    @elias: Oh right, forgot about that. Easily fixed though: `buf[strcspn(buf, "\n")] = 0;` – Tim Čas Feb 11 '15 at 17:57
  • @TimČas: you're right about `fgets()` probably being better for this specific example. I've added extra information to the answer, referencing your point. – Jonathan Leffler Feb 11 '15 at 18:00
  • Note: `scanf_s("%[^\n]"` is a problem if the first `char` is `'\n'` in which case, nothing is saved and `'\n'` remains in `stdin`. This of course is caught in the `...oops...` handler. – chux - Reinstate Monica Feb 11 '15 at 18:41
  • 1
    @chux: valid point. The usual sensible fix is to include a space at the start of the format string, so leading white space is skipped. Note that trying to read the newline after the string is hard if it is at the end of the format string and the input is interactive (a trailing newline or blank in the format string is an interactive disaster; the input doesn't terminate until a non-white space character is typed). – Jonathan Leffler Feb 11 '15 at 18:48
  • @TimČas Nice comment about `buf[strcspn(buf, "\n")] = 0;`. Suggest adding that to http://stackoverflow.com/q/2693776/2410359 . AFAIK, `strcspn()` well handles 2 problem `buf`: 1) `buf` that starts with `'\0'` 2) `buf` that does not contain `'\n'`. – chux - Reinstate Monica Feb 11 '15 at 18:49
  • @chux: Indeed it does, because `strcspn()` stops at a `'\0'`. Anyways, replied: http://stackoverflow.com/a/28462221/485088 – Tim Čas Feb 11 '15 at 18:55
  • Could you elaborate on "probably generates an error if the input is too long. The size could/should be specified in the format string as before"? I can't see the benefit of specifying length in `scanf_s`'s format string. – a3f Dec 02 '15 at 10:17
  • 1
    @a3f: Up to you. I don't have a Windows machine so I can't verify what I'm about to say, but… If you're using `scanf_s()` or one of its relatives and the format for a `%c`, `%s` or `%[` conversion specification is limited to, say, 20 characters (e.g. `%20s`) and the user types 21 characters, the 21st character is processed by the next conversion specification (which may not work). On the other hand, if you have just `%s`, the 21st character triggers a runtime constraint violation and all the input is effectively lost and the scan completes. You get to choose which is the behaviour you prefer. – Jonathan Leffler Dec 02 '15 at 15:49
  • @JonathanLeffler I wasn't aware that they differ in that regard. Thanks! – a3f Dec 09 '15 at 09:41
7

There is indeed no variable width specifier in the scanf family of functions. Alternatives include creating the format string dynamically (though this seems a bit silly if the width is a compile-time constant) or simply accepting the magic number. One possibility is to use preprocessor macros for specifying both the buffer and format string width:

#define STR_VALUE(x) STR(x)
#define STR(x) #x

#define MAX_LEN 63

char buffer[MAX_LEN + 1];
fscanf(file, "%" STR_VALUE(MAX_LEN) "[^\n]", buffer);
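For illustration, the same macros can be exercised on a string with sscanf(). The helper name first_line and the small MAX_LEN of 7 are assumptions for this sketch; the adjacent string literals are concatenated by the compiler, so the format becomes "%7[^\n]":

```c
#include <assert.h>
#include <stdio.h>

#define STR_VALUE(x) STR(x)
#define STR(x) #x

#define MAX_LEN 7   /* must be a plain integer literal */

/* Hypothetical helper: copies at most MAX_LEN characters of the first
 * line of input into buffer. The format literal is assembled at
 * compile time from "%", "7" and "[^\n]". */
int first_line(const char *input, char buffer[MAX_LEN + 1])
{
    assert(input != NULL && buffer != NULL);
    return sscanf(input, "%" STR_VALUE(MAX_LEN) "[^\n]", buffer);
}
```

Given "0123456789\n", this should store "0123456" and stop, leaving the rest of the line unscanned.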
Arkku
  • 1
    This obviously does not work when the buffer length is a parameter of a function, but it is a nice trick to use in situations where the buffer length is known at preprocessing time. – wefwefa3 Feb 11 '15 at 17:17
  • 2
    @elias Yes, since it's done with the preprocessor the value must be a preprocessor macro, and indeed only a simple integer literal at that (even `STR_VALUE(BUFFER_SIZE - 1)` wouldn't work). – Arkku Feb 11 '15 at 17:20
-1

Another option is to #define the format string, including the maximum field width:

#define STRING_MAX_LENGTH "%10s"

or

#define DOUBLE_LENGTH "%5lf"
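A short sketch of how such a macro might be used follows. The names NAME_FORMAT, NAME_SIZE and parse_name are hypothetical, and the width in the format has to be kept in step with the buffer size by hand, which is the main weakness of this approach:

```c
#include <assert.h>
#include <stdio.h>

#define NAME_FORMAT "%10s"   /* hypothetical name; width must match NAME_SIZE - 1 */
#define NAME_SIZE   11       /* 10 characters plus the terminating null */

/* Hypothetical helper: reads one word of at most 10 characters. */
int parse_name(const char *input, char name[NAME_SIZE])
{
    assert(input != NULL && name != NULL);
    return sscanf(input, NAME_FORMAT, name);
}
```

As with the stringification trick, this only helps when the width is fixed at compile time, not when it arrives as a function parameter.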
Jonathan Leffler
Patrick