22

I have snprintf and it can avoid a buffer overflow, but why is there no function called snscanf?

Code:

#include <stdio.h>

int main(void)
{
     const char *src = "helloeveryone";
     char buf1[5];
     sscanf(src, "%s", buf1); // writes past the end of buf1 (array out of bounds)

     return 0;
}

So, I think an snscanf is also needed. Why do we only have snprintf?

Jonathan Leffler
Lidong Guo
  • 5
    `sscanf(src,"%4s",buf1);`, – BLUEPIXY Aug 21 '13 at 22:25
  • 1
    Or always make your string output buffers at least as large as the input buffer. sscanf cannot put out more string than it has. – Zan Lynx Aug 21 '13 at 23:38
  • @ZanLynx: That's a great method for `sscanf` that's usually overlooked. Of course it doesn't work for `scanf` and `fscanf`. – R.. GitHub STOP HELPING ICE Aug 22 '13 at 01:51
  • Most of the answers talk about specifying the size of an output string but I think the problem is when the src string is not null-terminated and you need to specify its size so that `sscanf()` doesn't read past the end of the array. Just null-terminating src is not always an option. – SMMB Mar 27 '23 at 17:13

6 Answers

11

The controversial (and optional) Annex K to C11 adds a sscanf_s function which takes an additional argument of type rsize_t (also defined in Annex K) after the pointer argument, specifying the size of the pointed-to array. For better or worse, these functions are not widely supported. You can achieve the same results by putting the size in the conversion specifier, e.g.

char out[20];
sscanf(in, "%19s", out);

but this is awkward and error-prone if the size of the destination object may vary at runtime (you would have to construct the conversion specifier programmatically with snprintf). Note that the field width in the conversion specifier is the maximum number of input characters to read, and sscanf also writes a terminating null byte for %s conversions, so the field width you pass must be strictly less than the size of the destination object.
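When the destination size is a compile-time constant, one common way to keep the field width and the array size in sync is preprocessor stringification. The following is only a sketch: `WORD_MAX`, `STR`, `STR_` and `read_word` are names invented for this illustration, not part of any standard API.

```c
#include <stdio.h>

#define WORD_MAX 19      /* hypothetical: maximum number of characters to read */
#define STR_(x) #x       /* turns its argument into a string literal */
#define STR(x) STR_(x)   /* expands the macro first, then stringifies */

/* Reads one whitespace-delimited word of at most WORD_MAX characters.
   out must have room for WORD_MAX + 1 bytes (the extra one is for '\0'). */
int read_word(const char *in, char out[WORD_MAX + 1])
{
    /* The format string becomes "%19s" at compile time. */
    return sscanf(in, "%" STR(WORD_MAX) "s", out);
}
```

Changing `WORD_MAX` then updates both the array size and the field width, which avoids the off-by-one mistake described above. It does not help when the size is only known at run time.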

R.. GitHub STOP HELPING ICE
  • See [Do you use the TR24731 'safe' functions](http://stackoverflow.com/questions/372980/do-you-use-the-tr-24731-safe-functions) for more information — and a discussion of some problems with them (notably that the Microsoft implementation is not always the same as the 'standard C' implementation in terms of number of parameters). You should show the use of `sscanf_s(in, "%s", out, sizeof(out));` emphasizing that the `rsize_t` length goes after the parameter that is scanned. – Jonathan Leffler Aug 21 '13 at 23:00
  • 1
    @JonathanLeffler: That's incorrect. You need `(rsize_t)sizeof(out)`. Just using `sizeof(out)` could invoke UB, depending on whether `rsize_t` and `size_t` are distinct types. (Funny how these interfaces that were supposedly designed to prevent dangerous usage actually encourage dangerous things like passing the wrong types to a variadic function...) – R.. GitHub STOP HELPING ICE Aug 21 '13 at 23:14
  • 2
    Hmmm...§K.3.3 Common definitions `<stddef.h>` says: '... The type is `rsize_t` which is the type `size_t`.385)' That means that in fact you can pass `size_t` without needing a cast, as long as the value passed is within the range defined by `RSIZE_MAX` in `<stdint.h>` (and footnote 385 cross-references `<stdint.h>`). – Jonathan Leffler Aug 21 '13 at 23:21
  • Ah, then my notes in the answer are wrong, and Microsoft's version of the function is incompatible with the standard... removing the incorrect info. – R.. GitHub STOP HELPING ICE Aug 22 '13 at 00:17
  • 1
    Ancient history — but one of the issues I note in my answer to the [Do you use the TR24731 'safe' functions](http://stackoverflow.com/questions/372980/do-you-use-the-tr-24731-safe-functions) question previously mentioned is precisely that the Microsoft implementation of the `*_s()` functions does not match the standard, which makes them less useful. Note too that there is a move afoot to remove Annex K from C.next, or to deprecate it — I'm not sure whether that is progressing. Links in the answer – Jonathan Leffler Aug 10 '17 at 20:59
10

There's no need for an snscanf() because there's no writing to the first buffer argument. The buffer length in snprintf() specifies the size of the buffer where the writing goes to:

char buffer[256];

snprintf(buffer, sizeof(buffer), "%s:%d", s, n);

The buffer in the corresponding position for sscanf() is a null-terminated string; there's no need for an explicit length as you aren't going to write to it (it's a const char * restrict buffer in C99 and C11).

char buffer[256];
char string[100];
int n;
if (sscanf(buffer, "%s %d", string, &n) != 2)
    ...oops...

In the output, you are already expected to specify the length of the strings (though you're probably in the majority if you use %s rather than %99s or whatever is strictly appropriate):

if (sscanf(buffer, "%99s %d", string, &n) != 2)
    ...oops...

It would be nice/useful if you could use %*s as you can with snprintf(), but you can't — in sscanf(), the * means 'do not assign scanned value', not the length. Note that you wouldn't write snscanf(src, sizeof(buf1), "%s", buf1), not least because you can have multiple %s conversion specifications in a single call. Writing snscanf(src, sizeof(buf1), sizeof(buf2), "%s %s", buf1, buf2) makes no sense, not least because it leaves an insoluble problem in parsing the varargs list. It would be nice to have a notation such as snscanf(src, "%@s %@s", sizeof(buf1), buf1, sizeof(buf2), buf2) to obviate the need to specify the field size (minus one) in the format string. Unfortunately, you can't do that with sscanf() et al now.
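To make the two meanings of `*` concrete, here is a small sketch. The helper names `parse_second_field` and `format_padded` are made up for this example.

```c
#include <stdio.h>

/* In sscanf(), '*' suppresses assignment: the first word is scanned
   and discarded, and only the integer is stored. */
int parse_second_field(const char *line, int *value)
{
    return sscanf(line, "%*s %d", value);
}

/* In snprintf(), '*' takes the minimum field width from the argument list. */
int format_padded(char *dst, size_t dstsize, int width, int value)
{
    return snprintf(dst, dstsize, "%*d", width, value);
}
```

Note that the suppressed `%*s` conversion does not count towards the return value of `sscanf()`, so `parse_second_field()` returns 1 on success, not 2.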

Annex K of ISO/IEC 9899:2011 (previously TR24731) provides sscanf_s(), which does take lengths for character strings, and which might be used as:

if (sscanf_s(buffer, "%s %d", string, sizeof(string), &n) != 2)
    ...oops...

(Thanks to R.. for reminding me of this theoretical option; it is theoretical because only Microsoft has implemented the 'safe' functions, and they did not implement them exactly as the standard requires.)

Note that §K.3.3 Common definitions <stddef.h> says: '... The type is rsize_t which is the type size_t.385)' (and footnote 385 says: 'See the description of the RSIZE_MAX macro in <stdint.h>.'). That means that in fact you can pass size_t without needing a cast, as long as the value passed is within the range defined by RSIZE_MAX in <stdint.h>. (The general intention is that RSIZE_MAX is a largish number but smaller than SIZE_MAX. For more details, read the 2011 standard, or get TR 24731 from the Open Standards web site.)

Community
Jonathan Leffler
  • @Jonathan Leffler and @R.., thank you both for the discussion on `sscanf_s()`. It _is_ surprising to see the size parameter is not type `size_t` (MS says `unsigned`) and in reverse order (IMO). – chux - Reinstate Monica Aug 21 '13 at 23:40
  • 1
    The GNU libc implementation allows the format modifier `m` on `s`, `[` and `c` formats, which requires a char** argument and stores in it a pointer to a malloc'd string (which must be free'd by the caller). According to the manpage, that will be in a "forthcoming Posix standard"; I know nothing of that, but the feature is pretty cool. – rici Aug 21 '13 at 23:48
  • 1
    @rici: It's in the current POSIX standard (2008). – R.. GitHub STOP HELPING ICE Aug 22 '13 at 00:20
  • @R..: So it is, marked as `CX`. The trouble with reading standards is that you really can't skim them :) Anyway, I think it's better than the `sscanf_s` thing, but then I'm not scared of `malloc`. – rici Aug 22 '13 at 01:00
  • Automatic allocation is nice if you want to be lazy, but it's harder to recover from errors, and it allows malicious data to cause your program to run out of memory (in the case of `fscanf`, not `sscanf`). – R.. GitHub STOP HELPING ICE Aug 22 '13 at 01:49
  • As noted by @rici (and R..), the memory allocation is in the POSIX 2013 version of [`fscanf()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/fscanf.html) et al; I don't know if it was in the POSIX 2008 version any more — I'd have to use the Wayback Machine or something to find an old version of the page linked to. It is marked as CX (an extension over standard C). I'd managed to miss that in my occasional scanning of that manual page. – Jonathan Leffler Aug 22 '13 at 03:28
  • @JonathanLeffler: technically, that's the 2013 edition of Posix 2008.1 (which is to say, Posix 2008.1 plus a Technical Corrigendum.) According to the history on the page you link to, `m` was added in Issue 7: "SD5-XSH-ERN-132 is applied, adding the assignment-allocation character 'm'." (see http://www.opengroup.org/austin/docs/austin_sd5.txt) So it's been in Posix since 2008, even if neither of us noticed it until now. Goes to show you. – rici Aug 22 '13 at 05:01
  • @rici: that makes sense and is basically what I expected, but didn't have time to go and demonstrate. – Jonathan Leffler Aug 22 '13 at 05:05
  • @JonathanLeffler The statement "there's no need for an explicit length as you aren't going to write to it" is only true if you assume the first pointer addresses a null-terminated string. `snscanf` _would_ be helpful for cases where one does not have a null-terminated string. – Chris Leishman Jul 04 '16 at 22:16
  • @ChrisLeishman: Strings have null terminators; see the C standard, _§7.1.1 Definitions of terms ¶1 A_ string _is a contiguous sequence of characters terminated by and including the first null character._ And `sscanf()` only deals with parsing strings. However, there could perhaps be some occasions when you have buffers that aren't null-terminated strings (perhaps data read from sockets or pipes), but you shouldn't go passing those to functions that expect strings. It hasn't ever been a problem for me; there's always been space enough in my buffers to add a terminal null. – Jonathan Leffler Jul 04 '16 at 22:30
  • It's actually quite common to parse a contiguous sequence of characters and want to stop at an arbitrary length rather than at a null (not always because of the lack of a null, but perhaps because it's a substring, e.g. a date within a serialized json string). There's many string related functions that have `n` variants, although most are not standardized (`strncpy`, `strndup`, `strnstr`, `mbsnrtowcs`, etc). Anyway, `snscanf` doesn't exist, even in variants, probably because nobody felt it was important enough to add. – Chris Leishman Jul 05 '16 at 02:27
  • *and they did not implement them exactly as the standard requires*, [typical from MS](https://en.wikipedia.org/wiki/Embrace,_extend,_and_extinguish) – Matthieu Dec 13 '17 at 13:36
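The `m` assignment-allocation modifier discussed in the comments above could be used roughly as follows. This is a sketch that assumes a POSIX.1-2008 C library (it is not part of ISO C), and `scan_word_alloc` is a name invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Returns a malloc'd copy of the first whitespace-delimited word in src,
   or NULL if nothing could be scanned. The caller must free() the result. */
char *scan_word_alloc(const char *src)
{
    char *word = NULL;

    /* "%ms" asks the library to allocate a buffer large enough for the word
       and to store a pointer to it through the char ** argument. */
    if (sscanf(src, "%ms", &word) != 1)
        return NULL;
    return word;
}
```

Since the library sizes the buffer itself, no field width is needed, at the cost of an allocation per conversion.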
3

In sscanf(s, format, ...), the array of characters scanned is a const char *. There is no writing to s. The scanning stops when s[i] is NUL. There is little need for an n parameter as an auxiliary limit to the scan.

In sprintf(s, format, ...), the array s is a destination. snprintf(s, n, format, ...) ensures that data is not written to s[n] and beyond.
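As a small illustration of that guarantee, snprintf() also reports how many characters it would have written, which makes truncation detectable. The helper name `copy_checked` is hypothetical.

```c
#include <stdio.h>

/* Copies src into dst without writing past dst[dstsize - 1].
   Returns 1 if the text was truncated, 0 if it fit, -1 on an output error. */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    /* snprintf() returns the length the full result would have had,
       not counting the terminating '\0'. */
    int needed = snprintf(dst, dstsize, "%s", src);

    if (needed < 0)
        return -1;
    return (size_t)needed >= dstsize;
}
```

With a 5-byte destination and the source string from the question, dst ends up holding the first four characters plus the terminating null.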


What would be useful is a flag extension to sscanf() conversion specifiers so that a limit could easily be specified at compile time. (It can be done in a cumbersome fashion today, below, with a dynamic format or with sscanf(src,"%4s",buf1).)

// This is a proposed idea for C. - Not valid code today.
sscanf(src, "%!s", sizeof(buf1), buf1);

Here ! would tell sscanf() to read a size_t argument giving the size limit of the upcoming string. Maybe in C17?


Cumbersome method that works today.

char * src = "helloeveryone";
char buf1[5];
char format[1+20+1+1];
sprintf(format, "%%" "%zu" "s", sizeof(buf1) - 1);
sscanf(src, format, buf1);
chux - Reinstate Monica
3

A few more wrinkles. The 'n' in snprintf refers to the size of its first argument. Now, it is true that the first string argument in sscanf is not written to. However, it is read. Thus, the following could segfault:

char s[2];
s[0]='1'; s[1]='3';
int x;
sscanf(s, "%d", &x);

because s has no terminating null, so stepping one char beyond s could inadvertently read from undefined memory (or continue the integer from another variable). So, something like this would be useful:

 snscanf(s, 2, "%d", &x);

s is not a string, of course, but it is a character array. The 'n' in snscanf would prevent overstepping (reading from) the first (source string) argument, and would not be related to the destination argument.

The way to avoid this is to first make sure that s is terminated by a '\0' within 2 characters. You can't use strlen, of course. You need strnlen, and a test whether it is less than 2. If it is 2, then more copying effort is needed first.
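The "copying effort" can be wrapped in a helper that behaves roughly like the wished-for snscanf. This is only a sketch: `bounded_sscanf` is a hypothetical name, and it assumes src points to at least n readable bytes. It uses the standard C99 `vsscanf()`.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Scans at most n bytes of src, which need not be null-terminated.
   Returns what sscanf() would return, or EOF if allocation fails. */
int bounded_sscanf(const char *src, size_t n, const char *fmt, ...)
{
    va_list ap;
    int rc;
    char *copy = malloc(n + 1);   /* room for n bytes plus the terminator */

    if (copy == NULL)
        return EOF;
    memcpy(copy, src, n);
    copy[n] = '\0';               /* now it is a proper string */

    va_start(ap, fmt);
    rc = vsscanf(copy, fmt, ap);
    va_end(ap);

    free(copy);
    return rc;
}
```

For the two-character array above, `bounded_sscanf(s, 2, "%d", &x)` never reads past s[1]. The price is one allocation and one copy per call.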

ivo Welch
  • Since you have to do all that work anyway it is much easier to just put a zero terminator on the string you are generating. Such as doing `s[1] = 0;` – Zan Lynx Feb 11 '15 at 00:05
  • 2
    you can't put s[1]='\0', because it loses '3'. besides, I was giving it as an illustration. my s is not the format string. it is the source string. the format in my example is known "%d". – ivo Welch Feb 11 '15 at 04:24
  • You still know how big the buffer is supposed to be. You can put a zero there. Better a lost character than a overflow. – Zan Lynx Feb 11 '15 at 07:03
  • @ZanLynx Null-terminated strings are dangerous and should be considered deprecated. – aberaud May 23 '23 at 19:37
1

Why don't you try fgets() (with the standard input file stdin)?

fgets() lets you specify the maximum size for your buffer.

(In all what follows, I'll be using standard ISO C99 compatible syntax.)

Thus, you can write this code:

#include <stdio.h>
#define MAXBUFF 20 /* Small just for testing... */
int main(void) {
  char buffer[MAXBUFF+1]; /* Add 1 byte since fgets() inserts '\0' at end */
  if (fgets(buffer, MAXBUFF+1, stdin) == NULL) /* EOF or read error */
    return 1;
  printf("Your input was: %s\n", buffer);
  return 0;
}

fgets() reads at most MAXBUFF characters from stdin,
which is the standard input (usually the keyboard).
The result is stored in the array buffer.
If a '\n' character is found, reading stops and the '\n' is also stored in buffer (as the last character). In addition, a '\0' is always appended after the characters read, so the array needs room for one extra byte.
You can use a combination of fgets() followed by sscanf() in order to process the string:

  char buffer[MAXBUFF+1];
  fgets(buffer, MAXBUFF+1, stdin); /* Plain read */
  int x; float f;
  sscanf(buffer, "%d %g", &x, &f); /* Specialized read */

Thus, you have a "safe" scanf()-like method.
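The snippets above ignore the return values of fgets() and sscanf(). A variant that checks both might look like the sketch below; `read_int_float` is a hypothetical helper name, and it takes a FILE * so that it can be used with streams other than stdin.

```c
#include <stdio.h>

#define MAXBUFF 20

/* Reads one line from in and parses an int and a float from it.
   Returns 1 on success, 0 if the line did not match, EOF at end of input. */
int read_int_float(FILE *in, int *x, float *f)
{
    char buffer[MAXBUFF + 1];

    if (fgets(buffer, sizeof buffer, in) == NULL)   /* plain read */
        return EOF;
    /* Specialized read: sscanf() returns the number of items it assigned. */
    return sscanf(buffer, "%d %g", x, f) == 2;
}
```

Calling `read_int_float(stdin, &x, &f)` reproduces the fragment above, with the difference that x and f are only trusted when the function returns 1.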

Note: This approach has a potential problem. If fgets() reads MAXBUFF characters before the end-of-line character '\n' is found, the rest of the input is not discarded, and it will be picked up by the next read.
Hence, one has to add a flush mechanism, which is actually very simple:

int c;
while ((c = getchar()) != '\n' && c != EOF)
    ; /* Flushing stdin... */

However, if you just add that last piece of code after the fgets() line,
the user will be forced to press ENTER twice each time (s)he enters fewer than MAXBUFF characters. Worse, this is the most typical situation!

To fix this new problem, observe that a simple logical condition, completely equivalent to the fact that the character '\n' was not reached, is the following:

(buffer[MAXBUFF - 1] != '\0') && (buffer[MAXBUFF - 1] != '\n')

(Prove it!)

Thus, we write:

fgets(buffer, MAXBUFF+1, stdin);
if ((buffer[MAXBUFF - 1] != '\0') && (buffer[MAXBUFF - 1] != '\n')) {
     int c;
     while ((c = getchar()) != '\n' && c != EOF)
       ;
}

A final touch is needed: since the array buffer could contain garbage,
it seems that some kind of initialization is needed.
However, let us observe that only the position [MAXBUFF - 1] has to be cleared:

char buffer[MAXBUFF + 1] = { [MAXBUFF - 1] = '\0' }; /* ISO C99 syntax */

Finally, we can gather all those facts in a quick macro, as this program shows:

#include <stdio.h>
#define safe_scanf(fmt, maxb, ...) { \
    char buffer[(maxb) + 1] = { [(maxb) - 1] = '\0' }; \
    fgets(buffer, (maxb) + 1, stdin); \
    if ((buffer[(maxb) - 1] != '\0') && (buffer[(maxb) - 1] != '\n')) { \
        int c; /* stop at EOF too, to avoid an endless loop */ \
        while ((c = getchar()) != '\n' && c != EOF) \
           ; \
    } \
    sscanf(buffer, fmt, __VA_ARGS__); \
  }
#define MAXBUFF 20     

int main(void) {
  int x = 0; float f = 0.0f; /* defined values in case the input does not parse */
  safe_scanf("%d %g", MAXBUFF+1, &x, &f);
  printf("Your input was: x == %d\t\t f == %g\n",  x, f);
  return 0;
}

This uses the mechanism for a variable number of parameters in a macro
from ISO C99 (variadic macros):
__VA_ARGS__ stands for the variable list of arguments.
(We need a variable number of parameters in order to mimic scanf()-like behaviour.)

Notes: The macro-body was enclosed inside a block with { }. This is not completely satisfactory, and it is easily improved, but it is part of another topic...
In particular, the macro safe_scanf() does not "return" a value (it is not an expression, but a block statement).

Remark: Inside the macro I have declared an array buffer which is created at the time of entering the block, and then is destroyed when the block is exited. The scope of buffer is limited to the block of the macro.
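One possible way to address the missing return value is to turn the macro into a variadic function built on the standard `vsscanf()`. The following is a sketch, not a drop-in replacement: `scan_line` and `SCAN_LINE_MAX` are hypothetical names, and it detects an over-long line with `strchr()` instead of the buffer[MAXBUFF - 1] test used above.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define SCAN_LINE_MAX 20   /* hypothetical limit on the characters kept per line */

/* Reads one line from in, discards whatever does not fit in the buffer,
   and parses the kept part. Returns the sscanf() count, or EOF at end of input. */
int scan_line(FILE *in, const char *fmt, ...)
{
    char buffer[SCAN_LINE_MAX + 1];
    va_list ap;
    int rc;

    if (fgets(buffer, sizeof buffer, in) == NULL)
        return EOF;

    /* No '\n' stored means the line was longer than the buffer:
       consume the remainder so it is not seen by the next read. */
    if (strchr(buffer, '\n') == NULL) {
        int c;
        while ((c = getc(in)) != '\n' && c != EOF)
            ;
    }

    va_start(ap, fmt);
    rc = vsscanf(buffer, fmt, ap);
    va_end(ap);
    return rc;
}
```

A call such as `if (scan_line(stdin, "%d %g", &x, &f) != 2)` can then handle bad input explicitly. As with the macro, anything beyond the buffer size is silently dropped, so a value that straddles the limit would be cut short.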

pablo1977
0

How to use sscanf correctly and safely

Note that snprintf is not alone: most functions that write to arrays have a bounded variant.

Community
TomF