21

I have vague memories of suggestions that sscanf was bad. I know it won't overflow buffers if I use the field width specifier, so is my memory just playing tricks with me?

Heisenbug
  • 38,762
  • 28
  • 132
  • 190
nmichaels
  • 49,466
  • 12
  • 107
  • 135

6 Answers6

11

I think it depends on how you're using it: If you're scanning for something like int, it's fine. If you're scanning for a string, it's not (unless there was a width field I'm forgetting?).


Edit:

It's not always safe for scanning strings.

If your buffer size is a constant, then you can certainly specify it as something like %20s. But if it's not a constant, you need to specify it in the format string, and you'd need to do:

char format[80]; //Make sure this is big enough... kinda painful
sprintf(format, "%%%ds", cchBuffer - 1); //Don't miss the percent signs and - 1!
sscanf(format, input); //Good luck

which is possible but very easy to get wrong, like I did in my previous edit (forgot to take care of the null-terminator). You might even overflow the format string buffer.

user541686
  • 205,094
  • 128
  • 528
  • 886
  • could you explain why is not secure fore string? – Heisenbug May 03 '11 at 17:43
  • @0verbose: Try: `char buffer[2]; sscanf("Oops!", "%s", &buffer);` – user541686 May 03 '11 at 17:45
  • @Mehrdad: yes of course..but here you aren't specifying string width – Heisenbug May 03 '11 at 17:47
  • @Mehrdad: if you use the correct width specifier for your buffer you don't risk overflowing the string. By the way, that ampersand shouldn't be there. – Matteo Italia May 03 '11 at 17:47
  • 2
    @0verbose: I think the issue is that you *can't* always specify the string width, because it's not always a constant. If it's a variable, you have to construct the format string through something like `sprintf`, which is a pain that most people don't want to go through. – user541686 May 03 '11 at 17:49
  • @Matteo: The ampersand is fine. – user541686 May 03 '11 at 17:49
  • @nmichaels: Do those work for the `scan` functions too, or just the `print` functions? – user541686 May 03 '11 at 17:51
  • @Mehrdad: string width is the maximum string width.. not the precise value of the input string. – Heisenbug May 03 '11 at 17:53
  • @0verbose: Not sure, what do you mean? – user541686 May 03 '11 at 17:54
  • @Mehrdad: Now I'm not sure. It looks like it does assignment suppression in the `scan` family. – nmichaels May 03 '11 at 17:55
  • @Mehrdad: I mean that if you specify the width it has to be considered as the maximum readable string length. So if your buffer is 200 bytes, and you specify 200 as width. You are sure that you'll never read more than 200 bytes. So no buffer oveflow. – Heisenbug May 03 '11 at 17:55
  • @nmichaels: Yeah exactly, there's no way to do that. So if the buffer size isn't constant, it's a big pain. (I don't know if `sscanf_s` is any better, though...) – user541686 May 03 '11 at 17:56
  • @Mehrdad: yup, you're right, since `buffer` is an array; I was implicitly thinking about a pointer variable, in that case this would have caused problems. However, for the sake of consistency I prefer never to put that ampersand and just let the array decay to pointer. – Matteo Italia May 03 '11 at 17:56
  • @0verbose: Well, if your buffer is 200 bytes, you specify 199 was the width, not 200! But the question is, what if you don't know how big the buffer was at compile-time? Then what do you do? – user541686 May 03 '11 at 17:56
  • @Matteo: Hm okay. I tried to be more consistent with the fact that you'd have `&myInt`, `&myFloat`, etc., so I did `&myString`. :) – user541686 May 03 '11 at 17:57
  • The problem is that the `*` does what we would like here only for `printf`, but for `scanf` it has another meaning. – Matteo Italia May 03 '11 at 17:58
  • @Mehrdad: I think you'd have to use the `a` flag and copy with `memcpy` or `strncpy` or something then free. – nmichaels May 03 '11 at 17:58
  • @nmichaels: What does the `a` flag do? It's not supported by the MS CRT, so I don't know what it does but it certainly won't be very portable! – user541686 May 03 '11 at 17:59
  • @Mehrdad: right. I have to agree with you.. but the question was only the possibility to use sscanf in a secure way. Achieve strong security is hard subject, but I don't think this was the initial question. Anyway..I'm pleased to discuss about that. So, could you make me an example showing a situations in which is necessary not to have the buffer dimension established at compile-time? – Heisenbug May 03 '11 at 18:00
  • @Mehrdad: It allocates a buffer. Odd; I thought it was part of the standard but it's a GNU extension. My man page says `C99 employs the 'a' character as a conversion specifier`. – nmichaels May 03 '11 at 18:02
  • @0verbose: Example? `void fillBuffer(int arg1, char *string, int length) { scanf("%d%s", &arg1, string); } //Now what do you do with length?` It's *possible* but like I said, it's very painful. :[ – user541686 May 03 '11 at 18:03
  • @Mehrdad: maybe I explained in a bad way what I wanted to say. But the point is: you are writing the code. If you want to achieve security goals against buffer overflows you should pay attention of your buffers size. – Heisenbug May 03 '11 at 18:06
  • @0verbose: What I was saying is that it's so annoying to do that correctly for a string, that you're *bound* to go wrong somewhere (see my edit)... hence why it's unsafe. – user541686 May 03 '11 at 18:07
  • @Mehrdad: now I got you. You are right. So what solution do you propose? – Heisenbug May 03 '11 at 18:10
  • @0verbose: I can't think of any awesome solutions... either use `scanf_s` if you're using Microsoft's CRT, or just... be careful? Haha I can't think of any great portable solutions for this... – user541686 May 03 '11 at 18:12
  • @Mehrdad: or one can simply put an upper limit to bufferSize value. Sounds reasonable? – Heisenbug May 03 '11 at 18:17
  • @0verbose: No, unfortunately that's not reasonable, because for all you know, it could be 10 MB big. :( Furthermore, if you have no idea how big your buffer is, it could be *smaller* than your maximum value... how would you take care of that? Allocate another buffer and copy over the truncated results? It's too much pain. :[ – user541686 May 03 '11 at 18:26
  • @Mehrdad: this depends on the logic of you program. You can also do multiple call do sscanf if your goal is security. – Heisenbug May 03 '11 at 18:30
  • @0verbose: Not sure what you mean by multiple calls? If your input is sandwiched between two `int`s, for instance, how can you use multiple calls? – user541686 May 03 '11 at 18:44
  • @Mehrdad: I'm not sure what you mean by sandwiched between to int . :) – Heisenbug May 03 '11 at 18:46
  • @0verbose: If your format string is `"%d%s%d"`, how exactly would you do "multiple calls" to read the `%s`? – user541686 May 03 '11 at 19:08
  • @Mehrdad: sorry but here it's becoming night, I really need to sleep now. Tomorrow I'll post some code. Meanwhile, have a look at @R.. answer. What do you think about that? – Heisenbug May 03 '11 at 20:41
  • @0verbose: I looked at it and the issue is that, while it's quite *possible* to use it safely, it's *hard*. And programmers are lazy, so it generally isn't good to make tasks hard for them as they will mess up. And haha ok no problem, good night! – user541686 May 03 '11 at 20:50
4

All of the scanf functions have fundamental design flaws, only some of which could be fixed. They should not be used in production code.

  • Numeric conversion has full-on demons-fly-out-of-your-nose undefined behavior if a value overflows the representable range of the variable you're storing the value in. I am not making this up. The C library is allowed to crash your program just because somebody typed too many input digits. Even if it doesn't crash, it's not obliged to do anything sensible. There is no workaround.

  • As pointed out in several other answers, %s is just as dangerous as the infamous gets. It's possible to avoid this by using either the 'm' modifier, or a field width, but you have to remember to do that for every single text field you want to convert, and you have to wire the field widths into the format string -- you can't pass sizeof(buff) as an argument.

  • If the input does not exactly match the format string, sscanf doesn't tell you how many characters into the input buffer it got before it gave up. This means the only practical error-recovery policy is to discard the entire input buffer. This can be OK if you are processing a file that's a simple linear array of records of some sort (e.g. with a CSV file, "skip the malformed line and go on to the next one" is a sensible error recovery policy), but if the input has any more structure than that, you're hosed.

In C, parse jobs that aren't complicated enough to justify using lex and yacc are generally best done either with POSIX regexps (regex.h) or with hand-rolled string parsing. The strto* numeric conversion functions do have well-specified and useful behavior on overflow and do tell you how may characters of input they consumed, and string.h has lots of handy functions for hand-rolled parsers (strchr, strcspn, strsep, etc).

zwol
  • 135,547
  • 38
  • 252
  • 361
4

Yes it is..if you specify the string width so the are no buffer overflow related problems.

Anyway, like @Mehrdad showed us, there will be possible problems if the buffer size isn't established at compile-time. I suppose that put a limit to the length of a string that can be supplied to sscanf, could eliminate the problem.

Heisenbug
  • 38,762
  • 28
  • 132
  • 190
  • 2
    This might sound silly but why is there `sscanf_s` in Microsoft's CRT? – user541686 May 03 '11 at 17:40
  • @Mehrdad: I don't know. Anyway if you are checking the size of the input you shouldn't have buffer overflow problems. – Heisenbug May 03 '11 at 17:42
  • @Mehrdad According to them, they have more secure versions than the stnadard ones. Check [here](http://msdn.microsoft.com/en-us/library/zkx076cy.aspx), first phrase. – CRM May 03 '11 at 17:46
  • but sscanf doesnt accept string width explicitly, its has to be in the format string. right? – z33m May 03 '11 at 17:47
4

The reason why sscanf might be considered bad is because it doesnt require you to specify maximum string width for string arguments, which could result in overflows if the input read from the source string is longer. so the precise answer is: it is safe if you specify widths properly in the format string otherwise not.

z33m
  • 5,993
  • 1
  • 31
  • 42
4

Note that as long as your buffers are at least as long as strlen(input_string)+1, there is no way the %s or %[ specifiers can overflow. You can also use field widths in the specifiers if you want to enforce stricter limits, or you can use %*s and %*[ to suppress assignment and instead use %n before and after to get the offsets in the original string, and then use those to read the resulting sub-string in-place from the input string.

R.. GitHub STOP HELPING ICE
  • 208,859
  • 35
  • 376
  • 711
3

There is 2 point to take care.

The output buffer[s].

As mention by others if you specify a size smaller or equals to the output buffer size in the format string you are safe.

The input buffer.

Here you need to make sure that it is a null terminate string or that you will not read more than the input buffer size.

If the input string is not null terminated sscanf may read past the boundary of the buffer and crash if the memorie is not allocated.

mathk
  • 7,973
  • 6
  • 45
  • 74
  • Conceptually, there's no such thing in C as "non-null terminated string". C++ does have string that may or may not be null-terminated, but in C a null character is inseparable from a string, because this is the only information that allows you to figure out its length. – Hi-Angel Jun 09 '23 at 08:05