As we know, it's a good idea to check scanf for errors like this:

if(scanf("%d %d %d", &x, &y, &z) != 3) {
    /* Handle error */
}

But I wonder if there is any way to automatically detect that it should be 3.

One approach I have thought about is to declare the format string separately and then parse it. Something like this:

const char format[] = "%d %d %d";
size_t n = count(format);
if(scanf(format, &x, &y, &z) != n) {
    /* Handle error */
}

But I have no idea how to implement count properly. I could do something like counting the number of % characters, but that would be very error-prone. If there's no library function for this, I suspect it would be incredibly hard to get it right.

Another approach I have considered is doing something like this:

void wrapper(const char *format, ...) {
    va_list arg;
    va_start(arg, format);
    size_t n = count(arg);   /* hypothetical function, see below */
    int done = vfscanf(stdin, format, arg);
    va_end(arg);
    if(done != n) {
        /* Handle error */
    }
}

But as far as I can see, there's no way to write the function count here. The specification of va_arg says: "If va_arg is called when there are no more arguments in ap, the behavior is undefined." So I cannot loop until NULL or something like that.

A third approach is to see if scanf supports writing this number to a variable like this:

int n = scanf("%<somespecifier>%d %d %d", &c, &x, &y, &z);
if(n != c) {
    /* Handle error */
}

But that does not seem to be an option.
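
The closest standard facility seems to be %n, which stores the number of characters consumed so far rather than the number of assignments. It does not produce the expected count, but because scanf stops at the first failed directive, a %n placed at the end of the format is only reached when everything before it matched. A minimal sketch of that idea, using sscanf so it can be tried without stdin (parse_three is a hypothetical helper name, not a library function):

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 only if all three integers were read.
 * No literal 3 is needed; success is detected through the trailing %n. */
static int parse_three(const char *input, int *x, int *y, int *z)
{
    int end = -1;   /* stays -1 unless scanning reaches the %n directive */

    sscanf(input, "%d %d %d%n", x, y, z, &end);
    return end >= 0;
}
```

The price is one extra int and a %n appended to every format string, so this moves the bookkeeping rather than removing it.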

So I am at a loss here. Is there any way to do what I want?

My ultimate goal with this is to write a "safe" (I know that's a relative term when it comes to this) version of scanf that exits on failure. Something like

void safe_scanf(const char *format, ...) {
    /* Code */
    if(<wrong number of assignments>) {
        fputs("Wrong number of assignments\n", stderr);
        exit(EXIT_FAILURE);
    }
}
klutt
  • Is there really a point? Format strings are usually written at compile time, so you always know how many format specifiers there are. When you truly need to deduce that dynamically, I'd argue parsing the format string is the only way, though that will add extra overhead – Chase Oct 27 '20 at 15:11
  • @Chase I want to do this for essentially the same reason that I write `int *p = malloc(sizeof *p)` instead of `int *p = malloc(sizeof(int))` – klutt Oct 27 '20 at 15:13
  • If program safety is your concern then you shouldn't be using the scanf family of functions to begin with. stdio.h in general contains some of the most horrid and error-prone APIs ever written – Lundin Oct 27 '20 at 15:15
  • There is no way of doing this, other than defining a bunch of hideous "recursive" preprocessor macros, which I would totally advise you against. There is no practical advantage in doing this. There might be in `malloc(sizeof *p)` since the definition of `p` could be hundreds of lines away, but in `scanf(...) == n` the number of arguments passed is *right there*. – Marco Bonelli Oct 27 '20 at 15:17
  • @Lundin I'm aware of that. – klutt Oct 27 '20 at 15:17
  • I have done it, not very robustly, but I think I simply counted the number of % signs in the format string and compared it to the result. – Sven Nilsson Oct 27 '20 at 15:18
  • The format string is defined in C 2018 7.21.6.2. Each `%` introduces a conversion specification and should be followed by an optional `*`, an optional decimal integer greater than zero, an optional length modifier, and a conversion specifier. The length modifier is `hh`, `h`, `l`, `ll`, `j`, `z`, `t`, or `L`. The conversion specifier is one of `diouxaefgcspnAEFGX%` or is `[` followed by characters up to the first `]` or the second `]` if those characters start with `]` or `^]`. If it has the `*`, do not count it for an assignment to be performed. If it has the `%` specifier, do not count it. – Eric Postpischil Oct 27 '20 at 15:33
  • The above enables a `count` function that returns the count that should equal the return value of `scanf` if all items were successfully read and converted. Note that this fails to detect literal matches after conversions. E.g., in `%d+foo`, the count will be 1 if `%d` was matched whether or not the matching characters were followed by “+foo”. Similarly `%d %*d` has the same return value (1) regardless of whether the `%*d` is matched or not. Matches or non-matches can be distinguished using `%n` to check character counts, as with `%d%n+foo%n` and `%d %n%*d%n`. – Eric Postpischil Oct 27 '20 at 15:36
  • The C standard does not provide the facilities needed to implement `safe_scanf`, as there is no way for a called routine to know what variable arguments it has actually been passed (versus what it has been told, via the format string, what has been passed). So that goal is impossible in strictly conforming C. – Eric Postpischil Oct 27 '20 at 15:40
  • I wonder if [GNU's template string parsers](https://www.gnu.org/software/libc/manual/html_node/Parsing-a-Template-String.html) could help. – Chase Oct 27 '20 at 15:59
  • Have you tried to reference this answer? https://stackoverflow.com/questions/205529/passing-variable-number-of-arguments-around?rq=1 – Maurizio Benedetti Oct 27 '20 at 16:03
  • And to add to what @EricPostpischil said: if the conversion specification is `%n`, do not count that, either. (See C11 [§7.21.6.2 The `fscanf` function ¶12](http://port70.net/~nsz/c/c11/n1570.html#7.21.6.2p12) for the standard definition.) – Jonathan Leffler Oct 27 '20 at 16:43
  • For those who are interested, I have posted some code for review here: https://software.codidact.com/questions/278837 – klutt Oct 27 '20 at 19:08
  • Thanks Eric and Jonathan. It helped a lot. – klutt Oct 27 '20 at 19:09
  • A compiler like GCC provides a number of command-line options that check that the format and arguments of scanf(), printf(), etc. are consistent (-Wformat and related options). – Rachid K. Oct 28 '20 at 16:38
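
The counting rules described in the comments above (skip `*`, skip `%%`, skip `%n`, handle `[` scan sets) can be sketched as a small parser. This is only an illustration under those stated rules, not a vetted implementation; `count_assignments` and `checked_sscanf` are hypothetical names, and `checked_sscanf` uses `vsscanf` so the behavior can be tried without stdin. An exit-on-failure `safe_scanf` would be the same idea built on `vscanf`.

```c
#include <ctype.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: number of assignments a fully successful scanf call
 * with this format would report, or -1 if the format looks malformed. */
static int count_assignments(const char *fmt)
{
    int n = 0;

    while (*fmt != '\0') {
        if (*fmt++ != '%')
            continue;                         /* literal or white space */

        int suppressed = 0;
        if (*fmt == '*') {                    /* assignment suppression */
            suppressed = 1;
            fmt++;
        }
        while (isdigit((unsigned char)*fmt))  /* optional field width */
            fmt++;

        /* optional length modifier: hh, h, l, ll, j, z, t, L */
        if ((fmt[0] == 'h' && fmt[1] == 'h') || (fmt[0] == 'l' && fmt[1] == 'l'))
            fmt += 2;
        else if (*fmt != '\0' && strchr("hljztL", *fmt) != NULL)
            fmt++;

        char spec = *fmt;
        if (spec == '\0')
            return -1;                        /* format ends inside a directive */
        fmt++;

        if (spec == '[') {
            if (*fmt == '^')
                fmt++;
            if (*fmt == ']')
                fmt++;                        /* a leading ] belongs to the set */
            while (*fmt != '\0' && *fmt != ']')
                fmt++;
            if (*fmt == '\0')
                return -1;                    /* unterminated scan set */
            fmt++;
        } else if (strchr("diouxaefgcspnAEFGX%", spec) == NULL) {
            return -1;                        /* unknown conversion specifier */
        }

        if (!suppressed && spec != '%' && spec != 'n')
            n++;
    }
    return n;
}

/* Hypothetical wrapper: nonzero if every expected assignment was made. */
static int checked_sscanf(const char *input, const char *format, ...)
{
    int expected = count_assignments(format);
    va_list ap;

    va_start(ap, format);
    int got = vsscanf(input, format, ap);
    va_end(ap);

    /* expected >= 0 keeps a malformed format (-1) from matching EOF (-1) */
    return expected >= 0 && got == expected;
}
```

As noted in the comments, a matching count still says nothing about literal text or suppressed conversions that follow the last assignment, and the wrapper cannot verify that the caller passed the right number or types of pointers.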

2 Answers

Did you consider a macro? Macros support variable arguments in a way that sometimes makes them more useful than regular functions.

The code below is not tested, but it might get you started.

#define safe_sscanf(fmt, ...) {\
const char *p = fmt-1; int n=0;\
while (p=strstr(p+1, "%")) ++n;\
p = fmt-2;\
while (p=strstr(p+2, "%%")) n-=2;\
if (sscanf(fmt, __VA_ARGS__) != std::max(0,n)) exit(1);\
}
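
For reference, a C-only variant of the same idea might look like the sketch below. It is an assumption about what the macro above intends, not the code as posted: it avoids `std::max` and the `fmt - 1` pointer, and it calls `scanf`, since `sscanf` expects the input string before the format. `count_percent` and `SAFE_SCANF` are hypothetical names, and the count is still naive (for example, `%*d` and `%n` are counted even though they assign nothing).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: counts '%' characters, treating "%%" as a literal. */
static int count_percent(const char *fmt)
{
    int n = 0;

    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%')
            continue;
        if (fmt[1] == '%')
            fmt++;          /* skip the second half of "%%" */
        else
            n++;
    }
    return n;
}

/* Exits if scanf assigns fewer items than the naive count expects.
 * Note that fmt is evaluated twice. */
#define SAFE_SCANF(fmt, ...) \
    do { \
        if (scanf(fmt, __VA_ARGS__) != count_percent(fmt)) \
            exit(EXIT_FAILURE); \
    } while (0)
```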
Sven Nilsson
  • Is `std:max` part of C++, or is that a typo for `std::max`? It isn't appropriate for a question tagged C, either way. Your parsing for `%` symbols is adequate for many practical uses, but is definitely not fully generalized. – Jonathan Leffler Oct 27 '20 at 17:11
  • Thanks. Maybe I'll consider having a look at that in combination with parsing the format string. I have already started on the latter, and it seems (could be wrong though) that it was much easier than I thought. You can have a look here if you want: https://software.codidact.com/questions/278837 – klutt Oct 27 '20 at 22:19
  • Oh, btw. If `fmt` points to the first element (very likely) then `char *p = fmt-1` will invoke undefined behavior. And yes, it will, even if you don't dereference. – klutt Oct 28 '20 at 01:31
  • Source: https://stackoverflow.com/a/60163919/6699433 – klutt Oct 28 '20 at 01:37
  • Thanks for this information, I guess they restricted the pointer freedom for the compiler to optimize harder in the C17 standard. I'm thinking this code works with many compilers with older standards though. My thinking is that a pointer is a memory address, and can therefore point *anywhere*. Sorry about std:min typo, fixed. – Sven Nilsson Oct 28 '20 at 15:22
  • @SvenNilsson It has been UB quite a long time. But as you know, a common symptom of UB is "working as it should" :) – klutt Oct 28 '20 at 15:43
  • @SvenNilsson You're allowed to point one element past. That's mainly to allow things like `while(*str) { length++; str++; }` – klutt Oct 28 '20 at 15:46
  • Maybe a volatile pointer solves the undefined problem, const char* volatile. This prevents many compiler optimizations from happening. – Sven Nilsson Oct 28 '20 at 15:52
  • @SvenNilsson It may solve symptoms, but not the underlying problem. It's still UB. – klutt Oct 28 '20 at 16:01
  • Thanks, you are probably right, but if the string physically exists in memory then I'm pretty sure the volatile approach will work. The problem is that the optimizer may choose not to place the string in memory at all, i.e. it keeps only the results of the operations performed on the string. – Sven Nilsson Oct 28 '20 at 16:11
  • The compiler is free to assume that UB will never happen, so it may completely remove the whole function call – klutt Oct 28 '20 at 16:54

You can use this variadic macro trick to count the number of arguments passed to a function:

#define VA_NUM_ARGS(...) VA_NUM_ARGS_IMPL(__VA_ARGS__, 5,4,3,2,1)
#define VA_NUM_ARGS_IMPL(_1,_2,_3,_4,_5,N,...) N

This implementation works up to 5 arguments, but can be easily extended. So, you can wrap scanf into a wrapper like:

#define scanf_checked(...) (scanf(__VA_ARGS__) - VA_NUM_ARGS(__VA_ARGS__) + 1)

The trailing + 1 removes the non-variadic format argument from the count.

Assuming that you are passing the correct number of arguments (most compilers have an appropriate warning for it), you only have to check that the result is zero. This example uses sscanf_checked, with + 2 because there is one more non-variadic argument:

#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

#define VA_NUM_ARGS(...) VA_NUM_ARGS_IMPL(__VA_ARGS__, 5,4,3,2,1)
#define VA_NUM_ARGS_IMPL(_1,_2,_3,_4,_5,N,...) N

#define scanf_checked(...) (scanf(__VA_ARGS__) - VA_NUM_ARGS(__VA_ARGS__) + 1)
#define sscanf_checked(...) (sscanf(__VA_ARGS__) - VA_NUM_ARGS(__VA_ARGS__) + 2)

int main(void) {

    unsigned u1, u2;

    int i;

    i = sscanf_checked("1 2", "%u %u", &u1, &u2);
    assert(i == 0);

    i = sscanf_checked("1", "%u %u", &u1, &u2);
    assert(i != 0);

    i = sscanf_checked("1", "%u %u", &u1); // note: UB
    assert(i == 0); // assert may succeed, but compiler warns for too few arguments

    return EXIT_SUCCESS;

}

Working example here.
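
To get closer to the exit-on-failure behavior asked for in the question, the same counting trick can be wrapped in a statement-like macro. This is only a sketch: `SSCANF_ALL` and `SSCANF_OR_EXIT` are hypothetical names, the limit of five arguments in total still applies, and the count is wrong for formats that use `%n` or `%*` conversions, because their arguments do not line up one-to-one with assignments.

```c
#include <stdio.h>
#include <stdlib.h>

#define VA_NUM_ARGS(...) VA_NUM_ARGS_IMPL(__VA_ARGS__, 5,4,3,2,1)
#define VA_NUM_ARGS_IMPL(_1,_2,_3,_4,_5,N,...) N

/* Nonzero if sscanf assigned one item per pointer argument, that is,
 * all arguments minus the input string and the format. */
#define SSCANF_ALL(...) \
    (sscanf(__VA_ARGS__) == VA_NUM_ARGS(__VA_ARGS__) - 2)

/* Statement form that terminates the program on a short read. */
#define SSCANF_OR_EXIT(...) \
    do { \
        if (!SSCANF_ALL(__VA_ARGS__)) { \
            fputs("Wrong number of assignments\n", stderr); \
            exit(EXIT_FAILURE); \
        } \
    } while (0)
```

A call such as `SSCANF_OR_EXIT("7 8", "%u %u", &u1, &u2);` then needs no explicit count at the call site. Like the macros above, it relies on compiler format warnings to catch a mismatch between the format and the arguments.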

Giovanni Cerretani