Is it better to use one scanf() (or fscanf()) or it doesn't matter?

Question

I have heard that scanf() "costs a lot", but now I can't find any information about that. Therefore, does it matter whether I use one fscanf(fp, "%f", &num) per value or read multiple values from one line in a single call (fscanf(fp, "%f %f %f %f", &num, &num1, &num2, &num3))? Additional question: could you recommend some sources for that type of information about how much a function can cost to the program?

I know there is scanf() in the title, but fscanf(), sscanf() and scanf() are documented on the same manual page, so I believe the title isn't misleading.

@WilliamPursell, why do you claim so? I think that `scanf()` is useful as long as you know how to handle it. — Funny, Aug 25 '21 at 19:41
It is extremely difficult to reliably handle input with scanf. Even with a simple `scanf("%f")`, the behavior will be undefined on input that exceeds the bounds of a float. To prevent that, you must add a maximum field width, but if you do that you greatly restrict the number of values that can be entered. It can be useful in toy programs, but it is not suitable for anything beyond simple exercises. — William Pursell, Aug 25 '21 at 20:58
In the context of the discussion on why not to use `scanf`, this guide may be helpful: [A beginners' guide away from scanf()](http://sekrit.de/webdocs/c/beginners-guide-away-from-scanf.html) — Andreas Wenzel, Aug 26 '21 at 06:20

score 5 · Answer 1 · answered Aug 25 '21 at 19:33

They should perform exactly the same.

fscanf reads from an open file stream; "File Scan Formatted".

scanf does exactly the same thing, but it only reads from stdin. scanf(format, ...) is just fscanf(stdin, format, ...).

Additional question: could you recommend some sources for that type of information about how much a function can cost to the program?

This will depend on your implementation. The reality is functions like scanf are unlikely to be a performance issue. They have been optimized for decades. Performance problems come from things like loops and bad algorithms. Look into subjects such as Big-O Notation for the basics, and tools such as benchmarking and profilers to test running programs.

What you should concern yourself with is the many problems with scanf and fscanf. The biggest one is that they tie reading input together with parsing it. Consider using fgets and sscanf (S for String) instead to separate the two: read a line, then parse the line.
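A minimal sketch of the "read a line, parse a line" split might look like the following. The helper names `parse_four` and `read_four` and the 256-byte line buffer are assumptions made for illustration; they are not part of the original answer.

```c
#include <stdio.h>

/* Hypothetical helper: parse up to four floats from one line of text.
 * Returns the number of values converted (0..4). */
int parse_four(const char *line, float out[4])
{
    int n = sscanf(line, "%f %f %f %f", &out[0], &out[1], &out[2], &out[3]);

    return n < 0 ? 0 : n;   /* sscanf returns EOF when the line has no data */
}

/* Hypothetical helper: read one line from fp, then parse it.
 * Returns -1 if reading failed (end of file or I/O error), otherwise the
 * number of values parsed, so the two kinds of failure stay distinct.
 * Assumption: lines fit in 255 characters; longer lines would be split. */
int read_four(FILE *fp, float out[4])
{
    char line[256];

    if (fgets(line, sizeof line, fp) == NULL)
        return -1;                  /* read problem */

    return parse_four(line, out);   /* parse result: 0..4 */
}
```

A caller can treat a return of -1 as "no more input" and any value from 0 to 3 as "this line was malformed", then carry on with the next line, which is awkward to do with a bare fscanf.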

Why do you call reading and parsing input a problem? If we know what to expect from the input, fscanf() shouldn't be a problem, provided we also know how to handle this function — Funny, Aug 25 '21 at 20:08
@Funny Because they have different error handling. With fscanf it's more difficult to know if there was a read problem vs a parsing problem. And you ***never*** know what to expect. If you make assumptions about your input, your program is vulnerable to all sorts of attacks and errors. Reading and parsing are two complex problems; separate them. — Schwern, Aug 25 '21 at 20:10

Craig Estey · Answer 2 · 2021-08-26T02:29:27.427

I have heard that scanf() "costs a lot", but now I can't find any information about that.

Yes, that's true. It is slower in many cases.

In particular, reading in a file byte by byte:

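char byte;  // %c stores into a char, so it needs a char *, not an int *
int chr;    // fgetc returns an int so that EOF can be told apart from data

// this is much slower than ...
while (1) {
    if (fscanf(xf,"%c",&byte) != 1)
        break;
}

// ... this
while (1) {
    chr = fgetc(xf);
    if (chr == EOF)
        break;
}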

Side note: Believe it or not, the above comes from a real-world, production-grade program that I encountered [for loading firmware into an FPGA]. I fixed it by using fgetc [and read], which reduced the running time from 15 minutes to 90 seconds.
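For comparison, reading in blocks avoids the per-byte call overhead altogether. The sketch below uses the portable `fread` as a stand-in for the `read` call mentioned above; the helper name `count_bytes` and the 4096-byte block size are assumptions for illustration, not the code from that program.

```c
#include <stdio.h>

/* Hypothetical helper: consume a stream in blocks and return the number
 * of bytes seen, or -1 if the stream reported a read error. */
long count_bytes(FILE *fp)
{
    unsigned char block[4096];   /* block size is an arbitrary choice */
    long total = 0;
    size_t got;

    /* fread returns how many bytes it actually stored; 0 means EOF or error */
    while ((got = fread(block, 1, sizeof block, fp)) > 0)
        total += (long) got;

    return ferror(fp) ? -1 : total;
}
```

A real loader would process `block[0..got-1]` inside the loop instead of only counting.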

Additional question: could you recommend some sources for that type of information about how much a function can cost to the program?

You can learn about "big O" notation and analysis.

But, you can always write a benchmark program. Here's one I created that can help you with creating your own.

In this particular case, the important thing is to remove the overhead of the I/O from the timings. So, this program creates random lines in a buffer and does either strtod or sscanf on the buffer.

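#define _XOPEN_SOURCE 700   /* expose clock_gettime and drand48 when building with strict -std=c11 */
#include <stdio.h>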
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define TSTMAX      100000
#define FNCMAX      2

typedef long long tsc_t;

tsc_t
tscget(void)
{
    struct timespec ts;
    tsc_t tsc;

    clock_gettime(CLOCK_MONOTONIC,&ts);

    tsc = ts.tv_sec;
    tsc *= 1000000000;
    tsc += ts.tv_nsec;

    return tsc;
}

double
tscsec(tsc_t tsc)
{
    double sec;

    sec = tsc;
    sec /= 1e9;

    return sec;
}

tsc_t elap[FNCMAX][TSTMAX];

void
useit(double *arr)
{
}

void
doscanf1(char *buf)
{
    double arr[4];

    sscanf(buf,"%lf %lf %lf %lf",&arr[0],&arr[1],&arr[2],&arr[3]);

    useit(arr);
}

void
doscanf2(char *buf)
{
    double arr[4];

    sscanf(buf,"%lf %lf %lf %lf",&arr[0],&arr[1],&arr[2],&arr[3]);

    useit(arr);
}

void
dotok(char *buf)
{
    double arr[4];

    for (int idx = 0;  idx < 4;  ++idx)
        arr[idx] = strtod(buf,&buf);

    useit(arr);
}

void
dofnc(int tstidx,int fncidx,const char *src)
{
    char buf[1000];
    tsc_t tscbeg;
    tsc_t tscend;

    strcpy(buf,src);

    tscbeg = tscget();

    switch (fncidx) {
    case 0:
        dotok(buf);
        break;
    case 1:
        doscanf1(buf);
        break;
    case 2:
        doscanf2(buf);
        break;
    }

    tscend = tscget();
    tscend -= tscbeg;

    elap[fncidx][tstidx] = tscend;
}

void
dotest(int tstidx)
{
    char *bp;
    char buf[1000];

    bp = buf;
    for (int idx = 0;  idx < 4;  ++idx)
        bp += sprintf(bp," %.8g",drand48());

    dofnc(tstidx,0,buf);
    dofnc(tstidx,1,buf);
}

int
cmpfnc(const void *lhs,const void *rhs)
{
    tsc_t dif = *(const tsc_t *) lhs - *(const tsc_t *) rhs;
    int cmpflg;

    do {
        cmpflg = -1;
        if (dif < 0)
            break;

        cmpflg = 1;
        if (dif > 0)
            break;

        cmpflg = 0;
    } while (0);

    return cmpflg;
}

int
main(void)
{
    tsc_t avg[FNCMAX] = { 0 };

    for (int tstidx = 0;  tstidx < TSTMAX;  ++tstidx)
        dotest(tstidx);

    for (int fncidx = 0;  fncidx < FNCMAX;  ++fncidx)
        qsort(&elap[fncidx][0],TSTMAX,sizeof(tsc_t),cmpfnc);

    for (int tstidx = 0;  tstidx < TSTMAX;  ++tstidx) {
        for (int fncidx = 0;  fncidx < FNCMAX;  ++fncidx) {
            tsc_t tsc = elap[fncidx][tstidx];
            printf(" %.9f",tscsec(tsc));
            avg[fncidx] += tsc;
        }
        printf(" %d\n",tstidx);
    }

    for (int fncidx = 0;  fncidx < FNCMAX;  ++fncidx) {
        tsc_t tsc = avg[fncidx];
        printf("TOT:%.9f AVG:%.9f\n",tscsec(tsc),tscsec(tsc) / TSTMAX);
    }
}

You can run the above program. Here is the summary from a run on my system:

TOT:0.090690979 AVG:0.000000907
TOT:0.157146016 AVG:0.000001571

So, in this run, using strtod is about 1.7x faster than sscanf.

However, isn't `fscanf()` easier to check for errors than `strtod()`? After all, I can always encounter a real 0.0 value in my file. Additionally, what does `useit()` do? — Funny, Aug 26 '21 at 04:23
`useit` was to suppress a superfluous compiler warning about "arr set but not used" — Craig Estey, Aug 26 '21 at 04:33
What is the point of passing `&buf` as a second argument to `strtod`? If you overwrite `buf`, you cannot compare it with the original value of `buf`, so you cannot check for a conversion error, so what is the point of passing in a second argument at all? Why not simply pass in `NULL` as the second argument if you don't want to check for a conversion error? — Andreas Wenzel, Aug 26 '21 at 06:27
@AndreasWenzel The `&buf` is to advance the pointer. Otherwise, the subsequent calls will reparse the _first_ token (e.g. for a buffer of `1 2 3 4`, all `arr[idx]` would be `1`). For a buffer of `1@ 2`, comparing against original `buf` pointer value would _not_ detect a conversion error. You can check for conversion error by looking at `*buf` after the call (e.g. it must be ` `, `\n`, or 0, etc.) — Craig Estey, Aug 26 '21 at 14:47
@CraigEstey: Ah, that is a nice and elegant way of advancing the pointer. Thanks for the explanation. — Andreas Wenzel, Aug 26 '21 at 15:53
@Funny One checks for an error in `strtod` not by checking the return value, but by seeing if the start and end pointers are the same. [Example](https://gist.github.com/schwern/80afadbad56855ad11dd69c3952cfb18). Obtuse, but once you know you know. That's C for you; it isn't easy, just possible. `fscanf`'s return value mixes up input and parsing errors making it difficult to tell what's happened and how to recover. — Schwern, Aug 26 '21 at 17:46
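Putting the checks from these comments together, a hedged sketch of `strtod` with error handling could look like this. The helper name `parse_doubles` is hypothetical, and treating `ERANGE` as a failure is one possible policy rather than a requirement.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse exactly `count` whitespace-separated doubles
 * from `text`. Returns 0 on success, -1 on the first failure. */
int parse_doubles(const char *text, double *out, int count)
{
    const char *pos = text;

    for (int idx = 0; idx < count; ++idx) {
        char *end;

        errno = 0;
        out[idx] = strtod(pos, &end);

        if (end == pos)          /* nothing consumed: not a number */
            return -1;
        if (errno == ERANGE)     /* out-of-range value was reported */
            return -1;
        if (*end != '\0' && !isspace((unsigned char) *end))
            return -1;           /* trailing junk such as the '@' in "1@" */

        pos = end;               /* advance to the next token */
    }
    return 0;
}
```

Because success is judged by the end pointer and not by the returned value, a genuine `0.0` in the file is accepted like any other number.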
