I conjectured that ifstream would be faster than fscanf because fscanf has to parse the format string every time it runs, whereas, with ifstream, we know at compile time what kind of "thing" we want to read.

But when I ran this quick-and-dirty benchmark:

#include <ctime>
#include <cstdio>
#include <iostream>
#include <fstream>
using namespace std;

// Number of integers read from nums.txt in each pass.
#define NUMBER_OF_NUMBERS 100000000

// Global, so the buffer (about 400 MB with 4-byte ints) is not on the stack.
int nums[NUMBER_OF_NUMBERS];

int main(int argc, char** argv) {

    // Pass 1: C stdio. The return value of fscanf is not checked.
    FILE * fp = fopen("nums.txt","r");
    auto start = clock();
    for (int i = 0; i < NUMBER_OF_NUMBERS; i++)
        fscanf(fp,"%d",nums+i);
    auto end = clock();
    fclose(fp);

    auto first = end - start;

    // Pass 2: C++ streams, reading the same file into the same buffer.
    ifstream fin("nums.txt");
    start = clock();
    for (int i = 0; i < NUMBER_OF_NUMBERS; i++)
        fin >> nums[i];
    end = clock();
    fin.close();

    auto second = end - start;

    // The divisions below are integer divisions, so the "(sec)" lines
    // are truncated to whole seconds.
    cout << "CLOCKS_PER_SEC : " << CLOCKS_PER_SEC << endl;
    cout << "first          : " << first << endl;
    cout << "first (sec)    : " << first / CLOCKS_PER_SEC << " seconds" << endl;
    cout << "second         : " << second << endl;
    cout << "second (sec)   : " << second / CLOCKS_PER_SEC << " seconds" << endl;
    cout << "diff           : " << second - first << endl;
    cout << "diff (sec)     : " << (second - first) / CLOCKS_PER_SEC << " seconds" << endl;

    return 0;
}

I got as output the following:

CLOCKS_PER_SEC : 1000000
first          : 12336249
first (sec)    : 12 seconds
second         : 25738587
second (sec)   : 25 seconds
diff           : 13402338
diff (sec)     : 13 seconds

ifstream is more than twice as slow as fscanf. Where does fscanf get all this speed?

EDIT:

I'm on a reasonably modern 64-bit Intel Mac, using the command line tools that come with Xcode, in case it is relevant at all.
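
Two things the benchmark above does not control for are test order (the second pass could benefit from a file that is already cached) and the integer division, which rounds each timing down to whole seconds. Here is a minimal sketch of a harness that would make both easier to check. The helper names (`elapsed_seconds`, `read_with_fscanf`, `read_with_ifstream`, `time_reader`) are hypothetical and not part of the original program, and nothing here is claimed to change the measured ratio.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <ctime>
#include <fstream>
#include <string>
#include <vector>

// Convert a clock() interval to fractional seconds (no integer truncation).
double elapsed_seconds(std::clock_t start, std::clock_t end) {
    return static_cast<double>(end - start) / CLOCKS_PER_SEC;
}

// Read up to out.size() ints with fscanf; returns how many were read.
std::size_t read_with_fscanf(const std::string& path, std::vector<int>& out) {
    std::FILE* fp = std::fopen(path.c_str(), "r");
    if (fp == nullptr) return 0;
    std::size_t n = 0;
    while (n < out.size() && std::fscanf(fp, "%d", &out[n]) == 1) ++n;
    std::fclose(fp);
    return n;
}

// Read up to out.size() ints with operator>>; returns how many were read.
std::size_t read_with_ifstream(const std::string& path, std::vector<int>& out) {
    std::ifstream fin(path);
    std::size_t n = 0;
    while (n < out.size() && fin >> out[n]) ++n;
    return n;
}

// Time one reader. `reader` is any callable taking (path, out) and
// returning the number of values read, which is stored in `count`.
template <typename Reader>
double time_reader(Reader reader, const std::string& path,
                   std::vector<int>& out, std::size_t& count) {
    assert(!out.empty() && "caller must size the buffer first");
    const std::clock_t start = std::clock();
    count = reader(path, out);
    return elapsed_seconds(start, std::clock());
}
```

Calling `time_reader` with the two readers in one order and then in the reverse order, and comparing the returned counts, would show whether ordering or short reads are skewing the comparison.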

Deanie
math4tots
  • Are you running an optimized build? – PaulMcKenzie Apr 28 '14 at 00:24
  • @PaulMcKenzie `g++ -Wall --std=c++11 test.cpp && ./a.out` is what I did originally, but I just tried again with `g++ -O3 -Wall --std=c++11 test.cpp && ./a.out` with similar results – math4tots Apr 28 '14 at 00:24
  • Short answer: http://stackoverflow.com/questions/5166263/how-to-get-iostream-to-perform-better – jrd1 Apr 28 '14 at 00:25
  • @GregHewgill At least from a cursory look at that post, the accepted answer talks about console input. I would have imagined that the argument that "console speeds don't really matter" might not exactly apply when implementing file I/O. At the very least, I'm pretty sure `fscanf` above isn't using the `stdin` device – math4tots Apr 28 '14 at 00:45
  • With constant formats, I've come across compilers that analyze the format at _compile_ time. The basic `fscanf(fp,"%d",nums+i)` could be simplified then into some sort of `nums[i] = getint(fp)`. Not saying that is what happened here - would need to inspect the assembly. – chux - Reinstate Monica Apr 28 '14 at 04:20
  • The marked duplicate has almost nothing to do with this question, since the other post is about the standard streams, rather than `fstream`s. Nor is it the best post about the standard streams. –  Jun 12 '16 at 17:32
  • What happens if you swap the order of the two tests? –  Jun 12 '16 at 17:56
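
For illustration, the `getint(fp)` idea from the comment above can be sketched by hand. This is only an assumption about what such a specialized reader might look like. The name `get_int` is hypothetical, overflow is not handled, and nothing here shows that any compiler actually performs this transformation.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>

// Hypothetical stand-in for `getint(fp)`: reads one decimal int without
// consulting a format string. Returns false if no number could be read.
bool get_int(std::FILE* fp, int& value) {
    assert(fp != nullptr);
    int c = std::getc(fp);
    while (c != EOF && std::isspace(c)) c = std::getc(fp);  // skip whitespace

    bool negative = false;
    if (c == '-' || c == '+') {
        negative = (c == '-');
        c = std::getc(fp);
    }
    if (c == EOF || !std::isdigit(c)) {
        if (c != EOF) std::ungetc(c, fp);  // leave the offending character
        return false;
    }

    int result = 0;
    while (c != EOF && std::isdigit(c)) {
        result = result * 10 + (c - '0');  // assumption: no overflow check
        c = std::getc(fp);
    }
    if (c != EOF) std::ungetc(c, fp);  // leave the delimiter for the next call
    value = negative ? -result : result;
    return true;
}
```

With a reader like this, the loop body would become `get_int(fp, nums[i])` in place of `fscanf(fp,"%d",nums+i)`. Whether it is actually faster would have to be measured.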

1 Answer

fscanf does not need to parse the format string at all. It uses a greedy algorithm that looks for the '%' character and then uses a simple switch statement to generate the input. ifstream on the other hand, needs to perform lookups on its vtable to determine how each minute detail of the input is used.
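
The loop described above (scan for `'%'`, then switch on the conversion character) can be sketched roughly as follows. This is a deliberately tiny, hypothetical `mini_sscanf` that reads from a string and supports only `%d`, whitespace and literal characters. It is not how any real C library implements `fscanf`. What it does show is that the format string is walked again on every call.

```cpp
#include <cassert>
#include <cctype>
#include <cstdarg>
#include <cstdlib>

// Hypothetical, heavily simplified scanf-style interpreter.
// Returns the number of values assigned, like the real scanf family.
int mini_sscanf(const char* input, const char* format, ...) {
    assert(input != nullptr && format != nullptr);
    std::va_list args;
    va_start(args, format);
    int assigned = 0;

    for (const char* f = format; *f != '\0'; ++f) {
        if (std::isspace(static_cast<unsigned char>(*f))) {
            // Whitespace in the format matches any run of input whitespace.
            while (std::isspace(static_cast<unsigned char>(*input))) ++input;
        } else if (*f == '%') {
            ++f;  // move to the conversion character
            switch (*f) {
                case 'd': {
                    char* end = nullptr;
                    const long v = std::strtol(input, &end, 10);
                    if (end == input) {  // no digits: matching failure
                        va_end(args);
                        return assigned;
                    }
                    *va_arg(args, int*) = static_cast<int>(v);
                    input = end;
                    ++assigned;
                    break;
                }
                default:  // unsupported conversion in this sketch
                    va_end(args);
                    return assigned;
            }
        } else {
            if (*input != *f) break;  // literal character must match
            ++input;
        }
    }
    va_end(args);
    return assigned;
}
```

For example, `mini_sscanf("12 -7", "%d %d", &a, &b)` walks the five characters of the format and assigns two values.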

That said, fscanf cannot be extended without altering the C library, whereas ifstream can be extended simply by sub-classing it.

EDIT: Part of the explanation is also that C's fscanf library routine has had far more time and work put into it, and so more opportunities for optimization. The C++ library is only an arbitrary consequence of the C++ standard and thus has not had the same scrutiny applied to it.

randomusername
  • `It uses a greedy algorithm that looks for the '%' character and then uses a simple switch statement to generate the input.` If that's not "parsing" then I don't know what is. And I don't see where a virtual table comes into it; streams deal with their insertion operands through basic function overloading. – Lightness Races in Orbit Apr 28 '14 at 00:31
  • Does `ifstream` still need to consult its vtable even if I have instantiated on the stack? I thought virtual methods were only relevant when I was accessing through pointers, where the exact underlying type is unknown at compile time – math4tots Apr 28 '14 at 00:31
  • Not the vtable of the things being printed, but the `ifstream`'s own vtable. Because `ifstream` can be sub-classed (and thus a different set of vtable entries) it has to perform the lookup. – randomusername Apr 28 '14 at 00:35
  • @randomusername: That's.. nonsense. This is implementation-defined but at least in libstdc++ `ifstream` isn't even polymorphic: it doesn't _have_ a virtual table (though its member `basic_filebuf` instantiation does). [Here's the source if you don't believe me](http://gcc.gnu.org/onlinedocs/gcc-4.6.2/libstdc++/api/a00881_source.html) (and this is why typical advice is _not_ to derive from standard containers or stream types) – Lightness Races in Orbit Apr 28 '14 at 00:40
  • @LightnessRacesinOrbit Please, sir high and mighty, grace us with the true answer then... – randomusername Apr 28 '14 at 00:41
  • @randomusername: Just because you are wrong is not a good reason to call me names – Lightness Races in Orbit Apr 28 '14 at 00:42
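
The point about function overloading made in the comments can be illustrated with a short sketch. The `Point` type and `point_example` function are hypothetical and used only for illustration. The sketch shows that the `operator>>` for `int` is chosen by overload resolution at compile time, and that supporting a new type takes one more overload rather than a subclass of the stream. It says nothing about the relative speed of the two approaches.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <type_traits>
#include <utility>

// `stream >> int&` resolves at compile time to an overload returning the stream.
static_assert(
    std::is_same<decltype(std::declval<std::istream&>() >> std::declval<int&>()),
                 std::istream&>::value,
    "operator>> for int is selected by overload resolution");

// Hypothetical user type, for illustration only.
struct Point {
    int x = 0;
    int y = 0;
};

// Supporting a new type needs only another overload of operator>>.
std::istream& operator>>(std::istream& in, Point& p) {
    Point tmp;
    if (in >> tmp.x >> tmp.y) p = tmp;  // commit only if both fields parsed
    return in;
}

// Usage example: extract two points from an in-memory stream.
void point_example() {
    std::istringstream in("3 4 5 6");
    Point a, b;
    in >> a >> b;
    assert(a.x == 3 && a.y == 4 && b.x == 5 && b.y == 6);
}
```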