25

Possible Duplicate:
Fastest way to read numerical values from text file in C++ (double in this case)

#include <ctime>
#include <cstdlib>
#include <string>
#include <sstream>
#include <iostream>
#include <limits>

using namespace std;

static const double NAN_D = numeric_limits<double>::quiet_NaN();

void die(const char *msg, const char *info)
{
    cerr << "** error: " << msg << " \"" << info << '\"';
    exit(1);
}

double str2dou1(const string &str)
{
    if (str.empty() || str[0]=='?') return NAN_D;
    const char *c_str = str.c_str();
    char *err;
    double x = strtod(c_str, &err);
    if (*err != 0) die("unrecognized numeric data", c_str);
    return x;
}

static istringstream string_to_type_stream;

double str2dou2(const string &str)
{
    if (str.empty() || str[0]=='?') return NAN_D;
    string_to_type_stream.clear();
    string_to_type_stream.str(str);
    double x = 0.0;
    if ((string_to_type_stream >> x).fail())
        die("unrecognized numeric data", str.c_str());
    return x;
}

int main()
{
    string str("12345.6789");

    clock_t tStart, tEnd;

    cout << "strtod: ";
    tStart=clock();

    for (int i=0; i<1000000; ++i)
        double x = str2dou1(str);

    tEnd=clock();
    cout << tEnd-tStart << endl;

    cout << "sstream: ";
    tStart=clock();

    for (int i=0; i<1000000; ++i)
        double x = str2dou2(str);

    tEnd=clock();
    cout << tEnd-tStart << endl;

    return 0;
}

strtod: 405
sstream: 1389

Update: removed the double underscores. Environment: Windows 7 + VC10.

hjbreg
  • Try using boost::spirit instead. – Sergey Miryanov Apr 29 '11 at 10:30
  • Those double-underscore names are reserved for the implementation and are illegal in user-written code. And if stringstreams are too slow for you, you have the answer: use strtod. Stringstreams are primarily there for convenience and type safety, not speed. –  Apr 29 '11 at 10:47
  • The stream will collect the input and then eventually call strtold for the conversion. Makes it hard to be any faster! – Bo Persson Apr 29 '11 at 11:00
  • Which compiler is it? Maybe the [stlport](http://www.stlport.org/) implementation of the STL would be faster than the one that comes with it (do not expect to beat strtod though, it's not possible). – Jan Hudec Apr 29 '11 at 11:13
  • @unapersson The double-underscore names were copied from elsewhere; I was too lazy to change them. – hjbreg Apr 29 '11 at 11:19
  • @hjbreg: just because a function is slower than another does not explain why you think it is "too slow" - do you really need it to be faster? – Doc Brown Apr 29 '11 at 11:24
  • @Doc Brown I do; almost 100Mb of raw data has to be converted. – hjbreg Apr 29 '11 at 11:26
  • @hjbreg: What running time do you have now, what time are you trying to achieve, and what percentage of the running time is spent in the function above? – Doc Brown Apr 29 '11 at 11:29
  • @Doc Brown The conversion will be performed only once, so I'll just leave it as it is, but I wonder whether there are any better solutions. – hjbreg Apr 29 '11 at 11:35

4 Answers

12

C/C++ text-to-number conversion is very slow. Streams are horribly slow, but even C number parsing is slow because it's quite difficult to get the result correct down to the last bit of precision.

In a production application where reading speed was important, and where the data was known to have at most three decimal digits and no scientific notation, I got a vast improvement by hand-coding a floating-point parsing function that handles only the sign, the integer part and any number of decimals (by "vast" I mean 10x faster than strtod).

If you don't need exponents and the precision of this function is sufficient, here is a parser similar to the one I wrote back then. On my PC it's now 6.8 times faster than strtod and 22.6 times faster than sstream.

double parseFloat(const std::string& input)
{
    const char *p = input.c_str();
    if (!*p || *p == '?')
        return NAN_D;
    int s = 1;
    while (*p == ' ') p++;

    if (*p == '-') {
        s = -1; p++;
    }

    double acc = 0;
    while (*p >= '0' && *p <= '9')
        acc = acc * 10 + *p++ - '0';

    if (*p == '.') {
        double k = 0.1;
        p++;
        while (*p >= '0' && *p <= '9') {
            acc += (*p++ - '0') * k;
            k *= 0.1;
        }
    }
    if (*p) die("Invalid numeric format", input.c_str());
    return s * acc;
}
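
Repeatedly multiplying `k` by 0.1 adds a small rounding error for every decimal digit. A minimal sketch of a variant, assuming the input never has more than 18 digits in total: accumulate all digits in a 64-bit integer and divide once by a power of ten. The name `parse_fixed`, the 18-digit limit, and returning NaN instead of calling `die` are assumptions made for illustration and are not part of the parser above. No timing claim is made for it.

```cpp
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical variant of the hand-written parser: digits are collected in
// an integer and scaled once at the end, so no error builds up per digit.
// Accepts only [spaces][-]digits[.digits]; no exponent, no '+' sign.
// Returns NaN for empty, '?' or malformed input (an assumption; the
// original parser calls die() instead).
double parse_fixed(const std::string& input)
{
    static const double kPow10[] = {
        1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,  1e8,  1e9,
        1e10, 1e11, 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18
    };
    const double nan = std::numeric_limits<double>::quiet_NaN();

    const char* p = input.c_str();
    while (*p == ' ') ++p;
    if (!*p || *p == '?') return nan;

    bool negative = false;
    if (*p == '-') { negative = true; ++p; }

    std::uint64_t mantissa = 0;
    int digits = 0;    // all digits consumed so far
    int decimals = 0;  // digits after the decimal point

    while (*p >= '0' && *p <= '9') {
        if (++digits > 18) return nan;  // would not fit the table above
        mantissa = mantissa * 10 + static_cast<std::uint64_t>(*p++ - '0');
    }
    if (*p == '.') {
        ++p;
        while (*p >= '0' && *p <= '9') {
            if (++digits > 18) return nan;
            mantissa = mantissa * 10 + static_cast<std::uint64_t>(*p++ - '0');
            ++decimals;
        }
    }
    if (*p || digits == 0) return nan;  // trailing junk or no digits at all

    // One division by an exactly representable power of ten.
    const double value = static_cast<double>(mantissa) / kPow10[decimals];
    return negative ? -value : value;
}
```

For inputs with up to about 15 significant digits the integer converts to `double` exactly, so the single division is the only rounding step.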
6502
7

String streams are slow. Very slow. If you are writing anything performance-critical that acts on large data sets (say, loading assets after a level change in a game), do not use string streams. I recommend the old-school C library parsing functions for performance, although I cannot say how they compare to something like boost::spirit.

However, compared to the C library functions, string streams are very elegant, readable and reliable, so if what you are doing is not performance-critical I recommend sticking to streams.
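
As a rough sketch of the C library route, the function below walks a NUL-terminated buffer with `strtod` and uses the end pointer it reports to step from one number to the next. The name `parse_all` and the choice to treat `ERANGE` as a failure are assumptions for illustration; the input is assumed to be whitespace-separated and the default "C" locale is assumed to be active.

```cpp
#include <cerrno>
#include <cstdlib>
#include <vector>

// Appends every whitespace-separated number found in text to out.
// Returns false if a token cannot be converted or is out of range.
bool parse_all(const char* text, std::vector<double>& out)
{
    const char* p = text;
    for (;;) {
        char* end = nullptr;
        errno = 0;
        const double x = std::strtod(p, &end);  // skips leading whitespace
        if (end == p) break;                     // nothing was converted
        if (errno == ERANGE) return false;       // overflow or underflow
        out.push_back(x);
        p = end;                                 // continue after this number
    }
    // Only trailing whitespace may be left over.
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') ++p;
    return *p == '\0';
}
```

Because the whole buffer is scanned in place, no temporary `std::string` or stream object is created per value.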

5

In general, if you need speed, consider this library:

http://www.fastformat.org/

(I'm not sure if it contains functions for converting strings or streams to other types, though, so it may not answer your current example).

For the record, please note you're comparing apples to oranges here. strtod() is a simple function that has a single purpose (converting strings to double), while stringstream is a much more complex formatting mechanism, which is far from being optimized to that specific purpose. A fairer comparison would be comparing stringstream to the sprintf/sscanf line of functions, which would be slower than strtod() but still faster than stringstream. I'm not exactly sure what makes stringstream's design slower than sprintf/sscanf, but it seems like that's the case.
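
For anyone who wants to include `sscanf` in the comparison, here is a minimal sketch of an equivalent conversion. The name `scan_double` is made up for this example, and `%n` is used as an assumption about how one might detect trailing characters; it is not taken from the question.

```cpp
#include <cstdio>

// Converts text to double with sscanf. "%lf" reads the number and "%n"
// stores how many characters were consumed, so trailing junk is detected.
// Returns true only if the whole string is exactly one number.
bool scan_double(const char* text, double& out)
{
    int consumed = 0;
    double x = 0.0;
    if (std::sscanf(text, "%lf%n", &x, &consumed) != 1) return false;
    if (text[consumed] != '\0') return false;  // something follows the number
    out = x;
    return true;
}
```

Dropping this into the benchmark loop in place of `str2dou1` would give a third timing to set beside the two above.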

Boaz Yaniv
  • Why is the STL slower than fastformat? – hjbreg Apr 29 '11 at 10:33
  • @hjbreg, because it has to support locales. – Alex B Apr 29 '11 at 10:34
  • @hjbreg: There are several reasons for that. Part of it may be related to stream design considerations and unoptimized implementations. Another reason is the STL's flexibility, which probably includes locale support and support for IO manipulators (I'm not sure whether fastformat has these or something equivalent). – Boaz Yaniv Apr 29 '11 at 10:41
  • `strtod` is not particularly simple. It handles locales (for example thousand separator vs decimal point, and nitpicking the thousands separators), and rounds very very carefully. What makes iostreams slow is an insane level of virtual function dispatching, and a difficulty-of-correctness issue that leads implementers to forgo any optimization completely. – Potatoswatter Apr 29 '11 at 12:08
  • @hjbreg: fastformat has locale support (in addition to an awful lot of screaming propaganda) – sehe Apr 29 '11 at 12:18
2

Have you considered using lexical_cast from boost?

http://www.boost.org/doc/libs/1_46_1/libs/conversion/lexical_cast.htm

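Edit: btw, the clear() is not redundant: str() replaces the buffer but does not reset the stream's state flags, so the eofbit left behind by the previous extraction has to be cleared before the next one.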
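
Whether the `clear()` call matters can be checked with a small sketch like the one below. It converts the same text twice through one reused `istringstream`, with or without a `clear()` in between. The function name `second_parse_succeeds` is hypothetical; the expectation is that reading a number up to the end of the buffer sets `eofbit`, and that `str()` leaves the state flags untouched.

```cpp
#include <sstream>
#include <string>

// Converts text twice through one reused istringstream and reports
// whether the second conversion succeeds.
bool second_parse_succeeds(const std::string& text, bool clear_between)
{
    std::istringstream in;
    double x = 0.0;

    in.str(text);
    in >> x;                        // reading up to the end sets eofbit

    if (clear_between) in.clear();  // resets eofbit and failbit
    in.str(text);                   // new buffer, state flags unchanged
    return !(in >> x).fail();
}
```

With a value such as "12345.6789" the second extraction should only succeed when `clear_between` is true.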

Šimon Tóth