
I'm constructing CSV text files containing hundreds of millions of lines. Each call to the record function forms a line of text and buffers it in a stringstream. Periodically, depending on the input to record, the buffered line(s) are either written to the file or discarded. My guess is that roughly 75% of the buffered lines end up being written to the file in a typical run.

So, what I'm really doing is forming a bunch of lines of text, deciding whether to throw them away or write them to a file, and then repeating over and over again many times.

Below is a simplified example of my code. Assume that CONDITION1 and CONDITION2 are just simple boolean expressions involving x, y, and z; they don't take significant time to evaluate. The code is really slow, and I can see a couple of reasons: the use of stringstreams in general, and the repeated calls to stringstream::str() and stringstream::str(const string&) in particular.

Question: how could I make this faster?

Note: I assume (or know) that using a std::string to hold the text would be faster, but I'm concerned about the additional conversions needed to build the text from double variables such as x. (In the real case, about 10 different double variables get concatenated, delimited by commas.) A rough sketch of that alternative follows the code below.

#include <fstream>
#include <sstream>
#include <string>

std::ofstream outf;
std::stringstream ss;

// open outf

void record(const double x, const bool y, const int z) {
    // Append one CSV line to the buffer.
    ss << x << ", ";
    if (y) ss << "YES, ";
    else   ss << "NO, ";
    ss << z << "\n";

    // Depending on the conditions, write the buffered line(s) or discard them.
    if (CONDITION1) {
        if (CONDITION2)
            outf << ss.str();       // copy buffer contents to the file
        ss.str(std::string());      // clear the buffer either way
    }
}
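
For reference, the std::string alternative I have in mind would look roughly like this (just a sketch; std::to_string is one possible way to do the conversions I'm worried about):

#include <string>

std::string buf;

void record_str(const double x, const bool y, const int z) {
    // Same line format as above, built by appending to a std::string.
    // Note: std::to_string(double) uses fixed %f formatting, unlike the
    // stream's default %g-like formatting.
    buf += std::to_string(x);
    buf += y ? ", YES, " : ", NO, ";
    buf += std::to_string(z);
    buf += '\n';
    // ... same CONDITION1/CONDITION2 write-or-discard logic as above ...
}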
synaptik
    I feel like this would be I/O bound. If that would be the case, there is not really much you can do but buy a faster HD/SSD/whatever you write to. Did your measurement show that this is *not* I/O bound? – Baum mit Augen May 22 '15 at 00:11
    @BaummitAugen top shows my CPU at 100% consistently, and nmon shows only intermittent disk writing activity. – synaptik May 22 '15 at 00:38
  • ...but that's just throwing sand in the air. I haven't profiled. – synaptik May 22 '15 at 00:40
  • @BaummitAugen: I haven't met an iostreams implementation yet that can keep a disk saturated... not even at the physical disk speed, never mind bus or cache performance. Oh sure, `filebuf` can transfer large blocks pretty fast, but the formatted I/O layer on top is miserably slow. See http://stackoverflow.com/q/4340396/103167 – Ben Voigt May 22 '15 at 03:38

3 Answers


Assuming it's possible, the first and most obvious optimization is to check the conditions before doing any of the conversions. Along with that, you can avoid creating a string object from the contents of the stringstream and instead copy directly from the stringstream's buffer to the ostream's buffer. I'd also use a small lookup table to handle the Boolean conversion. Especially with an implementation that doesn't include the short string optimization, you may also be able to save some time by pre-initializing an empty string to use for clearing the stringstream:

#include <fstream>
#include <sstream>
#include <string>

std::ofstream outf;
std::stringstream ss;

namespace {
    char const *names[] = { "NO, ", "YES, " };
    std::string clear;
}

// open outf

void record(const double x, const bool y, const int z) {

    if (!(CONDITION1 && CONDITION2))
        return;

    ss << x << ", ";
    ss << names[y];            // table lookup instead of a branch
    ss << z << "\n";

    outf << ss.rdbuf();        // copy straight from buffer to buffer
    ss.str(clear);             // reuse the pre-built empty string
}

Assuming you actually can check the conditions early like this, you can probably eliminate the stringstream entirely as well.

void record(const double x, const bool y, const int z) {

    if (!(CONDITION1 && CONDITION2))
        return;

    outf << x << ", "
         << names[y]
         << z << "\n";
}

I honestly doubt this is going to make a huge difference, but those are the best guesses that occur to me immediately without seeing the code you actually care about.

Jerry Coffin

Background

There are two fundamental issues causing the slowdown:

  1. Obtaining the data
  2. Parsing the data

These are usually where most of the time goes.

Obtaining the data

The key to obtaining data efficiently is to keep it flowing into memory. That could mean anything from reading large blocks to using a dedicated thread that keeps reading. Hard drives in particular don't like to stop: there is overhead in starting up and locating sectors, and that overhead is amortized by reading larger blocks (more data per request).
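
As a rough illustration, block reading might look like this (a sketch; the 1 MiB block size is an arbitrary choice):

#include <fstream>
#include <vector>

void read_in_blocks(const char *path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<char> block(1 << 20);   // request 1 MiB at a time
    while (in.read(block.data(), block.size()) || in.gcount() > 0) {
        const std::streamsize n = in.gcount();  // bytes actually read
        // ... process the n bytes in block ...
    }
}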

Parsing the data

Time is lost in searching for separator characters and in converting the textual representation into the internal one.

Fixed field lengths are the fastest to parse: the data sits at known character positions, so no time is spent searching for separator characters.

Fixed-length fields are also easier to process from a buffer. With variable-length records, a record may span past the end of the buffer, which requires extra handling code.
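
For example, a sketch of fixed-position parsing (the column layout here is hypothetical, matching the fields from the question):

#include <cstdlib>

struct Record { double x; bool y; int z; };

// Hypothetical fixed-width line: x in columns 0-11, the flag in
// columns 12-15 ("YES "/"NO  "), z in columns 16-23. Fields are
// space-padded, so the numeric conversions stop at the padding.
Record parse_fixed(const char *line) {
    Record r;
    r.x = std::strtod(line, nullptr);
    r.y = (line[12] == 'Y');
    r.z = static_cast<int>(std::strtol(line + 16, nullptr, 10));
    return r;
}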

Reduce the branching

Processors prefer to execute instructions in an unbroken stream. They get a little upset when they hit a branch instruction, because they have to fetch instructions from elsewhere. Conditional branches are worse: the fetch machinery can't proceed until the condition is resolved (although there is ongoing research into prediction and fetch algorithms that speed this up). In summary, reduce the number of branches in your bottleneck area.
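
The names[y] lookup in Jerry Coffin's answer is one instance of this: replacing a conditional with a table index removes the branch entirely. The same idea in isolation (a sketch):

#include <string>

// Branching version: the CPU must predict which side is taken.
void append_flag_branchy(std::string &out, const bool y) {
    if (y) out += "YES, ";
    else   out += "NO, ";
}

// Table version: an unconditional index, no branch to mispredict.
void append_flag_branchless(std::string &out, const bool y) {
    static const char *const labels[] = { "NO, ", "YES, " };
    out += labels[y];
}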

Profile. Then try some of these techniques. Profile again. Compare the "before" and "after" profiles to determine the gain.

Thomas Matthews
  • "reduce the number of branches in your bottleneck"... excellent advice. Note that this immediately and unavoidably means discarding iostreams, which are jam-packed with virtual function calls. – Ben Voigt May 22 '15 at 03:44

Ditch the iostreams. Not even counting the file I/O itself, simply building your stringstream is already reducing your performance by over an order of magnitude compared to what your disk should be capable of. (See here for evidence).

Use Jerry's early-exit trick, and then consider building your buffer using plain C string processing functions like strncat (or the "safe" version that takes care of buffer overflow prevention so you don't have to) or snprintf.
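
A sketch of what that could look like with snprintf and C stdio (outf here is assumed to be a FILE* opened elsewhere, and the 64-byte line buffer is just an illustrative size):

#include <cstdio>

FILE *outf;   // fopen'd elsewhere

void record(const double x, const bool y, const int z) {
    if (!(CONDITION1 && CONDITION2))
        return;

    char line[64];   // assumed large enough for one CSV line
    const int n = std::snprintf(line, sizeof line, "%g, %s, %d\n",
                                x, y ? "YES" : "NO", z);
    if (n > 0)
        std::fwrite(line, 1, static_cast<std::size_t>(n), outf);
}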

C++ iostreams are almost competent for using a filebuf to transfer a prepared buffer to disk, but you might still want to benchmark against FILE* and/or OS file access APIs.

Also, don't be afraid to buffer a few dozen records into your application-level buffer before handing them off to the disk.
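
For instance (a sketch; the 64 KiB flush threshold is an arbitrary illustrative choice):

#include <cstdio>
#include <string>

std::string pending;                            // application-level buffer
const std::size_t kFlushThreshold = 64 * 1024;  // flush once we have this much

void buffer_line(FILE *f, const char *line, std::size_t len) {
    pending.append(line, len);
    if (pending.size() >= kFlushThreshold) {
        std::fwrite(pending.data(), 1, pending.size(), f);
        pending.clear();   // keeps its capacity, so no reallocation later
    }
}

// Remember to flush whatever remains in 'pending' before closing 'f'.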

Ben Voigt
  • You mentioned "a few dozen records". Is there any reason I shouldn't make the char array as large as I can (within reason) given my available memory? (I say char array on the assumption of switching from stringstream to char array.) – synaptik May 22 '15 at 12:20
  • Also, for the double (or int) to string (or char array) conversion, should I just use dtoa? – synaptik May 22 '15 at 12:49