I'm constructing CSV text files containing hundreds of millions of lines. Each call to the record function forms a line of text and buffers it into a stringstream. Periodically, depending on the input to the record function, the buffered line(s) will either be written to file or discarded. I would guess that approximately 75% of the buffered lines end up being written to file most of the time.
So, what I'm really doing is forming a bunch of lines of text, deciding whether to throw them away or write them to a file, and then repeating that cycle many times.
Below is a simplified example of my code. Assume that CONDITION1 and CONDITION2 are just simple boolean expressions involving x, y, and z; they don't take significant time to evaluate. The code is really slow, and I can see a couple of reasons: the use of stringstreams in general, and the repeated calls to stringstream::str() and stringstream::str(const string&) in particular.
Question: how could I make this faster?
Note: I assume (or know) that using a std::string to hold a bunch of text would be faster, but I'm concerned about the additional conversions that would be needed in order to construct the text using double variables such as x. (In the real case, there are about 10 different double variables that get concatenated delimited by commas.)
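#include <fstream>
#include <sstream>
#include <string>

std::ofstream outf;
std::stringstream ss;
// open outf

void record(const double x, const bool y, const int z) {
    // Format one CSV line and append it to the buffer.
    ss << x << ", ";
    if (y) ss << "YES, ";
    else ss << "NO, ";
    ss << z << "\n";

    if (CONDITION1) {
        // Either write the buffered lines to the file or drop them.
        if (CONDITION2)
            outf << ss.str();
        // Reset the buffer in both cases.
        ss.str(std::string());
    }
}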
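
For comparison, the std::string alternative mentioned in the note could be sketched roughly as follows. This is only an illustration and has not been benchmarked. The names LineBuffer, append_double, append_int, add, commit, discard, and demo are hypothetical, and the 64 KiB reserve is an arbitrary guess. It assumes the default "C" locale, where "%g" in std::snprintf gives the same 6-significant-digit output as the default operator<< for double. The idea is that clear() keeps the string's capacity, so discarding costs almost nothing, and the buffer is written directly instead of being copied out by ss.str().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: append a double formatted with "%g", which matches
// the default operator<< formatting (6 significant digits) in the "C" locale.
inline void append_double(std::string& buf, double x) {
    char tmp[32];
    const int n = std::snprintf(tmp, sizeof tmp, "%g", x);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof tmp);
    buf.append(tmp, static_cast<std::size_t>(n));
}

// Hypothetical helper: append an int in decimal.
inline void append_int(std::string& buf, int z) {
    char tmp[16];
    const int n = std::snprintf(tmp, sizeof tmp, "%d", z);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof tmp);
    buf.append(tmp, static_cast<std::size_t>(n));
}

// Hypothetical wrapper: buffers formatted lines in a std::string and either
// writes them to the stream or drops them.
class LineBuffer {
public:
    explicit LineBuffer(std::ostream& out) : out_(out) {
        buf_.reserve(1 << 16);  // assumed size; tune for the real workload
    }

    // Format one CSV line and append it to the pending buffer.
    void add(double x, bool y, int z) {
        append_double(buf_, x);
        buf_ += ", ";
        buf_ += y ? "YES, " : "NO, ";
        append_int(buf_, z);
        buf_ += '\n';
    }

    // Write the pending lines without copying them, then reset the buffer.
    void commit() {
        out_.write(buf_.data(), static_cast<std::streamsize>(buf_.size()));
        buf_.clear();  // keeps the allocated capacity
    }

    // Drop the pending lines.
    void discard() { buf_.clear(); }

    const std::string& pending() const { return buf_; }

private:
    std::ostream& out_;
    std::string buf_;
};

// Usage example: the first line is discarded, the second is written.
std::string demo() {
    std::ostringstream out;
    LineBuffer lines(out);
    lines.add(1.5, true, 3);
    lines.discard();
    lines.add(2.25, false, 7);
    lines.commit();
    return out.str();  // "2.25, NO, 7\n"
}
```

In record(), the body would then become a call to add(x, y, z), followed by commit() or discard() depending on CONDITION1 and CONDITION2. With about 10 double fields per line, add() would simply call append_double once per field.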