58

I currently write a set of doubles from a vector to a text file like this:

std::ofstream fout;
fout.open("vector.txt");

for (l = 0; l < vector.size(); l++)
    fout << std::setprecision(10) << vector.at(l) << std::endl;

fout.close();

But this is taking a lot of time to finish. Is there a faster or more efficient way to do this? I would love to see and learn it.

LogicStuff
Diego Fernando Pava
  • 3
    how large is your vector? and how long is "some time"? – Mike Nakis Sep 28 '16 at 17:14
  • 1
    what does your vector contain? – BiagioF Sep 28 '16 at 17:15
  • 1
    it's like 300k doubles, and it's taking almost 3 mins. – Diego Fernando Pava Sep 28 '16 at 17:22
  • my vector contains doubles, mostly numbers between -10 and 10 with a lot of decimals – Diego Fernando Pava Sep 28 '16 at 17:26
  • 3
    I guess the most speed-up would be reached by using a faster double to string conversion, with hardcoded format and precision. Once I've written my own ftoa function (for certain float ranges) which was factor 20 faster. https://github.com/rudimeier/atem/blob/master/src/ftoa.c – rudimeier Sep 28 '16 at 17:33
  • This should have been on [**CodeReview**](http://codereview.stackexchange.com/) instead? – Khalil Khalaf Sep 28 '16 at 17:54
  • 5
    @FirstStep: CR is for reviewing code. The OP is asking how to do something more efficiently. That would be off-topic on CR. Also, if something already has answers, a migration is usually not suggested. – Trojan404 Sep 28 '16 at 17:57
  • @Trojan404 Is it? I did not see that on [What topics can I ask about there](http://codereview.stackexchange.com/help/on-topic). Thanks for letting me know. – Khalil Khalaf Sep 28 '16 at 17:58
  • I don't have time to write a proper answer, but I suspect a lot of the overhead is from doing a lot of small (though buffered) writes. I expect you will get better performance if you create the file in advance, truncate (yes, it is counter-intuitive to truncate something larger) it to the size required, mmap the entire thing into your memory space, then write the data into the memory buffer. The needed commands will vary per OS, but I suspect this will give a major performance boost. You may also get something from unrolling the loop a little if you know your vector is a multiple of n doubles. – Vality Sep 29 '16 at 19:53

6 Answers

73
std::ofstream fout("vector.txt");
fout << std::setprecision(10);

for(auto const& x : vector)
    fout << x << '\n';

Everything I changed was, at least in theory, a performance problem in your version of the code, but the std::endl was the real killer: it flushes the stream after every line. std::vector::at (with bounds checking, which you don't need) would be the second, then the fact that you did not use iterators.

Why default-construct a std::ofstream and then call open, when you can do it in one step? Why call close when RAII (the destructor) takes care of it for you? You can also call

fout << std::setprecision(10)

just once, before the loop.

As noted in the comment below, if your vector is of elements of fundamental type, you might get a better performance with for(auto x : vector). Measure the running time / inspect the assembly output.


Just to point out another thing that caught my eyes, this:

for(l = 0; l < vector.size(); l++)

What is this l? Why declare it outside the loop? It seems you don't need it in the outer scope, so don't. Also, prefer pre-increment over post-increment.

The result:

for(size_t l = 0; l < vector.size(); ++l)

I'm sorry for making code review out of this post.

LogicStuff
  • 4
    Also, given that it's a vector of doubles, using a non-ref iteration variable (`for (auto x : vector)`) may be faster -- it trades a dereference for a double-copy. – Michael Gunter Sep 28 '16 at 17:16
  • thank you so much, as I told you I am just learning and teaching myself C++, so errors like this, I understand, are common. – Diego Fernando Pava Sep 28 '16 at 17:23
  • one question: even if I open a second stream to write to another text file, I don't need to close the first? – Diego Fernando Pava Sep 28 '16 at 17:24
  • 1
    FYI: The item mentioned in my comment does yield better performance on my machine, but it's on the scale of about 1 tick per 10,000,000 items. If your vector contained more complex data (or data wider than 64-bit), then @LogicStuff's answer is better. – Michael Gunter Sep 28 '16 at 17:25
  • 1
    The stream manipulators change persistent state, right? So you could set the precision to 10 once before the loop instead of every time. Admittedly, this is probably a much smaller factor than the extra flush hidden in the `std::endl`. – Adrian McCarthy Sep 28 '16 at 17:28
  • 1
    The range for loop also avoids the `vector::at` calls, which are bounds checked and thus possibly slower than `vector::operator[]`, which is also avoided with the range for loop. – Adrian McCarthy Sep 28 '16 at 17:31
  • @AdrianMcCarthy Yeah, thanks. There's too much *not completely right* for me to catch and quickly write down. – LogicStuff Sep 28 '16 at 17:35
  • 4
    I thought post-increment wasn't a problem anymore, since C++11? – Robinson Sep 28 '16 at 17:35
  • 1
    Also, assign the result of `vector.size()` to a constant variable before the loop so that the function isn't called for each iteration. – Thomas Matthews Sep 28 '16 at 17:37
  • 1
    Again, doesn't the compiler optimise this away? – Robinson Sep 28 '16 at 17:41
  • 1
    [Concerning the `vector.size()`](http://stackoverflow.com/a/3901666/3552770). To the pre/post-increment, it doesn't hurt to get used to the former, for genericity. – LogicStuff Sep 28 '16 at 17:49
  • @DiegoFernandoPava You'd need to call `close` and `open` again. Or use move-assignment. Or use two different `std::ofstream` objects, one of which gets destructed before the creation of the other. – LogicStuff Sep 28 '16 at 17:55
  • 1
    The only way to answer questions like "doesn't the compiler optimise this away" is by **actually looking at the output generated by your compiler**. Either disassemble the binary, or ask your compiler to generate an assembly listing. There are no hard and fast rules. Yes, *usually* a sufficiently smart optimizer will perform loop-invariant hoisting, but there are no guarantees, certainly not in cases where the code is complicated enough that you are uncertain. And never *guess* when trying to optimize. Assembly can be difficult to read, but you can usually grok enough to tell. @robinson – Cody Gray - on strike Sep 28 '16 at 20:12
  • Jason Turner has a great C++ Weekly episode about this topic. https://www.youtube.com/watch?v=GMqQOEZYVJQ – sudo make install Oct 05 '16 at 07:37
  • @Robinson post increment isn't wanted or needed here. Just as you can say less variables when you mean fewer variables and people will understand you, you shouldn't. Even though the compiler will save you why not write code that reads correctly? – GBrookman Oct 07 '16 at 11:57
37

Your algorithm has two parts:

  1. Serialize double numbers to a string or character buffer.

  2. Write results to a file.

The first item can be improved (> 20%) by using sprintf or the fmt library. The second item can be sped up by caching results in a buffer, or by enlarging the output file stream's buffer before writing the results to the output file. Do not use std::endl, because it is much slower than "\n" (it flushes the stream). If you still want to make it faster, write your data in binary format. Below is my complete code sample, which includes my proposed solutions and the one from Edgar Rokyan. I also incorporated Ben Voigt's and Matthieu M.'s suggestions in the test code.

#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <sstream>
#include <vector>

// https://github.com/fmtlib/fmt
#include "fmt/format.h"

// http://uscilab.github.io/cereal/
#include "cereal/archives/binary.hpp"
#include "cereal/archives/json.hpp"
#include "cereal/archives/portable_binary.hpp"
#include "cereal/archives/xml.hpp"
#include "cereal/types/string.hpp"
#include "cereal/types/vector.hpp"

// https://github.com/DigitalInBlue/Celero
#include "celero/Celero.h"

template <typename T> const char* getFormattedString();
template<> const char* getFormattedString<double>(){return "%g\n";}
template<> const char* getFormattedString<float>(){return "%g\n";}
template<> const char* getFormattedString<int>(){return "%d\n";}
template<> const char* getFormattedString<size_t>(){return "%lu\n";}


namespace {
    constexpr size_t LEN = 32;

    template <typename T> std::vector<T> create_test_data(const size_t N) {
        std::vector<T> data(N);
        for (size_t idx = 0; idx < N; ++idx) {
            data[idx] = idx;
        }
        return data;
    }

    template <typename Iterator> auto toVectorOfChar(Iterator begin, Iterator end) {
        char aLine[LEN];
        std::vector<char> buffer;
        buffer.reserve(std::distance(begin, end) * LEN);
        const char* fmtStr = getFormattedString<typename std::iterator_traits<Iterator>::value_type>();
        std::for_each(begin, end, [&buffer, &aLine, &fmtStr](const auto value) {
            sprintf(aLine, fmtStr, value);
            for (size_t idx = 0; aLine[idx] != 0; ++idx) {
                buffer.push_back(aLine[idx]);
            }
        });
        return buffer;
    }

    template <typename Iterator>
    auto toStringStream(Iterator begin, Iterator end, std::stringstream &buffer) {
        char aLine[LEN];
        const char* fmtStr = getFormattedString<typename std::iterator_traits<Iterator>::value_type>();
        std::for_each(begin, end, [&buffer, &aLine, &fmtStr](const auto value) {            
            sprintf(aLine, fmtStr, value);
            buffer << aLine;
        });
    }

    template <typename Iterator> auto toMemoryWriter(Iterator begin, Iterator end) {
        fmt::MemoryWriter writer;
        std::for_each(begin, end, [&writer](const auto value) { writer << value << "\n"; });
        return writer;
    }

    // A modified version of the original approach.
    template <typename Container>
    void original_approach(const Container &data, const std::string &fileName) {
        std::ofstream fout(fileName);
        for (size_t l = 0; l < data.size(); l++) {
            fout << data[l] << std::endl;
        }
        fout.close();
    }

    // Replace std::endl by "\n"
    template <typename Iterator>
    void improved_original_approach(Iterator begin, Iterator end, const std::string &fileName) {
        std::ofstream fout(fileName);
        const size_t len = std::distance(begin, end) * LEN;
        std::vector<char> buffer(len);
        fout.rdbuf()->pubsetbuf(buffer.data(), len);
        for (Iterator it = begin; it != end; ++it) {
            fout << *it << "\n";
        }
        fout.close();
    }

    //
    template <typename Iterator>
    void edgar_rokyan_solution(Iterator begin, Iterator end, const std::string &fileName) {
        std::ofstream fout(fileName);
        std::copy(begin, end, std::ostream_iterator<double>(fout, "\n"));
    }

    // Cache to a string stream before writing to the output file
    template <typename Iterator>
    void stringstream_approach(Iterator begin, Iterator end, const std::string &fileName) {
        std::stringstream buffer;
        for (Iterator it = begin; it != end; ++it) {
            buffer << *it << "\n";
        }

        // Now write to the output file.
        std::ofstream fout(fileName);
        fout << buffer.str();
        fout.close();
    }

    // Use sprintf
    template <typename Iterator>
    void sprintf_approach(Iterator begin, Iterator end, const std::string &fileName) {
        std::stringstream buffer;
        toStringStream(begin, end, buffer);
        std::ofstream fout(fileName);
        fout << buffer.str();
        fout.close();
    }

    // Use fmt::MemoryWriter (https://github.com/fmtlib/fmt)
    template <typename Iterator>
    void fmt_approach(Iterator begin, Iterator end, const std::string &fileName) {
        auto writer = toMemoryWriter(begin, end);
        std::ofstream fout(fileName);
        fout << writer.str();
        fout.close();
    }

    // Use std::vector<char>
    template <typename Iterator>
    void vector_of_char_approach(Iterator begin, Iterator end, const std::string &fileName) {
        std::vector<char> buffer = toVectorOfChar(begin, end);
        std::ofstream fout(fileName);
        fout << buffer.data();
        fout.close();
    }

    // Use cereal (http://uscilab.github.io/cereal/).
    template <typename Container, typename OArchive = cereal::BinaryOutputArchive>
    void use_cereal(Container &&data, const std::string &fileName) {
        std::stringstream buffer;
        {
            OArchive oar(buffer);
            oar(data);
        }

        std::ofstream fout(fileName);
        fout << buffer.str();
        fout.close();
    }
}

// Performance test input data.
constexpr int NumberOfSamples = 5;
constexpr int NumberOfIterations = 2;
constexpr int N = 3000000;
const auto double_data = create_test_data<double>(N);
const auto float_data = create_test_data<float>(N);
const auto int_data = create_test_data<int>(N);
const auto size_t_data = create_test_data<size_t>(N);

CELERO_MAIN

BASELINE(DoubleVector, original_approach, NumberOfSamples, NumberOfIterations) {
    const std::string fileName("origsol.txt");
    original_approach(double_data, fileName);
}

BENCHMARK(DoubleVector, improved_original_approach, NumberOfSamples, NumberOfIterations) {
    const std::string fileName("improvedsol.txt");
    improved_original_approach(double_data.cbegin(), double_data.cend(), fileName);
}

BENCHMARK(DoubleVector, edgar_rokyan_solution, NumberOfSamples, NumberOfIterations) {
    const std::string fileName("edgar_rokyan_solution.txt");
    edgar_rokyan_solution(double_data.cbegin(), double_data.end(), fileName);
}

BENCHMARK(DoubleVector, stringstream_approach, NumberOfSamples, NumberOfIterations) {
    const std::string fileName("stringstream.txt");
    stringstream_approach(double_data.cbegin(), double_data.cend(), fileName);
}

BENCHMARK(DoubleVector, sprintf_approach, NumberOfSamples, NumberOfIterations) {
    const std::string fileName("sprintf.txt");
    sprintf_approach(double_data.cbegin(), double_data.cend(), fileName);
}

BENCHMARK(DoubleVector, fmt_approach, NumberOfSamples, NumberOfIterations) {
    const std::string fileName("fmt.txt");
    fmt_approach(double_data.cbegin(), double_data.cend(), fileName);
}

BENCHMARK(DoubleVector, vector_of_char_approach, NumberOfSamples, NumberOfIterations) {
    const std::string fileName("vector_of_char.txt");
    vector_of_char_approach(double_data.cbegin(), double_data.cend(), fileName);
}

BENCHMARK(DoubleVector, use_cereal, NumberOfSamples, NumberOfIterations) {
    const std::string fileName("cereal.bin");
    use_cereal(double_data, fileName);
}

// Benchmark double vector
BASELINE(DoubleVectorConversion, toStringStream, NumberOfSamples, NumberOfIterations) {
    std::stringstream output;
    toStringStream(double_data.cbegin(), double_data.cend(), output);
}

BENCHMARK(DoubleVectorConversion, toMemoryWriter, NumberOfSamples, NumberOfIterations) {
    celero::DoNotOptimizeAway(toMemoryWriter(double_data.cbegin(), double_data.cend()));
}

BENCHMARK(DoubleVectorConversion, toVectorOfChar, NumberOfSamples, NumberOfIterations) {
    celero::DoNotOptimizeAway(toVectorOfChar(double_data.cbegin(), double_data.cend()));
}

// Benchmark float vector
BASELINE(FloatVectorConversion, toStringStream, NumberOfSamples, NumberOfIterations) {
    std::stringstream output;
    toStringStream(float_data.cbegin(), float_data.cend(), output);
}

BENCHMARK(FloatVectorConversion, toMemoryWriter, NumberOfSamples, NumberOfIterations) {
    celero::DoNotOptimizeAway(toMemoryWriter(float_data.cbegin(), float_data.cend()));
}

BENCHMARK(FloatVectorConversion, toVectorOfChar, NumberOfSamples, NumberOfIterations) {
    celero::DoNotOptimizeAway(toVectorOfChar(float_data.cbegin(), float_data.cend()));
}

// Benchmark int vector
BASELINE(int_conversion, toStringStream, NumberOfSamples, NumberOfIterations) {
    std::stringstream output;
    toStringStream(int_data.cbegin(), int_data.cend(), output);
}

BENCHMARK(int_conversion, toMemoryWriter, NumberOfSamples, NumberOfIterations) {
    celero::DoNotOptimizeAway(toMemoryWriter(int_data.cbegin(), int_data.cend()));
}

BENCHMARK(int_conversion, toVectorOfChar, NumberOfSamples, NumberOfIterations) {
    celero::DoNotOptimizeAway(toVectorOfChar(int_data.cbegin(), int_data.cend()));
}

// Benchmark size_t vector
BASELINE(size_t_conversion, toStringStream, NumberOfSamples, NumberOfIterations) {
    std::stringstream output;
    toStringStream(size_t_data.cbegin(), size_t_data.cend(), output);
}

BENCHMARK(size_t_conversion, toMemoryWriter, NumberOfSamples, NumberOfIterations) {
    celero::DoNotOptimizeAway(toMemoryWriter(size_t_data.cbegin(), size_t_data.cend()));
}

BENCHMARK(size_t_conversion, toVectorOfChar, NumberOfSamples, NumberOfIterations) {
    celero::DoNotOptimizeAway(toVectorOfChar(size_t_data.cbegin(), size_t_data.cend()));
}

Below are the performance results obtained on my Linux box using clang-3.9.1 with the -O3 flag. I used Celero to collect all performance results.

Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
     Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  | 
-----------------------------------------------------------------------------------------------------------------------------------------------
DoubleVector    | original_approa | Null            |              10 |               4 |         1.00000 |   3650309.00000 |            0.27 | 
DoubleVector    | improved_origin | Null            |              10 |               4 |         0.47828 |   1745855.00000 |            0.57 | 
DoubleVector    | edgar_rokyan_so | Null            |              10 |               4 |         0.45804 |   1672005.00000 |            0.60 | 
DoubleVector    | stringstream_ap | Null            |              10 |               4 |         0.41514 |   1515377.00000 |            0.66 | 
DoubleVector    | sprintf_approac | Null            |              10 |               4 |         0.35436 |   1293521.50000 |            0.77 | 
DoubleVector    | fmt_approach    | Null            |              10 |               4 |         0.34916 |   1274552.75000 |            0.78 | 
DoubleVector    | vector_of_char_ | Null            |              10 |               4 |         0.34366 |   1254462.00000 |            0.80 | 
DoubleVector    | use_cereal      | Null            |              10 |               4 |         0.04172 |    152291.25000 |            6.57 | 
Complete.

I also benchmarked the numeric-to-string conversion algorithms to compare the performance of std::stringstream, fmt::MemoryWriter, and std::vector<char>.

Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
     Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  | 
-----------------------------------------------------------------------------------------------------------------------------------------------
DoubleVectorCon | toStringStream  | Null            |              10 |               4 |         1.00000 |   1272667.00000 |            0.79 | 
FloatVectorConv | toStringStream  | Null            |              10 |               4 |         1.00000 |   1272573.75000 |            0.79 | 
int_conversion  | toStringStream  | Null            |              10 |               4 |         1.00000 |    248709.00000 |            4.02 | 
size_t_conversi | toStringStream  | Null            |              10 |               4 |         1.00000 |    252063.00000 |            3.97 | 
DoubleVectorCon | toMemoryWriter  | Null            |              10 |               4 |         0.98468 |   1253165.50000 |            0.80 | 
DoubleVectorCon | toVectorOfChar  | Null            |              10 |               4 |         0.97146 |   1236340.50000 |            0.81 | 
FloatVectorConv | toMemoryWriter  | Null            |              10 |               4 |         0.98419 |   1252454.25000 |            0.80 | 
FloatVectorConv | toVectorOfChar  | Null            |              10 |               4 |         0.97369 |   1239093.25000 |            0.81 | 
int_conversion  | toMemoryWriter  | Null            |              10 |               4 |         0.11741 |     29200.50000 |           34.25 | 
int_conversion  | toVectorOfChar  | Null            |              10 |               4 |         0.87105 |    216637.00000 |            4.62 | 
size_t_conversi | toMemoryWriter  | Null            |              10 |               4 |         0.13746 |     34649.50000 |           28.86 | 
size_t_conversi | toVectorOfChar  | Null            |              10 |               4 |         0.85345 |    215123.00000 |            4.65 | 
Complete.

From the above tables we can see that:

  1. Edgar Rokyan's solution is about 10% slower than the stringstream solution. The solution that uses the fmt library performs best across the three studied data types: double, int, and size_t. The sprintf + std::vector solution is 1% faster than the fmt solution for the double data type. However, I do not recommend sprintf-based solutions for production code because they are inelegant (still written in C style) and do not work out of the box for different data types such as int or size_t.

  2. The benchmark results also show that fmt is the superior choice for serializing integral data types, since it is at least 7x faster than the other approaches.

  3. We can speed this algorithm up 10x by using a binary format. This approach is significantly faster than writing a formatted text file because we only do a raw copy from memory to the output. If you want more flexible and portable solutions, try cereal, boost::serialization, or protocol buffers. According to this performance study, cereal seems to be the fastest.

hungptit
  • 1
    In version 3 you'll get significantly better performance if you use a `vector` to catenate all the individual strings instead of `std::stringstream`. Benchmark question: http://stackoverflow.com/q/4340396/103167 – Ben Voigt Sep 28 '16 at 20:33
  • Thanks a lot for the link. Your sample code no longer exists, so I had to guess what you did. I also added one more solution that uses fmt (https://github.com/fmtlib/fmt). From my benchmark results, std::vector is the fastest solution; however, it is not significantly faster than the solutions that use stringstream and fmt::MemoryWriter. – hungptit Sep 28 '16 at 22:03
  • 6
    @hungptit: Caching to a stream is unlikely to have much of an impact in general, because `std::ofstream` *already* caches before writing to the file. Unless, of course, you use `std::endl` after each element, which calls `flush`, which *mandates* a syscall. – Matthieu M. Sep 29 '16 at 12:15
  • @MatthieuM. I have updated the performance study, and the results show that we can improve by 10% by caching output to the std::stringstream. – hungptit Sep 29 '16 at 21:39
  • @BenVoigt I have updated my solution and using std::vector is not significantly better than using std::stringstream. – hungptit Sep 29 '16 at 21:41
  • Try `vector::insert` with an entire string at a time (using the character count returned from `sprintf`, no need to call `strlen()`), instead of looped calls to `push_back()`. – Ben Voigt Sep 29 '16 at 22:06
  • @hungptit: Have you tried updating the `std::ofstream` buffer to a bigger size instead? See http://stackoverflow.com/questions/5166263/how-to-get-iostream-to-perform-better for the how to. Oh, and please do open that `ofstream` in binary mode, this way you'll avoid locale conversions of newline characters. – Matthieu M. Sep 30 '16 at 06:56
  • @BenVoigt Thanks a lot for your suggestion. I have updated a solution that uses std::vector. Your trick does improve the performance, however, the std::vector solution is only 3% faster than the stringstream solution. My example code is self contained you can always try it on your machine and please let me know if you have any suggestion. – hungptit Sep 30 '16 at 17:22
  • @MatthieuM. I do not see any improvement by changing the size of the output file buffer. Please let me know if you have any suggestion. – hungptit Sep 30 '16 at 17:24
24

You can also use a rather neat form of outputting the contents of any vector to a file, with the help of iterators and the std::copy algorithm.

std::ofstream fout("vector.txt");
fout.precision(10);

std::copy(numbers.begin(), numbers.end(),
    std::ostream_iterator<double>(fout, "\n"));

This solution is practically the same as LogicStuff's in terms of execution time, but it also illustrates how to print the contents with a single call to copy, which, I think, reads rather nicely.

gsamaras
Edgar Rokjān
  • 4
    You might want to use an ostreambuf_iterator instead. This gets rid of creating the stream sentry object for every item. – Daniel Jour Sep 28 '16 at 21:56
  • I don't know about the performance, but this answer is the most elegant-looking one I have seen. And it taught me about `std::ostream_iterator`. +1 – Vality Sep 30 '16 at 05:01
  • 1
    Finally someone expressing intent in code :). For the sake of genericity: use free functions for your begin and end iterators: `copy(begin(numbers), end(numbers), ostream_iterator...)`. – xtofl Oct 05 '16 at 06:47
  • @xtofl I just put here a solution which is valid for C++ 98/03. Of course, we can rewrite this with `begin/end` functions. But my main intention was to show how magnificent STL is even without some C++ 11 stuff :) – Edgar Rokjān Oct 05 '16 at 07:13
  • 1
    @DanielJour: are you sure? Can an `ostreambuf_iterator` stream something else than `char`? (http://cpp.sh/7aea) – xtofl Oct 05 '16 at 07:25
  • @xtofl Agree with you. – Edgar Rokjān Oct 05 '16 at 07:29
14

OK, I'm sad that there are three solutions that attempt to give you a fish, but no solution that attempts to teach you how to fish.

When you have a performance problem, the solution is to use a profiler, and fix whatever problem the profiler shows.

Converting double-to-string for 300,000 doubles will not take 3 minutes on any computer that has shipped in the last 10 years.

Writing 3 MB of data to disk (a ballpark size for 300,000 formatted doubles) will not take 3 minutes on any computer that has shipped in the last 10 years.

If you profile this, my guess is that you'll find that fout gets flushed 300,000 times, and that flushing is slow because it may involve blocking, or semi-blocking, I/O. Thus, you need to avoid the blocking I/O. The typical way of doing that is to prepare all your I/O in a single buffer (create a stringstream, write to that) and then write that buffer to a physical file in one go. This is the solution hungptit describes, except that what's missing, I think, is an explanation of WHY it is a good solution.

Or, to put it another way: What the profiler will tell you is that calling write() (on Linux) or WriteFile() (on Windows) is much slower than just copying a few bytes into a memory buffer, because it's a user/kernel level transition. If std::endl causes this to happen for each double, you're going to have a bad (slow) time. Replace it with something that just stays in user space and puts data in RAM!

If that's still not fast enough, it may be that the specific-precision version of operator<<() for doubles is slow or involves unnecessary overhead. If so, you may be able to speed the code up further by using sprintf() or some other potentially faster function to generate the data in the in-memory buffer, before you finally write the entire buffer to a file in one go.

Jon Watte
  • 1
    Wouldn't it be simplest to just adjust the buffering (fully buffered, with a reasonable buffer size, maybe a megabyte for PC software)? I'm not very familiar with std:: streams; maybe they don't support that. – hyde Sep 29 '16 at 15:15
  • @hyde: the buffering is there. The `cout << endl` actually flushes the buffers. – xtofl Oct 05 '16 at 07:28
  • @xtofl That's line buffered, and that's what is killing the performance, because it flushes every few dozen bytes or whatever. Fully buffered would mean, it flushes when buffer is full, so with just 1 KB buffer it would flush 10 times less often, and with 1 MB buffer 10000 times less often. – hyde Oct 05 '16 at 08:13
  • `cout << endl` actually _means_ 'flush, now'! So in this `cout << value << endl`, there is a flush after every double. Which is what @Jon Watte is saying. http://en.cppreference.com/w/cpp/io/manip/endl. So yes: 'reasonal buffer size' is there, but it's explicitly turned off by using `endl`. – xtofl Oct 05 '16 at 08:21
5

You have two main bottlenecks in your program: output and formatting text.

To increase performance, you will want to increase the amount of data output per call. For example, 1 output transfer of 500 characters is faster than 500 transfers of 1 character.

My recommendation is you format the data to a big buffer, then block write the buffer.

Here's an example:

char buffer[1024 * 1024];  // 1 MiB staging buffer; for large vectors, flush and reset it when nearly full
unsigned int buffer_index = 0;
const std::size_t size = my_vector.size();
for (std::size_t i = 0; i < size; ++i)
{
  int characters_formatted = snprintf(&buffer[buffer_index],
                                      sizeof(buffer) - buffer_index,
                                      "%.10f\n", my_vector[i]);
  if (characters_formatted > 0)
  {
      buffer_index += (unsigned int) characters_formatted;
  }
}
std::cout.write(&buffer[0], buffer_index);

You should first try changing optimization settings in your compiler before messing with the code.

Khalil Khalaf
Thomas Matthews
  • 3
    The io stream buffers under the hood, so this amount of effort shouldn't be necessary. The OP code, unfortunately, used `std::endl` which flushes the buffer unnecessarily, defeating most of that benefit. – Adrian McCarthy Sep 28 '16 at 17:49
  • Yes, the IO stream buffers are under the hood. However, this technique allows for custom buffers that can be larger than the io stream buffers. – Thomas Matthews Sep 28 '16 at 18:15
  • This. Under certain conditions, std streams can be just as fast as the C printf functions. Yet under the wrong conditions they are orders of magnitude slower, so if performance is desired, use the C printf functions. – Peter Sep 28 '16 at 18:46
  • 2
    You'd have to measure and test to ensure the buffer size you choose is better suited than the one chosen by the iostreams implementation. I'd also be concerned that the parsing of the snprintf format string may cost substantially more than the (likely inlined) stream insertion operator. – Adrian McCarthy Sep 28 '16 at 18:58
2

Here is a slightly different solution: save your doubles in binary form.

int fd = ::open("/path/to/the/file", O_WRONLY | O_CREAT | O_TRUNC, 0644 /* whatever permissions */);
::write(fd, &vector[0], vector.size() * sizeof(vector[0]));
::close(fd);

Since you mentioned that you have 300k doubles, which equals 300k * 8 bytes = 2.4 MB, you can save all of them to a local disk file in less than 0.1 seconds. The only drawback of this method is that the saved file is not as readable as a string representation, but a hex editor can solve that problem.

If you prefer a more robust way, there are plenty of serialization libraries/tools available online. They provide more benefits, such as language neutrality, machine independence, flexible compression algorithms, etc. These are the two I usually use:

Jason L.
  • It's possible to do this with C++ APIs if you don't want to limit your solution to Posix. – Adrian McCarthy Sep 28 '16 at 20:31
  • @AdrianMcCarthy: C++ only provides APIs for character I/O, `std::ios::binary` only disables newline conversions, it doesn't disable the character encoding step. Perhaps the character encoding step is a no-op on some popular C++ libraries... but there's no portable way to make sure of that. – Ben Voigt Sep 28 '16 at 20:35
  • 1
    @Ben Voigt: But any character encoding step would presumably be reversible, so simply writing the bytes of the vector's contents to a binary stream and reading them back should work. (Where in the standard does one read about such a character encoding step? I can't find anything that suggests a `basic_ostream::write` to a binary stream would do any sort of encoding.) – Adrian McCarthy Sep 28 '16 at 21:23
  • 1
    @AdrianMcCarthy: It's in 27.9.2 [filebuf]. In particular, the description of the `overflow` virtual override (which is the function that actually writes from the buffer to a file) says "the behavior of consuming characters is performed by first converting as if by `a_codecvt.out(state, b, p, end, xbuf, xbuf+XSIZE, xbuf_end)`" – Ben Voigt Sep 29 '16 at 22:22