9

I was running some benchmarks to find the most efficient way to write a huge array to a file in C++ (more than 1 GB of ASCII text).

So I compared std::ofstream with fprintf (see the switch I used below):

    case 0: {
        std::ofstream out(title, std::ios::out | std::ios::trunc);
        if (out) {
            ok = true;
            for (i=0; i<M; i++) {
                for (j=0; j<N; j++) {
                    out<<A[i][j]<<" ";
                }
                out<<"\n";
            }
            out.close();
        } else {
            std::cout<<"Error with file : "<<title<<"\n";
        }
        break;
    }
    case 1: {
        FILE *out = fopen(title.c_str(), "w");
        if (out!=NULL) {
            ok = true;
            for (i=0; i<M; i++) {
                for (j=0; j<N; j++) {
                    fprintf(out, "%d ", A[i][j]);
                }
                fprintf(out, "\n");
            }
            fclose(out);
        } else {
            std::cout<<"Error with file : "<<title<<"\n";
        }
        break;
    }

My problem is that fprintf seems to be more than 12x slower than std::ofstream. Do you have any idea what in my code causes this? Or is std::ofstream simply much better optimized than fprintf?

(Another question: do you know a faster way to write a file?)

Thank you very much

(detail : I was compiling with g++ -Wall -O3)

Vincent
  • I think you should use fputs instead of fprintf to get more similar behavior – AndersK Oct 24 '11 at 14:55
  • 1
    also look at `ostream::write()`: http://www.cplusplus.com/reference/iostream/ostream/write/ – Nim Oct 24 '11 at 14:57
  • 1
    @AndersK.: No. fputs is the equivalent of a streambuf (unformatted); fprintf is the proper counterpart of ostream. – MSalters Oct 24 '11 at 14:57
  • @MSalters why do you think that? fprintf contains a variable argument list processing whereas the operator<< on ostream doesn't. A better comparison would be with fputs. – AndersK Oct 24 '11 at 14:59
  • I wonder what the size of the output buffers are in each case. – Robᵩ Oct 24 '11 at 15:04
  • Hi, Vincent. I have two suggestions on how to improve the quality of the answers you receive. 1) Please go back to your previous questions and accept the answers you find most useful. 2) Please create a minimal, complete program that we can compile and run that demonstrates your question. If we run this program fragment, we have to make assumptions about what the rest of the program does. See http://sscce.org/. – Robᵩ Oct 24 '11 at 15:07
  • If you're writing data to a file, a more efficient method is to write to a big buffer first, then use the block write functions to move from buffer to file. This will reduce the number of I/O transactions and turn many little requests into one large requests (which hard drives prefer). – Thomas Matthews Oct 24 '11 at 15:20
  • 4
    FWIW: [This program](http://ideone.com/aZW3v), derived from the OP's program fragment, takes essentially identical time to run either branch of the switch statement. – Robᵩ Oct 24 '11 at 15:21

5 Answers

18

`fprintf(out, "%d ", ...)` requires runtime parsing of the format string, once per integer. `ostream& operator<<(ostream&, int)` is resolved by the compiler, once per compilation.

MSalters
4

Well, fprintf() does have to do a bit more work at runtime, since it has to parse and process the format string. However, given the size of your output file I would expect those differences to be of little consequence, and would expect the code to be I/O bound.

I therefore suspect that your benchmark is flawed in some way.

  1. Do you consistently get a 12x difference if you run the tests repeatedly?
  2. What happens to the timings if you reverse the order in which you run the tests?
  3. What happens if you call fsync()/sync() at the end?
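
The first two checks above could be run with a small harness like the following. This is only a sketch under stated assumptions: the names `Matrix`, `write_with_ofstream`, `write_with_fprintf`, `seconds` and `run_benchmark`, and the output file names, are made up for illustration and do not come from the question. It does not call fsync()/sync(), so the third check would still need to be added separately.

```cpp
#include <chrono>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the array A of the question.
using Matrix = std::vector<std::vector<int>>;

// Same loop as case 0 of the question: formatted output through std::ofstream.
bool write_with_ofstream(const std::string& path, const Matrix& a) {
    std::ofstream out(path, std::ios::out | std::ios::trunc);
    if (!out) return false;
    for (const auto& row : a) {
        for (int v : row) out << v << ' ';
        out << '\n';
    }
    out.close();
    return !out.fail();
}

// Same loop as case 1 of the question: formatted output through fprintf.
bool write_with_fprintf(const std::string& path, const Matrix& a) {
    std::FILE* out = std::fopen(path.c_str(), "w");
    if (out == nullptr) return false;
    for (const auto& row : a) {
        for (int v : row) std::fprintf(out, "%d ", v);
        std::fprintf(out, "\n");
    }
    return std::fclose(out) == 0;
}

// Wall-clock time of one call, in seconds.
template <typename F>
double seconds(F&& f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Runs both writers twice, swapping their order in the second round,
// so that an effect caused purely by ordering becomes visible.
void run_benchmark(const Matrix& a) {
    for (int round = 0; round < 2; ++round) {
        bool stream_first = (round == 0);
        for (int k = 0; k < 2; ++k) {
            bool use_stream = (k == 0) == stream_first;
            double t = use_stream
                ? seconds([&] { write_with_ofstream("bench_ofstream.txt", a); })
                : seconds([&] { write_with_fprintf("bench_fprintf.txt", a); });
            std::printf("round %d, %s: %.3f s\n", round,
                        use_stream ? "ofstream" : "fprintf", t);
        }
    }
}
```

Calling `run_benchmark(A)` several times from `main` with a large matrix should show whether the 12x gap is stable or depends on which writer runs first.
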
NPE
2

The ofstream has a file buffer, which may reduce the number of disk accesses. In addition, fprintf is a variadic function that goes through the va_* macros, whereas ofstream does not. I think you could run a test with fwrite() or putc().
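
One way to try the fwrite() suggestion is to format the numbers into a memory buffer and write that buffer in large blocks. The sketch below assumes a C++17 compiler (for `std::to_chars`); the function name `write_buffered` and the `flush_threshold` parameter are invented for this example, and whether it actually beats the two loops in the question would have to be measured.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Formats each value with std::to_chars into an in-memory buffer and
// passes the buffer to fwrite once it holds at least flush_threshold bytes.
// The check happens at the end of each row, so a row is never split.
bool write_buffered(const std::string& path,
                    const std::vector<std::vector<int>>& a,
                    std::size_t flush_threshold = 1 << 20) {
    std::FILE* out = std::fopen(path.c_str(), "wb");
    if (out == nullptr) return false;

    std::string buffer;
    buffer.reserve(flush_threshold + 64);
    char digits[16];  // enough for a 32-bit int, sign included
    bool ok = true;

    for (const auto& row : a) {
        for (int v : row) {
            auto res = std::to_chars(digits, digits + sizeof digits, v);
            buffer.append(digits, static_cast<std::size_t>(res.ptr - digits));
            buffer.push_back(' ');
        }
        buffer.push_back('\n');
        if (buffer.size() >= flush_threshold) {
            ok = ok && std::fwrite(buffer.data(), 1, buffer.size(), out) == buffer.size();
            buffer.clear();
        }
    }
    // Write whatever is left after the last row.
    if (!buffer.empty())
        ok = ok && std::fwrite(buffer.data(), 1, buffer.size(), out) == buffer.size();
    return (std::fclose(out) == 0) && ok;
}
```

The output has the same layout as the loops in the question (a space after every value, one row per line), so the resulting files can be compared directly.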

YangG
1

Have you set sync_with_stdio somewhere upstream of the code you have shown?

What you report is the opposite of what is usually measured, although many people believe that what you see should be the norm: iostreams are type-safe, whereas the printf family are variadic functions that have to infer the types of their arguments from the format specifiers.

Happy Green Kid Naps
1

I present here a heavily optimized way to write integers to a text file using the Unix functions open, read and write. They are also available on Windows, where they only produce some warnings that you can work around. This implementation only works for 32-bit integers.

In your include file:

class FastIntegerWriter
{
private:

    const int bufferSize;
    int offset;
    int file;
    char* buffer;

public:

    FastIntegerWriter(int bufferSize = 4096);
    int Open(const char *filename);
    void Close();
    virtual ~FastIntegerWriter();
    void Flush();
    void Writeline(int value);
};

In your source file

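#ifdef _MSC_VER
# include <io.h>
# include <sys/stat.h>
# define open _open
# define write _write
# define read _read
# define close _close
// Permission bits for open() with O_CREAT; _open accepts only these two flags.
# define FAST_WRITER_MODE (_S_IREAD | _S_IWRITE)
#else
# include <unistd.h>
// Permission bits for open() with O_CREAT: rw-r--r--, subject to the umask.
# define FAST_WRITER_MODE 0644
#endif
#include <fcntl.h>
#include <cstddef>

// Initializers follow the member declaration order
// (bufferSize, offset, file, buffer) to avoid -Wreorder warnings.
FastIntegerWriter::FastIntegerWriter(int bufferSize) :
    bufferSize(bufferSize),
    offset(0),
    file(0),
    buffer(new char[bufferSize])
{
}

int FastIntegerWriter::Open(const char* filename)
{
    this->Close();
    if (filename != NULL)
        // O_CREAT requires a third argument giving the permissions of the new file.
        this->file = open(filename, O_WRONLY | O_CREAT | O_TRUNC, FAST_WRITER_MODE);
    return this->file;
}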

void FastIntegerWriter::Close()
{
    this->Flush();
    if (this->file > 0)
    {
        close(this->file);
        this->file = 0;
    }
}

FastIntegerWriter::~FastIntegerWriter()
{
    this->Close();
    delete[] this->buffer;
}

void FastIntegerWriter::Flush()
{
    if (this->offset != 0)
    {
        write(this->file, this->buffer, this->offset);
        this->offset = 0;
    }
}

void FastIntegerWriter::Writeline(int value)
{
    if (this->offset >= this->bufferSize - 12)
    {
        this->Flush();
    }

    // Position in the buffer where this value will be written

    char* output = this->buffer + this->offset;

    if (value < 0)
    {
        if (value == -2147483648)
        {
            // Special case, the minimum integer does not have a corresponding positive value.
            // We use an hard coded string and copy it directly to the buffer.
            // (Thanks to Eugene Ryabtsev for the suggestion).

            static const char s[] = "-2147483648\n";
            for (int i = 0; i < 12; ++i)
                output[i] = s[i];
            this->offset += 12;
            return;
        }

        *output = '-';
        ++output;
        ++this->offset;
        value = -value;
    }

    // Compute number of digits (log base 10(value) + 1)

    int digits =
        (value >= 1000000000) ? 10 : (value >= 100000000) ? 9 : (value >= 10000000) ? 8 : 
        (value >= 1000000) ? 7 : (value >= 100000) ? 6 : (value >= 10000) ? 5 : 
        (value >= 1000) ? 4 : (value >= 100) ? 3 : (value >= 10) ? 2 : 1;

    // Convert number to string

    output[digits] = '\n';
    for (int i = digits - 1; i >= 0; --i)
    {
        output[i] = value % 10 + '0';
        value /= 10;
    }

    this->offset += digits + 1;
}

I guess this will outperform every other method of writing to an ASCII file :) You might get a little more performance from the low-level Windows APIs WriteFile and ReadFile, but it is not worth the effort.

To use it...

int main()
{
    FastIntegerWriter fw;
    fw.Open("test.txt");

    for (int i = -2000; i < 1000000; ++i)
        fw.Writeline(i);

    return 0;
}

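If you don't specify any file, output goes to file descriptor 0. Note that descriptor 0 is standard input (standard output is 1), so this reaches the console only when descriptor 0 is a terminal that also accepts writes.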

Salvatore Previti
  • 1
    Note that `value = -value` would work incorrectly if the most negative integer is passed as there is no corresponding positive value. See http://stackoverflow.com/a/5165813/1353187 – Eugene Ryabtsev Oct 10 '13 at 06:06
  • Correct. Didn't think about it writing that code. The best and simplest way to handle it is to hard code in a string the most negative integer and write an if (value == most_negative_integer) write_the_string – Salvatore Previti Oct 12 '13 at 00:22
  • Even faster implementations shown here: https://stackoverflow.com/q/4351371/103167 – Ben Voigt Sep 19 '19 at 20:51