
I have a binary file. I am reading a block of data from that file into an array of structs using `fread`. My struct looks like this:

struct Num {
    uint64_t key;
    uint64_t val;
};
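Here is a minimal sketch of the reading step described above. It assumes the binary file holds raw `Num` records written by a program with the same struct layout and byte order. `read_block` is a hypothetical helper name, not part of the question's code:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

struct Num {
    std::uint64_t key;
    std::uint64_t val;
};

// Hypothetical helper: reads up to `count` records from `in` into `buffer`.
// Returns the number of complete records actually read, which is less than
// `count` at end of file (or on error). Assumes the file was produced with
// the same struct layout and byte order as this program.
std::size_t read_block(std::FILE* in, Num* buffer, std::size_t count) {
    return std::fread(buffer, sizeof(Num), count, in);
}
```

Calling `read_block` in a loop until it returns 0 processes a large file one block at a time instead of loading it all into memory.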

My main goal is to write the array to a separate text file, with one space-separated key/value pair per line, as shown below:

Key1 Val1
Key2 Val2
Key3 Val3

I have written a simple function to do this.

Num *buffer = new Num[buffer_size];  // buffer_size and OUT_FILE are defined elsewhere
// Read a block of data from the binary file into the buffer array.
ofstream out_file(OUT_FILE, ios::out);
for (size_t i = 0; i < buffer_size; i++)
    out_file << buffer[i].key << ' ' << buffer[i].val << '\n';
delete[] buffer;

The code works, but it's slow. Another approach would be to build the entire string first and write it to the file only once at the end.
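The build-then-write idea can be sketched as follows. The sketch assumes C++11 and uses `std::to_string` for the number formatting. `format_block` and `write_block` are hypothetical names. A single string for the entire 15 GB input is likely impractical memory-wise, so the sketch is meant to be applied to each block read from the file, not to the whole file at once:

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

struct Num {
    std::uint64_t key;
    std::uint64_t val;
};

// Formats every record in the block as "key val\n" into one std::string.
std::string format_block(const Num* buffer, std::size_t count) {
    std::string text;
    // Rough size guess to reduce reallocations; the string still grows if needed.
    text.reserve(count * 24);
    for (std::size_t i = 0; i < count; ++i) {
        text += std::to_string(buffer[i].key);
        text += ' ';
        text += std::to_string(buffer[i].val);
        text += '\n';
    }
    return text;
}

// Writes the formatted block with a single ostream::write call.
bool write_block(std::ofstream& out, const Num* buffer, std::size_t count) {
    const std::string text = format_block(buffer, count);
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
    return static_cast<bool>(out);
}
```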

But I want to know whether there is a better way to do this. I found some information about `ostream_iterator`, but I am not sure how it works.
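`std::ostream_iterator` is an output iterator. Each time a value is assigned to it, it writes that value to a stream with `operator<<` and then writes an optional delimiter. To use it with `Num`, the struct needs an `operator<<`. After that, `std::copy` replaces the hand-written loop.

The sketch below names its helpers `write_all` and `format_all`; both names are hypothetical. It still performs one formatted insertion per element, so it is a more idiomatic spelling of the same loop rather than a faster one:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

struct Num {
    std::uint64_t key;
    std::uint64_t val;
};

// Tells streams how to print one record: "key val".
std::ostream& operator<<(std::ostream& os, const Num& n) {
    return os << n.key << ' ' << n.val;
}

// std::ostream_iterator<Num> calls operator<< for each element copied into it,
// then writes the delimiter "\n" after that element.
void write_all(std::ostream& out, const Num* buffer, std::size_t count) {
    std::copy(buffer, buffer + count, std::ostream_iterator<Num>(out, "\n"));
}

// Same call on an in-memory stream, handy for checking the output format.
std::string format_all(const Num* buffer, std::size_t count) {
    std::ostringstream ss;
    write_all(ss, buffer, count);
    return ss.str();
}
```

With the question's code, `write_all(out_file, buffer, buffer_size);` would replace the `for` loop.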

Pattu
  • ofstream is not the most performant thing. You can use C functions, which are faster, but do you really need the speed? I'm guessing it's not production code. – David Haim Nov 04 '15 at 18:14
  • It's for a school assignment. Actually, I have a binary file of about 15 GB and I want to convert it into a text file in the format shown above. – Pattu Nov 04 '15 at 18:17
  • The whole thing shouldn't take more than a couple of minutes if it's written correctly. I wouldn't overkill it for this; `sprintf` is the farthest I'd go with it. – David Haim Nov 04 '15 at 18:21
  • *But it's slow*: you never identified 1) what exactly is slow and 2) whether you are running an optimized build or a non-optimized "debug" build. For 1), you are calling the allocator `new [ ]` and you're doing I/O. – PaulMcKenzie Nov 04 '15 at 18:22
  • I assume it's slow because the disk I/O is slow. But I just want to know if there are any proper ways (clever C++ ways) to write an array of structs to a text file in the manner shown above. – Pattu Nov 04 '15 at 18:26
  • Can you just write as you read the data, rather than doing one and then the other? (Seconding @DavidHaim's mention of C functions; they're much faster than ofstream.) – Andy M Nov 04 '15 at 18:35
  • @Pattu What is slow? Do you have actual numbers to decide whether it's sensible? And you need to show how you measure it. E.g. if it's kilobytes per second, it's not the fault of `ofstream`; if it's hundreds of MB/s, you have probably hit the I/O wall, so there is no point in looking for a faster way. – luk32 Nov 04 '15 at 18:36
  • @DavidHaim Would you care to provide a source for the claim? Generally people say otherwise, e.g. http://stackoverflow.com/questions/17468088/performance-difference-between-c-and-c-style-file-io , http://stackoverflow.com/questions/16351339/why-c-output-is-too-much-slower-than-c . – luk32 Nov 04 '15 at 18:38
  • @luk32 My own experience; because of that I didn't write it as an answer. – David Haim Nov 04 '15 at 18:43
  • @DavidHaim I will give it a shot with the C functions. – Pattu Nov 04 '15 at 18:46
  • I recommend using *block* writes rather than writing one instance at a time. There is an overhead associated with each file transaction, whether you write 1 byte or 1 MB. The idea is to keep the hard drive spinning, writing as much as you can per transaction. Whether you call `ofstream::write` or `fwrite` doesn't matter; the number of calls and the amount of data per call is what matters. – Thomas Matthews Nov 04 '15 at 19:36
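Several comments above can be combined into one loop:

- Andy M suggests writing as you read.
- David Haim suggests the C functions and `sprintf`.
- Thomas Matthews suggests block writes.

The sketch below reads records in chunks with `fread` and formats each line with `snprintf` into a large `char` buffer. It flushes that buffer with a single `fwrite` whenever it is nearly full. `convert`, `RECORDS_PER_READ`, and `OUT_CAPACITY` are hypothetical names, and the buffer sizes are assumptions rather than measured optima:

```cpp
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

struct Num {
    std::uint64_t key;
    std::uint64_t val;
};

// Converts raw Num records from `in` into "key val\n" lines in `out`.
// Returns false on a read or write error.
bool convert(std::FILE* in, std::FILE* out) {
    constexpr std::size_t RECORDS_PER_READ = 4096;       // assumed chunk size
    constexpr std::size_t OUT_CAPACITY = 1 << 20;        // assumed 1 MiB output buffer
    constexpr std::size_t MAX_LINE = 20 + 1 + 20 + 1;    // two 20-digit values, ' ', '\n'

    std::vector<Num> records(RECORDS_PER_READ);
    std::vector<char> text(OUT_CAPACITY);
    std::size_t used = 0;  // bytes of formatted text currently in `text`

    std::size_t n;
    while ((n = std::fread(records.data(), sizeof(Num), RECORDS_PER_READ, in)) > 0) {
        for (std::size_t i = 0; i < n; ++i) {
            // Flush when the next line plus snprintf's '\0' might not fit.
            if (OUT_CAPACITY - used < MAX_LINE + 1) {
                if (std::fwrite(text.data(), 1, used, out) != used) return false;
                used = 0;
            }
            int len = std::snprintf(text.data() + used, OUT_CAPACITY - used,
                                    "%" PRIu64 " %" PRIu64 "\n",
                                    records[i].key, records[i].val);
            used += static_cast<std::size_t>(len);
        }
    }
    if (std::ferror(in)) return false;
    // Write whatever is left in the buffer.
    return std::fwrite(text.data(), 1, used, out) == used;
}
```

Open `in` with `std::fopen(path, "rb")` and `out` with `std::fopen(path, "w")`. `convert(in, out)` then handles the whole file in one pass, with memory use bounded by the two buffers. As luk32 points out, whether this beats `ofstream` on a given system is something to measure rather than assume.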

1 Answer


The most efficient method to write structures to a file is to write as many as you can in the fewest transactions.

Usually this means using an array and writing the entire array in one transaction.
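Taken literally for structures, writing the entire array in one transaction means one `fwrite` call for the whole array instead of one call per element. The sketch below assumes raw binary output is acceptable. That is not the text format the question asks for; for text, the same idea applies to a buffer of already-formatted lines. `write_array` is a hypothetical name:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

struct Num {
    std::uint64_t key;
    std::uint64_t val;
};

// One transaction: the whole array goes to a single fwrite call,
// instead of `count` separate calls of sizeof(Num) bytes each.
bool write_array(std::FILE* out, const Num* arr, std::size_t count) {
    return std::fwrite(arr, sizeof(Num), count, out) == count;
}
```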

The file is a stream device and is most efficient when data flows continuously through the stream. This can range from writing the array in a single call to more complicated schemes using threads. You will save more time by performing block or burst I/O than by worrying about which function to call.

Also, in my own programs, I have observed that placing formatted text into a buffer (array) and then block-writing the buffer is faster than using a function to write the formatted text directly to the file. When formatting directly to the file, the data stream may pause while each item is formatted; when writing preformatted data from a buffer, the flow of data through the stream is continuous.
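Here is the buffer-then-block-write idea applied to the question's text format. The sketch formats a block of records into a `std::vector<char>` with `std::to_chars` (C++17, `<charconv>`), then hands the result to the stream with one `write()` call. `format_into` and `write_formatted_block` are hypothetical names. Sizing the buffer for the worst case (42 bytes per line) is a design choice that avoids bounds checks inside the loop:

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <vector>

struct Num {
    std::uint64_t key;
    std::uint64_t val;
};

// Formats all records into `text` as "key val\n" lines using std::to_chars,
// which does no locale handling and no memory allocation of its own.
void format_into(std::vector<char>& text, const Num* buffer, std::size_t count) {
    constexpr std::size_t MAX_LINE = 42;  // two 20-digit uint64_t values + ' ' + '\n'
    text.resize(count * MAX_LINE);        // worst case; trimmed below
    char* p = text.data();
    char* const end = text.data() + text.size();
    for (std::size_t i = 0; i < count; ++i) {
        p = std::to_chars(p, end, buffer[i].key).ptr;
        *p++ = ' ';
        p = std::to_chars(p, end, buffer[i].val).ptr;
        *p++ = '\n';
    }
    text.resize(static_cast<std::size_t>(p - text.data()));
}

// Formats a whole block first, then writes it to the stream in one call.
bool write_formatted_block(std::ostream& out, const Num* buffer, std::size_t count) {
    std::vector<char> text;
    format_into(text, buffer, count);
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
    return static_cast<bool>(out);
}
```

Keeping one `std::vector<char>` alive across calls, instead of creating a new one per block, would avoid repeated allocation. The version above keeps the function stateless for clarity.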

There are other factors involved in writing to a file, such as allocating space on the media, other tasks running on your system and any sharing of the file media.

Using the above techniques, I was able to write GBs of data in minutes instead of the hours it previously took.

Thomas Matthews