8

In my application I'm trying to merge sorted files (keeping them sorted, of course), so I have to iterate through each element in both files to write the smaller one to the third file. This runs pretty slowly on big files, and since I don't see any other choice (the iteration has to be done) I'm trying to optimize the file loading. I can use some amount of RAM for buffering. I mean, instead of reading 4 bytes from both files every time, I could read something like 100 MB at once and work with that buffer afterwards, until there are no elements left in the buffer, and then refill it. But I guess ifstream already does that, so will it actually give me more performance, and is there any reason to do it myself? If fstream does buffer, maybe I can change the size of that buffer?
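
For example, I imagine something like this would let me enlarge the stream's own buffer (I'm not sure every standard library honours pubsetbuf, and I believe it has to be called before the file is opened; the file name and buffer size are just placeholders):

#include <fstream>
#include <vector>

int main()
{
    // ~100 MB buffer; the size is just a guess on my part
    std::vector<char> buffer(100 * 1024 * 1024);

    std::ifstream input1;
    // Has to be done before open(), otherwise the request may be ignored
    input1.rdbuf()->pubsetbuf(&buffer[0], static_cast<std::streamsize>(buffer.size()));
    input1.open("input1.bin", std::ios::binary);   // hypothetical file name

    // ... read ints from input1 as before ...
}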

added

My current code looks like this (pseudocode):

// this is done in a loop
int i1 = input1.read_integer();
int i2 = input2.read_integer();
if (!input1.eof() && !input2.eof())
{
   if (i1 < i2)
   {
      output.write(i1);
      input2.seek_back(sizeof(int));
   }
   else
   {
      input1.seek_back(sizeof(int));
      output.write(i2);
   }
}
else
{
   if (input1.eof())
      output.write(i2);
   else if (input2.eof())
      output.write(i1);
}

What I don't like here is

  • seek_back - I have to seek back to the previous position, as there is no way to peek 4 bytes ahead
  • too much reading from the files
  • if one of the streams is at EOF, it still keeps checking that stream instead of writing the contents of the other stream directly to the output; but this is not a big issue, because the chunk sizes are almost always equal.

Can you suggest improvement for that?

Thanks.

ledokol
  • How platform specific do you want to get? Various platforms let you provide hints to the OS Kernel to get it to do the job faster -- but of course those are going to be platform specific. – Billy ONeal Dec 29 '10 at 21:39
  • I wouldn't like to put any platform specific code there. Is it reasonable to implement buffering? – ledokol Dec 29 '10 at 21:43
  • @ledokol: Then my suggestion is to try caching yourself and see if it makes any difference. Without going to the kernel though of course it's not going to be as fast as possible. (For example, on Windows one would pass `FILE_FLAG_SEQUENTIAL_SCAN` for a workload like this) – Billy ONeal Dec 29 '10 at 21:44
  • @Billy: On Windows, one would also use `MapViewOfFile`. On *nix, `fadvise` and `mmap`. – Ben Voigt Dec 29 '10 at 22:26
  • There's a bug in your code. (What if both streams reach EOF at the same time?) – Amnon Dec 30 '10 at 08:29
  • Then nothing will be written to output and it will exit the loop. – ledokol Dec 30 '10 at 08:29
  • Added buffering and removed seek from algorithm, currently works much better. Thanks to everyone. – ledokol Dec 30 '10 at 16:19
  • @Amnon: They can both reach EOF at the same time. As he does a seek_back on one stream each time. – Martin York Dec 30 '10 at 20:57

6 Answers

5

Without getting into the discussion on stream buffers, you can get rid of the seek_back and generally make the code much simpler by doing:

using namespace std;
merge(istream_iterator<int>(file1), istream_iterator<int>(),
      istream_iterator<int>(file2), istream_iterator<int>(),
      ostream_iterator<int>(cout, "\n"));

Edit:

Added binary capability

#include <algorithm>
#include <iterator>
#include <fstream>
#include <iostream>

struct BinInt
{
    int value;
    operator int() const { return value; }
    friend std::istream& operator>>(std::istream& stream, BinInt& data)
    {
        return stream.read(reinterpret_cast<char*>(&data.value),sizeof(int));
    }
};

int main()
{
    // open in binary mode, since BinInt reads the raw bytes of each int
    std::ifstream   file1("f1.txt", std::ios::binary);
    std::ifstream   file2("f2.txt", std::ios::binary);

    std::merge(std::istream_iterator<BinInt>(file1), std::istream_iterator<BinInt>(),
               std::istream_iterator<BinInt>(file2), std::istream_iterator<BinInt>(),
               std::ostream_iterator<int>(std::cout, "\n"));
}
Martin York
davka
3

In decreasing order of performance (best first):

  • memory-mapped I/O
  • OS-specific ReadFile or read calls.
  • fread into a large buffer (a small sketch follows this list)
  • ifstream.read into a large buffer
  • ifstream and extractors
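
For illustration, here is a rough sketch of the "fread into a large buffer" option; the file name, buffer size, and the placeholder merge step are my own assumptions, not part of the answer:

#include <cstdio>
#include <vector>

int main()
{
    std::FILE* in = std::fopen("input1.bin", "rb");   // hypothetical file name
    if (!in)
        return 1;

    // One large buffer, refilled with a single fread per pass.
    std::vector<int> buf(25 * 1024 * 1024);           // ~100 MB worth of ints
    unsigned long long total = 0;

    std::size_t got;
    while ((got = std::fread(&buf[0], sizeof(int), buf.size(), in)) > 0)
    {
        total += got;   // a real merge would compare these values against
                        // the other file's buffer and write the smaller one
    }

    std::fclose(in);
    std::printf("read %llu ints\n", total);
}
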
Ben Voigt
  • Do you really think memory-mapped I/O will give a gain in the case where you just read linearly through an entire file? This is not my understanding, but I am interested in learning. – Johan Kotlinski Dec 29 '10 at 22:43
  • @kotlinski: Well, it makes better use of cache, so when benchmarking by running the same test repeatedly, it sure does help. For truly a single linear pass through the file, I'm not sure there's any advantage over the OS-provided block reading functions. There's a really big difference, however, between the OS reading functions and the C++ library I/O functions. – Ben Voigt Dec 29 '10 at 22:48
  • @Ben: Do you have any experience with OS-specific reads versus stdio.h reads of optimal size? – Johan Kotlinski Dec 29 '10 at 22:51
  • @kotlinski: I haven't benchmarked them against each other, but my subjective assessment is that stdio.h functions are only a little slower than the OS-specific ones (stdio.h adds a caching layer on top of the OS functions, making it much better at handling chatty I/O but costing a little performance on large blocks). – Ben Voigt Dec 30 '10 at 03:41
  • only pure C++ can be used, no API's, I need a portable solution. – ledokol Dec 30 '10 at 05:30
  • stdio.h functions are a good combination of performance and portability. POSIX `open`/`read`/`close` is almost universally portable as well, [even on Windows](http://msdn.microsoft.com/en-us/library/40bbyw78.aspx). – Ben Voigt Dec 30 '10 at 05:34
  • stdio.h is C, not C++ and POSIX is not part of the C++ standard library. Is there such a huge difference between stdio and fstreams? – ledokol Dec 30 '10 at 07:42
  • `stdio.h` is part of the C++ standard library. You're right that it is also part of the C standard library. In your last comment you said you needed something "portable", and POSIX is very portable. Finally, yes, there is a big big BIG difference in performance between stdio API and `fstream`. – Ben Voigt Dec 30 '10 at 07:45
  • OK. I don't like to mix C and C++, however, if it will increase performance that is fine, I'll use stdio in that case when I'll get fixed the algorithm. – ledokol Dec 30 '10 at 08:32
2

A program like this should be I/O bound, meaning it should be spending at least 80% of its time waiting for the completion of a read or a write of a buffer, and if the buffers are reasonably big, it should be keeping the disk heads busy. That's what you want.

Don't assume it is I/O bound, without proof. A way to prove it is by taking several stackshots. If it is, most of the samples will show the program waiting for I/O completion.

It is possible that it is not I/O bound, meaning you may find other things going on in some of the samples that you never expected. If so, then you know what to fix to speed it up. For example, I have seen code like this spending much more time than necessary in the merge loop, testing for end-of-file, getting data to compare, and so on.

Mike Dunlavey
0

Unless there is something very special about your data, it is unlikely that you will improve on the buffering that is built into the std::fstream object.

The std::fstream objects are designed to be very efficient for general-purpose file access. It does not sound like you are doing anything special by accessing the data 4 bytes at a time. You can always profile your code to see where the time is actually spent.

Maybe if you share the code with us we could spot some major inefficiencies.

Edit:

I don't like your algorithm. Seeking back and forth may be hard on the stream, especially if the number lies over a buffer boundary. I would only read one number each time through the loop.

Try this:
Note: This is not optimal (and it assumes formatted text input of numbers, while yours looks binary), but I am sure you can use it as a starting point.

#include <fstream>
#include <iostream>

// Return the current val (that was the smaller value)
// and replace it with the next value in the stream.
int getNext(int& val, std::istream& str)
{
    int result = val;
    str >> val;

    return result;
}

int main()
{
    std::ifstream   f1("f1.txt");
    std::ifstream   f2("f2.txt");
    std::ofstream   re("result");

    int v1;
    int v2;

    f1 >> v1;
    f2 >> v2;

    // While there are values in both streams,
    // output the smaller one and replace it using getNext().
    // Note: the parentheses around the conditional are required,
    // because << binds more tightly than ?:
    while(f1 && f2)
    {
        re << ((v1 < v2) ? getNext(v1, f1) : getNext(v2, f2)) << ' ';
    }
    // At this point one (or both) stream(s) is(are) empty.
    // So dump the other stream.
    for(;f1;f1 >> v1)
    {
        // Note if the stream is at the end it will
        // never enter the loop
        re << v1 << ' ';
    }
    for(;f2;f2 >> v2)
    {
        re << v2 << ' ';
    }
}
Martin York
  • @Martin, are you trying to be funny? Several widely-used implementations of `fstream` (including gcc's and Visual C++'s) are known to have terrible performance. – Ben Voigt Dec 29 '10 at 22:22
  • The stream operators << and >> have bad performance. Not the stream. – Martin York Dec 29 '10 at 22:37
  • @Martin. `fstream` has bad performance. I changed one of my programs from looping `ofstream.write`(4 bytes) to a custom buffer and `WriteFile` and saw a 10x improvement. There are similar inefficiencies on read, although I must admit that after seeing the problem with `ofstream.write`, I changed the read code straight to OS-provided means without an interim `ifstream.read` version. – Ben Voigt Dec 29 '10 at 22:41
  • @Ben: If only you had shared your technique with the world (maybe in a Blog or Journal) we could have updated the STL to use the BenBuffer and the problems would have gone away for everybody. – Martin York Dec 29 '10 at 22:47
  • Though I do agree that reading/writing directly from/to the file descriptor and writing your own buffer can potentially be faster, the extra work is usually not worth the extra gain (though a 10x improvement would be worth it). – Martin York Dec 29 '10 at 22:49
  • @Martin: Actually this is a test solution as part of an interview, so I wouldn't like to do some fantastic things to make it faster; if it works like it should, I don't want to improve ifstream's performance, I just want to be sure that it is not an obviously BAD solution. – ledokol Dec 30 '10 at 05:27
  • @ledokol: If you don't want to improve the performance, why did you ask "will it give me more performance?"? – Ben Voigt Dec 30 '10 at 07:47
  • @Ben: I want to improve, I just don't want to make something incredible from it by using memory mapped files, API's and other non-portable things, code has to be portable, that's the task. Currently it works VERY slow, I believe there is something wrong with principles, I'm quite sure it's possible to make it in pure C++, maybe this will not be the fastest possible code, but still good enough. – ledokol Dec 30 '10 at 08:10
0

You can just use the read function of an ifstream to read large blocks.

http://www.cplusplus.com/reference/iostream/istream/read/

The second parameter is the number of bytes. You should make this a multiple of 4 in your case - maybe 4096? :)

Simply read a chunk at a time and work on it.
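
For instance, a rough sketch of such a chunked reader (the class name and the 4096-byte chunk size are just placeholders, not something from this answer):

#include <cstddef>
#include <fstream>
#include <vector>

// Wraps an ifstream and hands out ints from an internal buffer,
// refilling it with a single read() call whenever it runs dry.
class ChunkedIntReader
{
public:
    explicit ChunkedIntReader(const char* name, std::size_t chunkBytes = 4096)
        : in(name, std::ios::binary), buf(chunkBytes / sizeof(int)), pos(0), count(0) {}

    // Puts the next int into value; returns false when the file is exhausted.
    bool next(int& value)
    {
        if (pos == count)
        {
            in.read(reinterpret_cast<char*>(&buf[0]),
                    static_cast<std::streamsize>(buf.size() * sizeof(int)));
            count = static_cast<std::size_t>(in.gcount()) / sizeof(int);
            pos = 0;
            if (count == 0)
                return false;
        }
        value = buf[pos++];
        return true;
    }

private:
    std::ifstream in;
    std::vector<int> buf;
    std::size_t pos, count;
};

A merge loop would then just keep calling next() on two such readers and write whichever value is smaller.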

As martin-york said, this may not have any beneficial effect on your performance, but try it and find out.

Hank
0

I think it is very likely that you can improve performance by reading big chunks.

Try opening the file with ios::binary as an argument, then use istream::read to read the data.

If you need maximum performance, I would actually suggest skipping iostreams altogether, and using cstdio instead. But I guess this is not what you want.
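
A minimal sketch of the cstdio route (the setvbuf size and the file name are my own assumptions, not part of this answer):

#include <cstdio>
#include <vector>

int main()
{
    std::FILE* in = std::fopen("input1.bin", "rb");   // hypothetical file name
    if (!in)
        return 1;

    // Give stdio a large user-supplied buffer; must be set before the first read.
    std::vector<char> iobuf(8 * 1024 * 1024);
    std::setvbuf(in, &iobuf[0], _IOFBF, iobuf.size());

    int value;
    while (std::fread(&value, sizeof(int), 1, in) == 1)
    {
        // compare/merge 'value' here
    }

    std::fclose(in);
}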

Johan Kotlinski
  • @Martin: The built-in buffer does do a decent job of minimizing syscalls. But the buffer management code is (at least in Visual C++ and g++ versions of the library) so horribly inefficient it really doesn't matter. – Ben Voigt Dec 29 '10 at 22:45