C++/seek - which order of file seek is faster?

Question

I'm writing a C++ method that needs to update some chars in an open file (ofstream).
The method takes a map as input, where the key is an offset (position in the file) and the value is the char to write there.

Code sample

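#include <fstream>
#include <map>

typedef std::map<int,char> IntChar_map;

// m_stream is presumably a class member pointing to the open std::ofstream.
void update_file(const IntChar_map& v)
{
    for(IntChar_map::const_iterator it = v.begin(); it != v.end(); ++it)
    {
        m_stream->seekp(it->first);  // move the put position to the offset
        m_stream->put(it->second);   // overwrite one char there
    }
}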

Question

Let's assume the file is large and the offsets in the map are random.
If I iterate through the map in a reverse order, will it increase performance?

Thanks.

If you map _random_ values, then there is no way to tell in which case you fall. If you can make an assumption on the input data distribution, then there may be a definite answer. — Stefano Sanfilippo, Nov 07 '13 at 07:56

score 3 · Accepted Answer · edited May 23 '17 at 12:13

The map iterators are ordered, so your file I/O is localised and can take advantage of buffering. If you go through the map in reverse, the offsets are still ordered and thus localised, and so you get similar buffering effects.
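A minimal sketch of that point, reusing the `IntChar_map` typedef from the question. The helper names `forward_offsets` and `reverse_offsets` are hypothetical and exist only to show the order in which each walk visits the keys:

```cpp
#include <map>
#include <vector>

typedef std::map<int, char> IntChar_map;

// Offsets in the order a forward walk (begin -> end) visits them: ascending.
std::vector<int> forward_offsets(const IntChar_map& v)
{
    std::vector<int> out;
    for (IntChar_map::const_iterator it = v.begin(); it != v.end(); ++it)
        out.push_back(it->first);
    return out;
}

// Offsets in the order a reverse walk (rbegin -> rend) visits them: descending.
std::vector<int> reverse_offsets(const IntChar_map& v)
{
    std::vector<int> out;
    for (IntChar_map::const_reverse_iterator it = v.rbegin(); it != v.rend(); ++it)
        out.push_back(it->first);
    return out;
}
```

Whatever order the offsets were inserted in, both walks produce a monotonic sequence of seeks, so neither direction jumps around the file at random.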

The best way to find out is to do some tests and compare their times.

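For small writes with seeks, you might find that file buffering gives worse performance, and you might want to turn it off. To do this, ask the stream's buffer object for an unbuffered stream (`setbuf` itself is a protected member of the buffer, so it is reached through `pubsetbuf`). Make the call before any I/O on the stream (with some standard libraries, before the file is even opened); otherwise the result is implementation-defined:

m_stream->rdbuf()->pubsetbuf(0, 0);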
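As a rough sketch of how that could look when the stream is set up, assuming the file already exists and is opened for in-place updates. The function name `open_unbuffered` and the `path` parameter are hypothetical, not from the question:

```cpp
#include <fstream>

// Hypothetical helper: requests an unbuffered stream, then opens `path`.
// The request is made before open() because its effect on a stream that
// has already done I/O is implementation-defined.
bool open_unbuffered(std::ofstream& stream, const char* path)
{
    stream.rdbuf()->pubsetbuf(0, 0);
    // in | out keeps the existing contents; plain out would truncate the file.
    stream.open(path, std::ios::in | std::ios::out | std::ios::binary);
    return stream.is_open();
}
```

Whether this actually helps depends on the platform and the write pattern, so it is worth timing against the default buffered stream.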

I did some comprehensive tests on the C file I/O functions when I was doing lots of small, random writes, and I discovered that using pure unbuffered I/O was significantly faster. Here is a link to my question, in case it is of use to you:

What goes on behind the curtains during disk I/O?

Again, I stress the importance of benchmarking a typical scenario using different coding approaches if performance is critical.
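One possible shape for such a benchmark, assuming C++11 `std::chrono` is available. The function `timed_update` and its `reverse` flag are hypothetical names for illustration; a real test would use a large file, realistic offsets, and many repetitions:

```cpp
#include <chrono>
#include <fstream>
#include <map>

typedef std::map<int, char> IntChar_map;

// Applies every update in ascending (reverse == false) or descending
// (reverse == true) offset order and returns the elapsed microseconds.
long long timed_update(std::ofstream& stream, const IntChar_map& v, bool reverse)
{
    const auto start = std::chrono::steady_clock::now();
    if (reverse) {
        for (IntChar_map::const_reverse_iterator it = v.rbegin(); it != v.rend(); ++it) {
            stream.seekp(it->first);
            stream.put(it->second);
        }
    } else {
        for (IntChar_map::const_iterator it = v.begin(); it != v.end(); ++it) {
            stream.seekp(it->first);
            stream.put(it->second);
        }
    }
    stream.flush();  // count the cost of handing buffered bytes to the OS
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Running it once per direction, and once each on a buffered and an unbuffered stream, gives four numbers to compare. OS caching can dominate a single run, so repeated runs on fresh files are more trustworthy.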

answered Nov 07 '13 at 07:53

paddy

I have fixed this answer. I realised I had misinterpreted what you were asking because you talked about random values. However, the map iterators are ordered, so you may have buffering locality advantages (with the caveat that these are not always advantageous for seek-writes). It depends on how sparsely distributed those offsets are. – paddy Nov 07 '13 at 08:02
Thanks for the detailed answer. I assumed the order of seeks would not matter, but since I'm not an I/O expert I thought there might be some mysterious reason why one order is more efficient than the other. – idanshmu Nov 07 '13 at 08:06
Well, now I think about it, I might expect the forward ordering to be slightly better. I think most file buffering schemes are designed to deal with moving forwards through a file, so depending on how your `ofstream` is implemented, it might have to do a bit of extra work when going backwards. It's hard to give a definite answer though. You really should test it. I certainly wouldn't expect reverse to be better than forwards. But it's possible that a completely unbuffered stream will be better than buffered stream. – paddy Nov 07 '13 at 08:10
I'll definitely stress test it. I was more curious about the theoretical explanation of why one order is better than the other, and you provided some interesting insights for me. – idanshmu Nov 07 '13 at 08:15