12

I need an input file stream which would have a bidirectional iterator/adapter.

Unfortunately std::ifstream (and similar) can be used only with std::istream_iterator which is a kind of forward iterator which cannot go backwards. (or am I mistaken here?)

I could simply load the whole file to memory and then use a much more powerful random-access iterator over the array; however I would like to avoid that, and read only as much as I really need. It may happen that I really need only a small portion of a file.

I could somehow do it manually using C stdio.h functions, but that will be painful. I would basically need to implement a bidirectional iterator, with all its specification in mind, by hand.

I am considering looking into boost iostream library, but the manual is somewhat overwhelming, I was hoping someone could give me a hand to achieve this particular goal? Or maybe there is another already existing library to do exactly what I need?

I need the iterator for the boost xpressive library to parse my files, which expects that the iterator can be incremented as well as decremented. I would be fine if the file I am reading is buffered, although this is not a requirement.

Any ideas? Thank you!

CygnusX1
  • 20,968
  • 5
  • 65
  • 109
  • 1
    Are you sure you need a bidirectional iterator? If a forward iterator will suffice, [Boost.Spirit](http://www.boost.org/libs/spirit/) has you covered: [Supporting Libraries -> The multi pass iterator](http://www.boost.org/libs/spirit/doc/html/spirit/support/multi_pass.html). – ildjarn Jan 26 '12 at 23:09
  • Can you not buffer a part of the file, do your operations on it, write it to a temp file, then get the next part of the file, etc... ?? – Tony The Lion Jan 26 '12 at 23:16
  • 1
    I take it you can't just memory map the file? Less portable of course, but it gives you random access *and* it only reads the parts of the file you really need (well, the neighborhoods of those parts rounded up to some page size). – Steve Jessop Jan 26 '12 at 23:33
  • @Steve : That's exactly what I do in my code, and portability is less of a concern given that [Boost.Interprocess](http://www.boost.org/libs/interprocess/) has memory-mapped-files facilities. – ildjarn Jan 26 '12 at 23:34
  • 1
    If you could iterate bidirectionally, it *wouldn't be a stream*. – Karl Knechtel Jan 26 '12 at 23:47
  • @Karl : That doesn't invalidate the need of the OP. – ildjarn Jan 26 '12 at 23:48
  • @ildjarn: Check this: http://www.boost.org/doc/libs/1_48_0/doc/html/boost/xpressive/regex_token_iterator.html -- I believe that `BidiIter` denotes that it has to be a bidirectional iterator. I think I have read it somewhere, but the manual is so hard to navigate :( I tried using something weaker but then I got a ton of incomprehensive compiler error messages. – CygnusX1 Jan 27 '12 at 07:23

2 Answers2

6

If I were to traverse a file in the wrong direction, I would start off questioning my requirements. This seems to be a contrived way to do things and most likely something got messed up dramatically somewhere.

Once I confirmed that this is indeed the requirement I would realize that we are definitely talking files here, rather than e.g. a named pipe or a socket. That is, it would be possible memory map at least parts of the file. I would use this to create an iterator which walks the memory. Since obviously an iterator is needed, there is no need to involve streams. If streams were needed too, I would still memory map the file and reverse buffers from the back in a custom stream buffer.

When actually reading from the start and just needing to be able to move backwards when necessary, it may be simpler than this: keeping a buffer of the already read data and expending it when moving off the end plus possibly reading the entire file if the end iterator is used to move backwards should address this. Here is code which certainly can read a file forward and backward but isn't thoroughly tested:

#include <iostream>
#include <fstream>
#include <algorithm>
#include <iterator>
#include <limits>
#include <vector>

class bidirectional_stream
{
public:
    class                                         iterator;
    typedef iterator                              const_iterator;
    typedef std::reverse_iterator<iterator>       reverse_iterator;
    typedef std::reverse_iterator<const_iterator> const_reverse_iterator;

    bidirectional_stream(std::istream& in):
        in_(in)
    {
    }
    iterator         begin();
    iterator         end();
    reverse_iterator rbegin();
    reverse_iterator rend();

    bool expand()
    {
        char buffer[1024];
        this->in_.read(buffer, sizeof(buffer));
        this->buffer_.insert(this->buffer_.end(), buffer, buffer + this->in_.gcount());
        return 0 < this->in_.gcount();
    }
    long read_all()
    {
        this->buffer_.insert(this->buffer_.end(),
                             std::istreambuf_iterator<char>(this->in_),
                             std::istreambuf_iterator<char>());
        return this->buffer_.size();
    }
    char get(long index) { return this->buffer_[index]; }
    long current_size() const { return this->buffer_.size(); }
private:
    std::istream&     in_;
    std::vector<char> buffer_;
};

class bidirectional_stream::iterator
{
public:
    typedef char                            value_type;
    typedef char const*                     pointer;
    typedef char const&                     reference;
    typedef long                            difference_type;
    typedef std::bidirectional_iterator_tag iterator_category;

    iterator(bidirectional_stream* context, size_t pos):
        context_(context),
        pos_(pos)
    {
    }

    bool operator== (iterator const& other) const
    {
        return this->pos_ == other.pos_
            || (this->pos_ == this->context_->current_size()
                && !this->context_->expand()
                && other.pos_ == std::numeric_limits<long>::max());
    }
    bool operator!= (iterator const& other) const { return !(*this == other); }
    char      operator*() const { return this->context_->get(this->pos_); }
    iterator& operator++()    { ++this->pos_; return *this; }
    iterator  operator++(int) { iterator rc(*this); this->operator++(); return rc; }
    iterator& operator--()
    {
        if (this->pos_ == std::numeric_limits<long>::max())
        {
            this->pos_ = this->context_->read_all();
        }
        --this->pos_;
        return *this;
    }
    iterator  operator--(int) { iterator rc(*this); this->operator--(); return rc; }

private:
    bidirectional_stream* context_;
    long                  pos_;
};

bidirectional_stream::iterator bidirectional_stream::begin()
{
    return iterator(this, 0);
}
bidirectional_stream::iterator bidirectional_stream::end()
{
    return iterator(this, std::numeric_limits<long>::max());
}

bidirectional_stream::reverse_iterator bidirectional_stream::rbegin()
{
    return reverse_iterator(this->end());
}
bidirectional_stream::reverse_iterator bidirectional_stream::rend()
{
    return reverse_iterator(this->begin());
}

Just create a bidirectional_stream with the stream you want to read as an argument and then use the begin() and end() methods to actually access it.

Dietmar Kühl
  • 150,225
  • 13
  • 225
  • 380
  • "*If I were to traverse a file in the wrong direction, I would start off questioning my requirements.*" These are requirements of [Boost.Xpressive](http://www.boost.org/libs/xpressive/), not the direct requirements of the OP. – ildjarn Jan 26 '12 at 23:45
  • Then the original author is using Boost.Xpressive on the wrong things! Reading a file the wrong way around isn't going to be too efficient. Whether it is to be used for some library is entirely besides the point: the requirement to read the back of the file the wrong order seems to be there and I would question that to be necessary. – Dietmar Kühl Jan 26 '12 at 23:50
  • 3
    Sorry, but running a regex over file contents does not sound like an outlandish thing to do. Using a file stream as input to the regex may be misguided, but the overall goal is certainly not questionable. – ildjarn Jan 27 '12 at 01:35
  • Perhaps using a "file stream" for xpressive is a wrong thing. Then what can I use instead? I need a solution to my problem, not a reason why I cannot do it! btw. a named pipe or a socket can be buffered too and then iterated over in both directions. – CygnusX1 Jan 27 '12 at 06:15
  • "The code for any of this isn't hard to write but I'd need to chase through the various interfaces myself as I only rarely use memory mapped files explicitly. I got a couple of classes which access memory mapped data, though." --- it may be non trivial if you want to create a new iterator and be correct with its interface, so that you can use it with 3-rd party library. – CygnusX1 Jan 27 '12 at 06:42
  • I had missed the target not being to read the file from the back to the front but rather passing iterators to some other library which requires possibly going backwards over the already read portion. It probably won't start reading from the back. A simple adapter expanding an internal buffer should do this trick. – Dietmar Kühl Jan 27 '12 at 08:00
  • Thank you very much! I will definitely go and try your code! I agree with you that conceptually it is a "simple adapter", but practically, to make sure everything matches just right, that the requirements are met, is not straightforward (at least for me). – CygnusX1 Jan 27 '12 at 09:47
  • 2
    Just for the record: creating an iterator should be simple. It is definitely something doable when creating a new iterator. Generic iterator adopters are a different kind of kettle and things a different there when the values referenced push boundaries. That said, I also believe that the iterators are not quite the correct abstraction as they combine two unrelated aspects (property access and positioning) into one abstraction although the two aspects can vary independently. – Dietmar Kühl Jan 27 '12 at 10:03
  • OK, I will probably go by this longer implementation which is more generic, rather than the boost solution suggested by Cubbi, because this is more generic and should work with any stream. I am just a bit worried about the `rbegin()`. It looks like a function that you never want to use :) (and I have no plans to do so) – CygnusX1 Feb 01 '12 at 07:19
4

Since you're already using boost, take a look at boost::iostreams::mapped_file_source http://www.boost.org/doc/libs/release/libs/iostreams/doc/classes/mapped_file.html#mapped_file_source

You can use file.data() as the begin iterator and file.data() + file.size() as the end iterator.

Cubbi
  • 46,567
  • 13
  • 103
  • 169
  • I will definetely look into the mapped_file_source. It would be really helpful, if you could provide me an example of how to use it to achieve what I need? – CygnusX1 Jan 27 '12 at 06:39
  • @CygnusX1 Just write `boost::iostreams::mapped_file_source file("whatever.bin");` and you have your file, which you can iterate in reverse, e.g. `reverse_copy(file.data(), file.data()+file.size(), std::ostreambuf_iterator(std::cout));` or in fact, access randomly. – Cubbi Jan 27 '12 at 11:17
  • Er, gotta make_iterator_range before you can reverse_copy from a pair of pointers, wasn't the best example. – Cubbi Jan 27 '12 at 13:19