My intention is to use a generic interface for iterating over files from a variety of I/O sources. For example, I might want an iterator that, authorization permitting, will lazily open every file on my file system and return the open file handle. I'd then want to use the same interface for iterating over, perhaps, objects from an AWS S3 bucket. In this latter case, the iterator would download each object/file from S3 to the local file system, then open that file, and again return a file handle. Obviously the implementation behind both iterator interfaces would be very different.
I believe the three most important design goals are these:
- For each iter++ invocation, a std::future or PPL pplx::task is returned representing the requested file handle. I need the ability to do the equivalent of the PPL choice (when_any), because I expect to have multiple iterators running simultaneously.
- The custom iterator implementation must be durable / restorable. That is, it periodically records where it is in a file system scan (or S3 bucket scan, etc.) so that it can attempt to resume scanning from the last known position in the event of an application crash and restart.
- Best effort to not go beyond C++11 (and possibly C++14).
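To make the goals above concrete, here is a minimal C++11 sketch of the shape I have in mind. Everything in it is hypothetical: `FileHandle`, `FileSource`, `FileIterator`, and `drain` are names I made up for illustration, the "source" is just an in-memory list of paths rather than a real file system or S3 scan, and the checkpoint is only an index that a real implementation would have to persist somewhere. It is not meant as a finished design, and it does not attempt the when_any part.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <iterator>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an open file handle.
struct FileHandle {
    std::string path;
};

// Hypothetical source backed by a fixed list of paths. A real source would
// walk a directory tree or list an S3 bucket instead.
class FileSource {
public:
    explicit FileSource(std::vector<std::string> paths, std::size_t resume_from = 0)
        : paths_(std::move(paths)), next_(resume_from) {}

    bool exhausted() const { return next_ >= paths_.size(); }

    // Returns a future for the next handle. With std::launch::deferred the
    // work runs lazily when get() is called; std::launch::async would start
    // it on another thread instead.
    std::future<FileHandle> open_next() {
        std::string p = paths_[next_++];
        return std::async(std::launch::deferred, [p] { return FileHandle{p}; });
    }

    // Index of the next path to hand out. Persisting this value is what
    // would let a restarted process resume (assumption: the listing order
    // is stable between runs).
    std::size_t checkpoint() const { return next_; }

private:
    std::vector<std::string> paths_;
    std::size_t next_;
};

// Input-iterator-style wrapper. std::future is move-only, so the current
// future lives behind a shared_ptr to keep the iterator copyable.
// Post-increment is omitted for brevity.
class FileIterator {
public:
    typedef std::input_iterator_tag iterator_category;
    typedef std::future<FileHandle> value_type;
    typedef std::ptrdiff_t          difference_type;
    typedef value_type*             pointer;
    typedef value_type&             reference;

    FileIterator() : source_(nullptr) {}  // end sentinel
    explicit FileIterator(FileSource& s) : source_(&s) { advance(); }

    reference operator*() const { return *current_; }
    FileIterator& operator++() { advance(); return *this; }

    bool operator==(const FileIterator& o) const { return source_ == o.source_; }
    bool operator!=(const FileIterator& o) const { return !(*this == o); }

private:
    void advance() {
        if (source_ && !source_->exhausted()) {
            current_ = std::make_shared<value_type>(source_->open_next());
        } else {
            source_ = nullptr;  // become equal to the end sentinel
            current_.reset();
        }
    }

    FileSource* source_;
    std::shared_ptr<value_type> current_;
};

// Usage: walk the source and wait on each future in turn.
std::vector<std::string> drain(FileSource& src) {
    std::vector<std::string> names;
    for (FileIterator it(src), end; it != end; ++it) {
        names.push_back((*it).get().path);
    }
    return names;
}
```

One thing this sketch glosses over is that the checkpoint advances as soon as a future is handed out, not when the caller has finished with the file, so a durable version would need to decide which of those two positions to record.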
I assume the STL input_iterator should be my point of departure for the interface. After all, I see this 2014 SO post with a simple example. It does not involve I/O, but I see another article from 2001 that reportedly does incorporate I/O into a custom STL iterator. So far so good.
Where I start to get concerned is when I read an article like "Generator functions in C++". Ack! That article gives me the impression that I can't create a generator function disguised as an iterator, at least not without waiting for C++20. Likewise, this other 2016 SO post makes it sound like creating generator functions in C++ is a hornet's nest.
While the implementation of my custom iterators will be complex, perhaps those last two links were tackling something beyond what I'm trying to achieve. In other words, perhaps my plan is not flawed? I'd like to know what barriers I'll be fighting if I put a lazy-generator implementation behind a custom input_iterator. If I should be using something else, like Boost iterator_facade, I'd appreciate a bit of explanation of why. Also, I'd like to know if what I'm doing has already been implemented elsewhere. Perhaps the PPL, which I've only just started to learn, already has a solution for this?
P.S. I gave the example of an S3 iterator that lazily downloads each requested file and then returns an open file handle. Yes, I know this means the iterator is producing a side effect, which normally I would want to avoid. However, for my intended purpose, I'm not sure there is a cleaner way to do this.