My intention is to use a generic interface for iterating over files from a variety of I/O sources. For example, I might want an iterator that, authorization permitting, will lazily open every file on my file system and return the open file handle. I'd then want to use the same interface for iterating over, perhaps, objects from an AWS S3 bucket. In this latter case, the iterator would download each object/file from S3 to the local file system, then open that file, and again return a file handle. Obviously the implementation behind both iterator interfaces would be very different.

I believe the three most important design goals are these:

  • For each iter++ invocation, a std::future or PPL pplx::task is returned representing the requested file handle. I need the ability to do the equivalent of the PPL choice(when_any), because I expect to have multiple iterators running simultaneously.
  • The custom iterator implementation must be durable / restorable. That is, it periodically records where it is in a file system scan (or S3 bucket scan, etc.) so that it can attempt to resume scanning from the last known position in the event of an application crash and restart.
  • Best effort to not go beyond C++11 (and possibly C++14).
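To make these goals concrete, here is a rough C++11 sketch of the shape I have in mind. It is not a working design. `FileHandle`, `FileSource`, `FileIterator`, and `drain` are hypothetical names, the "files" are just strings, and the checkpoint is an in-memory index rather than anything persisted. I use `std::shared_future` instead of `std::future` only so that the iterator stays copyable.

```cpp
#include <cstddef>
#include <future>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an open file handle.
struct FileHandle {
    std::string name;
};

// Hypothetical source: walks a fixed list of names and "opens" each lazily.
// position_ is the resume point a real implementation would write to disk.
class FileSource {
public:
    explicit FileSource(std::vector<std::string> names, std::size_t resume_at = 0)
        : names_(std::move(names)), position_(resume_at) {}

    bool exhausted() const { return position_ >= names_.size(); }
    std::size_t checkpoint() const { return position_; }

    // Hand back a future immediately; the work runs when get() is called.
    std::shared_future<FileHandle> open_next() {
        std::string name = names_[position_++];
        return std::async(std::launch::deferred,
                          [name]() { return FileHandle{name}; }).share();
    }

private:
    std::vector<std::string> names_;
    std::size_t position_;
};

// Input-iterator-style wrapper. Single pass: copies share the same source.
class FileIterator {
public:
    typedef std::input_iterator_tag iterator_category;
    typedef std::shared_future<FileHandle> value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const value_type* pointer;
    typedef const value_type& reference;

    FileIterator() : source_(nullptr) {}  // end sentinel
    explicit FileIterator(FileSource& source) : source_(&source) { advance(); }

    reference operator*() const { return current_; }
    FileIterator& operator++() { advance(); return *this; }
    FileIterator operator++(int) { FileIterator old(*this); advance(); return old; }
    bool operator==(const FileIterator& o) const { return source_ == o.source_; }
    bool operator!=(const FileIterator& o) const { return !(*this == o); }

private:
    // Fetch the next future, or turn into the end sentinel when exhausted.
    void advance() {
        if (source_ && !source_->exhausted()) current_ = source_->open_next();
        else source_ = nullptr;
    }

    FileSource* source_;
    std::shared_future<FileHandle> current_;
};

// Usage sketch: drain a source and collect the names that were opened.
std::vector<std::string> drain(FileSource& source) {
    std::vector<std::string> opened;
    for (FileIterator it(source), end; it != end; ++it) {
        opened.push_back((*it).get().name);  // get() blocks until ready
    }
    return opened;
}
```

I chose `std::launch::deferred` only to keep the sketch simple. A real source would presumably start the open or download eagerly, and since C++11 has no standard `when_any`, waiting on several iterators at once would still need PPL or a hand-rolled equivalent.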

I plan to use the STL input_iterator as my point of departure for the interface. After all, this 2014 SO post shows a simple example. It does not involve I/O, but another article from 2001 reportedly does incorporate I/O into a custom STL iterator. So far so good.

Where I start to get concerned is when I read an article like "Generator functions in C++". Ack! That article gives me the impression that I can't create a generator function disguised as an iterator, at least not without waiting for C++20. Likewise, this other 2016 SO post makes it sound like creating generator functions in C++ is a hornet's nest.

While the implementation of my custom iterators will be complex, perhaps those last two links were tackling something beyond what I'm trying to achieve. In other words, perhaps my plan is not flawed? I'd like to know what barriers I'll face if I put a lazy-generator implementation behind a custom input_iterator. If I should be using something else, like Boost iterator_facade, I'd appreciate a bit of explanation of why. Also, I'd like to know if what I'm doing has already been implemented elsewhere. Perhaps the PPL, which I've only just started to learn, already has a solution for this?

P.S. I gave the example of an S3 iterator that lazily downloads each requested file and then returns an open file handle. Yes, I know this means the iterator produces a side effect, which I would normally want to avoid. However, for my intended purpose, I'm not sure there is a cleaner way to do this.

Brent Arias

1 Answer

Have you looked at the Coroutines TS? It is coming with C++20 and allows what you are looking for.

Some compilers (GCC 10, MSVC) already have some support.

Specific library features on top of standard coroutines that may interest you:

  • generator<T>

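    #include <cstdint>
    #include <iostream>
    #include <cppcoro/generator.hpp>

    // Yields the Fibonacci sequence lazily, one value per loop iteration.
    cppcoro::generator<const std::uint64_t> fibonacci()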
    {
      std::uint64_t a = 0, b = 1;
      while (true)
      {
        co_yield b;
        auto tmp = a;
        a = b;
        b += tmp;
      }
    }
    
    void usage()
    {
      for (auto i : fibonacci())
      {
        if (i > 1'000'000) break;
        std::cout << i << std::endl;
      }
    }
    

    A generator represents a coroutine type that produces a sequence of values of type T, where values are produced lazily and synchronously.

    The coroutine body is able to yield values of type T using the co_yield keyword. Note, however, that the coroutine body is not able to use the co_await keyword; values must be produced synchronously.

  • async_generator<T>

    An async_generator represents a coroutine type that produces a sequence of values of type T, where values are produced lazily and may be produced asynchronously.

    The coroutine body is able to use both co_await and co_yield expressions.

    Consumers of the generator can use a range-based for co_await loop to consume the values.

    Example

    // threadpool is a placeholder for a scheduler type with a delay() method.
    cppcoro::async_generator<int> ticker(int count, threadpool& tp)
    {
      for (int i = 0; i < count; ++i)
      {
        co_await tp.delay(std::chrono::seconds(1));
        co_yield i;
      }
    }
    
    cppcoro::task<> consumer(threadpool& tp)
    {
      auto sequence = ticker(10, tp);
      for co_await(std::uint32_t i : sequence)
      {
        std::cout << "Tick " << i << std::endl;
      }
    }
    

Sidenote: Boost Asio has had experimental support for the Coroutines TS for several releases, so you can combine the two if you want.
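
For comparison, given the C++11 constraint in the question: a synchronous generator like fibonacci() can be approximated without coroutines by moving the loop's local variables into an iterator class and running one loop step per increment. This is a minimal sketch rather than a general recipe. `FibIterator` and `fib_up_to` are made-up names, and the sequence never ends, so the caller decides when to stop.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hand-written stand-in for a coroutine frame: the generator's local
// variables live in the iterator, and operator++ runs one loop step.
class FibIterator {
public:
    typedef std::input_iterator_tag iterator_category;
    typedef std::uint64_t value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const std::uint64_t* pointer;
    typedef const std::uint64_t& reference;

    FibIterator() : a_(0), b_(1) {}

    // Plays the role of `co_yield b`.
    reference operator*() const { return b_; }

    // Plays the role of the loop body that follows the yield.
    FibIterator& operator++() {
        std::uint64_t tmp = a_;
        a_ = b_;
        b_ += tmp;
        return *this;
    }
    FibIterator operator++(int) { FibIterator old(*this); ++*this; return old; }

    bool operator==(const FibIterator& o) const { return a_ == o.a_ && b_ == o.b_; }
    bool operator!=(const FibIterator& o) const { return !(*this == o); }

private:
    std::uint64_t a_, b_;
};

// Usage sketch: collect values until they exceed `limit`.
std::vector<std::uint64_t> fib_up_to(std::uint64_t limit) {
    std::vector<std::uint64_t> out;
    for (FibIterator it; *it <= limit; ++it) out.push_back(*it);
    return out;
}
```

The trade-off is that every suspension point has to be expressed as explicit state, which stays manageable for a single loop but gets awkward once the generator body has several yield points or nested loops.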

sehe
  • This is cool stuff, but it ignores my goal: "best effort to not go beyond C++11". I'm reading [this article](https://blog.panicsoftware.com/coroutines-introduction/) and re-reading the ["Generator Functions in C++"](https://paoloseverini.wordpress.com/2014/06/09/generator-functions-in-c/) article and I think I'm concluding that when contributors say we "need" language support, what they mean is that language support is needed to have the semantic parity and elegance of languages like C#. I don't *need* the elegance. So I'm still looking to understand the cleanest C++11 way to do this. – Brent Arias Apr 24 '20 at 17:44