Generator called twice in C++20 views pipeline

Question

Here in a simple pipeline of views adaptors, there is the gen function called to generate a sequence of values (using an internal state) and then a filter on it.

What is surprising and counterintuitive (at least for me) is the fact that the generator function is called twice at each iteration, so that the next check on the same filter fails (the filtered value is not reused in the pipeline).

Do you have an idea if this is the correct expected behavior (and why)?

Tested with libstdc++ in GCC 10.3, 11.1 & trunk (code) and range-v3 with GCC & clang (code).

int main() {
  int n = 0;
  auto gen = [&n]() {
    auto result = ++n;
    std::cout << "Generate [" << result << "]\n";
    return result;
  };

  auto tmp =
      ranges::views::iota(0)
      | ranges::views::transform([gen](auto &&) { return gen(); })
      | ranges::views::filter([](auto &&i) {
          std::cout << "#1 " << i << " " << (i % 2) << "\n";
          return (i % 2) == 1;
        });

  for (auto &&i : tmp | ranges::views::take(1)) {
    std::cout << "#2 " << i << " " << ((i % 2) == 1) << "\n";
    assert(((i % 2) == 1));
  }
}

NB: if gen function is written as mutable with an internal state, it does not compile:

  auto gen = [n=0]() mutable {
    auto result = ++n;
    std::cout << "Generate [" << result << "]\n";
    return result;
  };

(and I know that pure functions are better)

It's all well explained [here](https://www.fluentcpp.com/2019/02/15/how-smart-output-iterators-fare-with-the-terrible-problem-of-incrementing-a-smart-iterator/). Another related question [here](https://stackoverflow.com/questions/36820639/how-do-i-write-a-range-pipeline-that-uses-temporary-containers#comment70900995_36976023). — Enlico, Apr 29 '21 at 16:58
@Enlico "How Smart Output Iterators Avoid the TPOIASI" <-- honestly I'm having trouble getting past the the point where the author decided to turn a phrase, which isn't remotely commonly used, into an initialism, for no reason (and also who calls them "smart iterators" anyway?) — Barry, Apr 29 '21 at 17:02
@Barry, well, I can't say _TPOIASI_ is appealing as an acronym, nor that _smart iterators_ is a well known name for those things wrapped in a range. And probably given the _P_, they are not that _S_, after all. But still, I find the explanation in that post is clear enough. Furthermore, given the short time it took for you to write down your good answer, I guess you can get past that point till the end in a matter of seconds, if you want :D — Enlico, Apr 29 '21 at 17:05

Barry · Accepted Answer · 2021-04-29T17:43:00.167

Do you have an idea if this is the correct expected behavior (and why)?

Yes: this is the expected behavior. It is an inherent property of the iteration model where we have operator* and operator++ as separate operations.

filter's operator++ has to look for the next underlying iterator that satisfies the predicate. That involves doing *it on transform's iterator which involves invoking the function. But once we find that next iterator, when we read it again, that will again invoke the transform. In a code snippet:

decltype(auto) transform_view<V, F>::iterator::operator*() const {
    return invoke(f_, *it_);
}

decltype(auto) filter_view<V, P>::iterator::operator*() const {
    // reading through the filter iterator just reads
    // through the underlying iterator, which in this
    // case means invoking the function
    return *it_;
}

auto filter_view<V, P>::iterator::operator++() -> iterator& {
    for (++it_; it_ != ranges::end(parent_->base_); ++it_) {
        // when we eventually find an iterator that satisfies this
        // predicate, we will have needed to read it (which calls
        // the functions) and then the next operator* will do
        // that same thing again
        if (invoke(parent_->pred_, *it_))) {
            break;
        }
    }
    return *this;
}

The result is that we invoke the function twice on every element that satisfies the predicate.

The workaround is to either just not care (have the transform be either cheap enough that invoking it twice doesn't matter or the filter be sufficiently rare that the amount of duplicate transforms don't matter or both) or do add a caching layer into your pipeline.

There's no caching view in C++20 Ranges, but there is one in range-v3 named views::cache1:

ranges::views::iota(0)
    | ranges::views::transform(f)
    | ranges::views::cache1
    | ranges::views::filter(g)

This ensures that f only gets invoked at most once per element, at the cost of having to deal with an element cache and downgrading your range to only be an input range (where before it was bidirectional).

Generator called twice in C++20 views pipeline

1 Answers1

Linked

Related