How do I write a range pipeline that uses temporary containers?

Question

I have a third-party function with this signature:

std::vector<T> f(T t);

I also have an existing potentially infinite range (of the range-v3 sort) of T named src. I want to create a pipeline that maps f to all elements of that range and flattens all the vectors into a single range with all their elements.

Instinctively, I would write the following.

 auto rng = src | view::transform(f) | view::join;

However, this ~~won't work~~ didn't use to work, because we ~~cannot~~ couldn't create views of temporary containers.

UPDATE: This issue has been patched by this commit.

How does range-v3 support such a range pipeline?

Apologies for the incredibly late answer, I had forgotten this question existed. — Casey, Oct 01 '16 at 19:09
The hacks below shouldn't really be necessary. I have my own ranges library which solves the problem. If when composing the expression tree the source containers are passed as lvalue then a reference is stored. If the source container is passed as rvalue then it is moved into the expression tree. All the nodes of the expression tree support proper move semantics so as long as you build up the expression tree without a real copy all you get are moves of the source container. I started out with exactly the same problem that range-v3 has but it is solvable. — bradgonesurfing, Feb 26 '20 at 11:43
A couple of unit tests showing that lvalues and rvalues are handled correctly. Unfortunately I can't share more because it's part of a proprietry code base :( https://gist.github.com/bradphelan/0bb9397ea7b49f45122908b1a9da061f — bradgonesurfing, Feb 26 '20 at 12:06
I actually came here looking for the answer to a slightly different question, which I mistook for this question. In my case, I have a temporary at the beginning of my pipeline: `for (int i : std::vector({1,2,3}) | mytransform([](int i) { return i+1; })) { std::cout << i << std::endl; }`. My mytransform function was giving garbage or crashing because the `std::vector` gets prematurely deleted (the lifetime of the temporary rhs of a range-based-for-loop gets extended to the end of the loop, but any other temps don't get extended:-( ) @bradgonesurfing 's comments above solve this brilliantly. — Don Hatch, Apr 16 '20 at 12:10
Non-obvious caveat: in the move case, the moved src container should go into a shared_ptr so that all copies of the view share the source and copying the view is O(1). — Don Hatch, Apr 16 '20 at 12:12
Odd. I cloned range-v3 yesterday but I still get this error, in spite of the update saying this has been fixed. I am on clang. — user118967, Oct 16 '22 at 05:11
@DonHatch: I am surprised you managed to compile your example with a temporary at the beginning of the pipe. clang generates an error when I try to do that. — user118967, Oct 16 '22 at 05:11

score 20 · Answer 1 · edited Nov 20 '20 at 13:20

20

It looks like there are now test cases in the range-v3 library that show how to do this correctly. It is necessary to add the views::cache1 operator into the pipeline:

auto rng = views::iota(0,4)
        | views::transform([](int i) {return std::string(i, char('a'+i));})
        | views::cache1
        | views::join('-');
check_equal(rng, {'-','b','-','c','c','-','d','d','d'});
CPP_assert(input_range<decltype(rng)>);
CPP_assert(!range<const decltype(rng)>);
CPP_assert(!forward_range<decltype(rng)>);
CPP_assert(!common_range<decltype(rng)>);

so the solutions for the OP's question would be to write

auto rng = src | views::transform(f) | views::cache1 | views::join;

edited Nov 20 '20 at 13:20

einpoklum

118,144
57
340
684

answered Feb 26 '20 at 14:01

bradgonesurfing

30,949
17
114
217

7

Yes, this is the correct answer today. (I'm the author of range-v3.) For anybody wondering, `views::cache1` caches the most recently produced element of the range within the view object itself. Dereferencing the iterator returns a reference to the cached object. That gives it a stable address, so `views::join` can iterate over it without it dangling. – Eric Niebler Jun 10 '20 at 17:40
5

Is there anyway to get this behaviour for free? It does seem smelly. Perhaps ranges can be augmented with a trait/concept that says if they are pure ie: iterating over them twice gives the same result. Then downstream combinators can decide to cache or not. – bradgonesurfing Jun 11 '20 at 16:13

Casey · Answer 2 · 2016-10-09T19:14:06.950

range-v3 forbids views over temporary containers to help us avoid the creation of dangling iterators. Your example demonstrates exactly why this rule is necessary in view compositions:

auto rng = src | view::transform(f) | view::join;

If view::join were to store the begin and end iterators of the temporary vector returned by f, they would be invalidated before ever being used.

"That's all great, Casey, but why don't range-v3 views store temporary ranges like this internally?"

Because performance. Much like how the performance of the STL algorithms is predicated on the requirement that iterator operations are O(1), the performance of view compositions is predicated on the requirement that view operations are O(1). If views were to store temporary ranges in internal containers "behind your back" then the complexity of view operations - and hence compositions - would become unpredictable.

"Ok, fine. Given that I understand all of this wonderful design, how do I MAKE THIS WORK?!??"

Since the view composition won't store the temporary ranges for you, you need to dump them into some kind of storage yourself, e.g.:

#include <iostream>
#include <vector>
#include <range/v3/range_for.hpp>
#include <range/v3/utility/functional.hpp>
#include <range/v3/view/iota.hpp>
#include <range/v3/view/join.hpp>
#include <range/v3/view/transform.hpp>

using T = int;

std::vector<T> f(T t) { return std::vector<T>(2, t); }

int main() {
    std::vector<T> buffer;
    auto store = [&buffer](std::vector<T> data) -> std::vector<T>& {
        return buffer = std::move(data);
    };

    auto rng = ranges::view::ints
        | ranges::view::transform(ranges::compose(store, f))
        | ranges::view::join;

    unsigned count = 0;
    RANGES_FOR(auto&& i, rng) {
        if (count) std::cout << ' ';
        else std::cout << '\n';
        count = (count + 1) % 8;
        std::cout << i << ',';
    }
}

Note that the correctness of this approach depends on the fact that view::join is an input range and therefore single-pass.

"This isn't novice-friendly. Heck, it isn't expert-friendly. Why isn't there some kind of support for 'temporary storage materialization™' in range-v3?"

Because we haven't gotten around to it - patches welcome ;)

The answer now is to use `views::cache1`, as user bradgonesurfing shows in his answer below. — Eric Niebler, Jun 10 '20 at 17:18

score 12 · Answer 3 · answered May 01 '16 at 21:25

I suspect it just can't. None of the views have any machinery to store temporaries anywhere - that's explicitly against the concept of view from the docs:

A view is a lightweight wrapper that presents a view of an underlying sequence of elements in some custom way without mutating or copying it. Views are cheap to create and copy, and have non-owning reference semantics.

So in order for that join to work and outlive the expression, something somewhere has to hold onto those temporaries. That something could be an action. This would work (demo):

auto rng = src | view::transform(f) | action::join;

except obviously not for src being infinite, and even for finite src probably adds too much overhead for you to want to use anyway.

You would probably have to copy/rewrite view::join to instead use some subtly modified version of view::all (required here) that instead of requiring an lvalue container (and returning an iterator pair into it), allowed for an rvalue container that it would store internally (and returning an iterator pair into that stored version). But that's several hundred lines' worth of copying code, so seems pretty unsatisfactory, even if that works.

The range library now has a clean solution for this sort of problem: `views::cache1`. See bradgonesurfing's answer. — Eric Niebler, Jun 10 '20 at 17:28

ptrj · Answer 4 · 2016-05-05T01:26:54.260

Edited

Apparently, the code below violates the rule that views cannot own data they refer to. (However, I don't know if it's strictly forbidden to write something like this.)

I use ranges::view_facade to create a custom view. It holds a vector returned by f (one at a time), changing it to a range. This makes it possible to use view::join on a range of such ranges. Certainly, we can't have a random or bidirectional access to elements (but view::join itself degrades a range to an Input range), nor can we assign to them.

I copied struct MyRange from Eric Niebler's repository modifying it slightly.

#include <iostream>
#include <range/v3/all.hpp>

using namespace ranges;

std::vector<int> f(int i) {
    return std::vector<int>(static_cast<size_t>(i), i);
}

template<typename T>
struct MyRange: ranges::view_facade<MyRange<T>> {
private:
    friend struct ranges::range_access;
    std::vector<T> data;
    struct cursor {
    private:
        typename std::vector<T>::const_iterator iter;
    public:
        cursor() = default;
        cursor(typename std::vector<T>::const_iterator it) : iter(it) {}
        T const & get() const { return *iter; }
        bool equal(cursor const &that) const { return iter == that.iter; }
        void next() { ++iter; }
        // Don't need those for an InputRange:
        // void prev() { --iter; }
        // std::ptrdiff_t distance_to(cursor const &that) const { return that.iter - iter; }
        // void advance(std::ptrdiff_t n) { iter += n; }
    };
    cursor begin_cursor() const { return {data.begin()}; }
    cursor   end_cursor() const { return {data.end()}; }
public:
    MyRange() = default;
    explicit MyRange(const std::vector<T>& v) : data(v) {}
    explicit MyRange(std::vector<T>&& v) noexcept : data (std::move(v)) {}
};

template <typename T>
MyRange<T> to_MyRange(std::vector<T> && v) {
    return MyRange<T>(std::forward<std::vector<T>>(v));
}


int main() {
    auto src = view::ints(1);        // infinite list

    auto rng = src | view::transform(f) | view::transform(to_MyRange<int>) | view::join;

    for_each(rng | view::take(42), [](int i) {
        std::cout << i << ' ';
    });
}

// Output:
// 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 9 9 9 9 9 9

Compiled with gcc 5.3.0.

Your type claims to satisfy the `View` concept, so it compiles. However, since it does not have O(1) copy, this type does not meet the complexity requirement of the `View` concept, so it is not /actually/ a `View`. In practice, this means that folks will reason incorrectly about the performance of pipelines that contain `MyRange`. — Eric Niebler, Jan 25 '17 at 18:06
@ptrj, fwiw, as new user of Range-v3, [I've stumbled against this limitation pretty soon](https://stackoverflow.com/questions/66284017/how-to-wrap-some-computed-values-in-a-range-in-order-to-allow-using-join-on-the). [Your post on Fluent C++](https://www.fluentcpp.com/2019/09/13/the-surprising-limitations-of-c-ranges-beyond-trivial-use-cases/) is very interesting in this respect. — Enlico, Mar 07 '21 at 09:56
@Enlico I'm not an author of the Fluen C++ post. We just happened to have a similar idea or a "hack." — ptrj, Mar 20 '21 at 17:13
Strange, @ptrj, the FluentC++ post has a link exactly to Eric Niebler's comment above. What a coincidence! — Enlico, Mar 20 '21 at 20:49

Ap31 · Answer 5 · 2016-09-21T13:46:31.017

The problem here of course is the whole idea of a view - a non-storing layered lazy evaluator. To keep up with this contract, views have to pass around references to range elements, and in general they can handle both rvalue and lvalue references.

Unfortunately in this specific case view::transform can only provide an rvalue reference as your function f(T t) returns a container by value, and view::join expects an lvalue as it tries to bind views (view::all) to inner containers.

Possible solutions will all introduce some kind of temporary storage somewhere into the pipeline. Here are the options I came up with:

Create a version of view::all that can internally store a container passed by an rvalue reference (As suggested by Barry). From my point of view, this violates the "non-storing view" conception and also requires some painful template coding so I would suggest against this option.
Use a temporary container for the whole intermediate state after the view::transform step. Can be done either by hand:
```
auto rng1 = src | view::transform(f)
vector<vector<T>> temp = rng1;
auto rng = temp | view::join;
```
Or using action::join. This would result in "premature evaluation", will not work with infinite src, will waste some memory, and overall has a completely different semantics from your original intention, so that is hardly a solution at all, but at least it complies with view class contracts.
Wrap a temporary storage around the function you pass into view::transform. The simpliest example is
```
const std::vector<T>& f_store(const T& t)
{
  static std::vector<T> temp;
  temp = f(t);
  return temp;
}
```
and then pass f_store to the view::transform. As f_store returns an lvalue reference, view::join will not complain now.

This of course is somewhat of a hack and will only work if you then streamline the whole range into some sink, like an output container. I believe it will withstand some straightforward transformations, like view::replace or more view::transforms, but anything more complex can try to access this temp storage in non-straightforward order.

In that case other types of storage can be used, e.g. std::map will fix that problem and will still allow infinite src and lazy evaluation at the expense of some memory:
```
const std::vector<T>& fc(const T& t)
{
    static std::map<T, vector<T>> smap;
    smap[t] = f(t);
    return smap[t];
}
```
If your f function is stateless, this std::map can also be used to potentially save some calls. This approach can possibly be improved further if there is a way to guarantee that an element will no longer be required and remove it from the std::map to conserve memory. This however depends on further steps of the pipeline and the evaluation.

As these 3 solutions pretty much cover all the places to introduce temporary storage between view::transform and view::join, I think these are all the options you have. I would suggest going with #3 as it will allow you to keep the overall semantics intact and it is quite simple to implement.

"Create a version of view::all that can bind to an rvalue reference to the container. This would basically keep temporary objects returned by f from expiring during the lifetime of whatever the whole expression returns." This is wrong. The lifetime is of the temporary is only bound to the entire expression if it is the entire expression. I.e. `auto&& x = f();` extends the lifetime of the result of `f`, but `auto&& x = g(f());` doesn't, and it's up to `g` to ensure it lives as long as it's return value. — R. Martinho Fernandes, Sep 19 '16 at 17:01
@R.MartinhoFernandes Right, what I meant was to actually store the value of whatever `f()` returns inside this "rvalue view", as the other answers suggested. I guess "bind" is a wrong word for this. However I don't think it's possible to get away with temporaries as the `view`s stacked down the line are allowed to (and `view::join` will) try to access these elements multiple times. — Ap31, Sep 19 '16 at 20:29
Actually I think this whole problem stems from the fact that `view::join` violates this functional-ish behavior as it is by storing lvalue refs - proper functional implementation would just call the underlying view every time it needs something. In this case that would lead to extra `f()` calls which is perfectly fine from the functional pov, but in reality of course functions consume CPU and sometimes have state. But yeah, this "fixed" `view::join` would be rvalue-friendly and that would be a -proper- solution I guess — Ap31, Sep 19 '16 at 21:16

Eric Niebler · Answer 6 · 2020-06-10T17:25:55.717

UPDATE

range-v3 now has views::cache1, a view that caches the most recent element in the view object itself, and returns a reference to that object. That is how this problem is cleanly and efficiently solved today, as pointed out by user @bradgonesurfing in his answer.

Old, out-of-date answer below, preserved for historical curiosity.

This is another solution that doesn't require much fancy hacking. It comes at the cost of a call to std::make_shared at each call to f. But you're allocating and populating a container in f anyway, so maybe this is an acceptable cost.

#include <range/v3/core.hpp>
#include <range/v3/view/iota.hpp>
#include <range/v3/view/transform.hpp>
#include <range/v3/view/join.hpp>
#include <vector>
#include <iostream>
#include <memory>

std::vector<int> f(int i) {
    return std::vector<int>(3u, i);
}

template <class Container>
struct shared_view : ranges::view_interface<shared_view<Container>> {
private:
    std::shared_ptr<Container const> ptr_;
public:
    shared_view() = default;
    explicit shared_view(Container &&c)
    : ptr_(std::make_shared<Container const>(std::move(c)))
    {}
    ranges::range_iterator_t<Container const> begin() const {
        return ranges::begin(*ptr_);
    }
    ranges::range_iterator_t<Container const> end() const {
        return ranges::end(*ptr_);
    }
};

struct make_shared_view_fn {
    template <class Container,
        CONCEPT_REQUIRES_(ranges::BoundedRange<Container>())>
    shared_view<std::decay_t<Container>> operator()(Container &&c) const {
        return shared_view<std::decay_t<Container>>{std::forward<Container>(c)};
    }
};

constexpr make_shared_view_fn make_shared_view{};

int main() {
    using namespace ranges;
    auto rng = view::ints | view::transform(compose(make_shared_view, f)) | view::join;
    RANGES_FOR( int i, rng ) {
        std::cout << i << '\n';
    }
}

What do you think about removing `shared_view`'s constructor from `Container const&` to prevent future changes silently turning the O(1) construction into an O(N) construction? — Casey, Jan 25 '17 at 20:08
Done. But there currently isn't any guarantee that moving the container is O(1). — Eric Niebler, Jan 26 '17 at 23:44
what about this? `auto rng = src | views::transform(f) | views::cache1 | views::join;` — horseyguy, Apr 11 '20 at 22:24
Yes, that's how you would do it today. `view::cache1` didn't exist when this question was first asked. — Eric Niebler, Apr 14 '20 at 16:03
This is a great answer! But I didn't see why you'd want to do this, until I read the other answers and then reached the same conclusion myself. If I understand correctly, the path that leads here is: (1) start with ptrj's answer, which nicely solves the problem of extending the life of the temporary to where it's needed, but violates the view contract. (2) Eric's comment on that answer explains what's violated: such a view would not have an O(1) copy method. (3) realize that it *would* have an O(1) copy method, if all copies of it shared the data via a shared_ptr. And that's this answer. — Don Hatch, Apr 16 '20 at 11:04
On second thought, now I'm not so sure. You're using shared_ptrs in a different way from the way I thought in my (3). — Don Hatch, Apr 16 '20 at 11:43

How do I write a range pipeline that uses temporary containers?

6 Answers6

Linked

Related