Using ranges::view::iota in parallel algorithms

Question

Since there is no index based parallel for algorithm in c++17, I'm wondering if ranges::view::iota can be used in combination with std::for_each to emulate that. That is:

using namespace std;

constexpr int N= 10'000'000;
ranges::iota_view indices(0,N);
vector<int> v(N);

for_each(execution::par_unseq,indices.begin(),indices.end(),[&](int i) { v[i]= i; });

iota_view seems to provide random access for appropriate types ([range.iota.iterator]):

iota_view<I, Bound>::iterator::iterator_category is defined as follows:

(1.1) — If I models Advanceable, then iterator_category is random_access_iterator_tag.

(1.2) — Otherwise, if I models Decrementable, then iterator_category is bidirectional_iterator_tag.

(1.3) — Otherwise, if I models Incrementable, then iterator_category is forward_iterator_tag.

(1.4) — Otherwise, iterator_category is input_iterator_tag.

Is the above code correct? Is there any performance penalty in using iota_view this way?

EDIT: I've made some tests with range-v3, cmcstl2, and Intel's PSTL.

Using range-v3, the above example fails to compile with GCC 8. The compiler complains about begin and end having different types:

deduced conflicting types for parameter ‘_ForwardIterator’ (‘ranges::v3::basic_iterator<ranges::v3::iota_view<int, int> >’ and ‘ranges::v3::default_sentinel’)

Using cmcstl2 the code compiles cleanly, but it doesn't run in parallel. It seems to me that it falls back to the sequential version, maybe because the forward iterators requirements are somehow not met (https://godbolt.org/z/yvr-M2).

There is a somewhat related PSTL issue (https://github.com/intel/parallelstl/issues/22).

@Caleth I want to emulate a sort of `tbb::parallel_for` not just the particular example I posted. I hope being able to write `ranges::for_each(execution::par_unseq,view::iota(0,N),[&](int i) { /* ... */})`. — metalfox, Jul 05 '18 at 09:04
OK, let's rephrase: why not use the appropriate algorithm, rather than always rewrite it in terms of for_each? — Caleth, Jul 05 '18 at 09:09
@Caleth Sometimes it is difficult to reformulate a `for` loop in terms of a stl-algorithm. For example, when there are several vectors involved, etc. — metalfox, Jul 05 '18 at 09:17
You can `ranges::combine` multiple vectors together that use one index — Caleth, Jul 05 '18 at 09:20
Also, if you find particular difficulty in finding an appropriate algorithm, it may be that what you are doing can't be arbitrarily parallelised — Caleth, Jul 05 '18 at 09:21
@Caleth I'm only saying that sometimes it is useful to have an index-based parallel for loop. For example, I have a motivating use case consisting of calculating a parallel incomplete LU factorization of a matrix free linear system. It is much more clear to me to use indices in this case to figure out what I'm doing. I'm not proposing to always replace dedicated algorithms by for loops. — metalfox, Jul 06 '18 at 06:47
It's not a ranges view, but I was able to create an iota like iterator which works for parallel execution (par, not par_unseq). The code is available in https://stackoverflow.com/questions/57638255/is-there-a-way-of-making-a-lock-free-counter-random-access-iterator-in-c — Mr.WorshipMe, Sep 10 '19 at 19:25
@Caleth Unfortunately, `ranges::views::zip` suffers from the same problem as `ranges::view::iota`. It doesn't model `Cpp17ForwardIterator` and therefore its usage in parallel algorithms is not compilant. — Pilar Latiesa, Feb 14 '20 at 14:25
The `std::ranges::common_view` provides the `std::ranges::begin` and `std::ranges::end` overloads which can be then used in STL algortihms, even with the `std::execution::par` modifier. Please see my answer below for the sample. — Viktor Latypov, Oct 27 '20 at 11:23

score 6 · Answer 1 · answered Dec 10 '18 at 09:13

After digging in the standard draft, I'm afraid that the answer is no: it is not strictly standard compliant to use ranges::iota_view in the parallel version of for_each.

The parallel overload of for_each is declared as [alg.foreach]:

template<class ExecutionPolicy, class ForwardIterator, class Function>
  void for_each(ExecutionPolicy&& exec,
                ForwardIterator first, ForwardIterator last,
                Function f);

On the other hand, in [algorithms.requirements] we find the constraint:

If an algorithm's template parameter is named ForwardIterator, ForwardIterator1, or ForwardIterator2, the template argument shall satisfy the Cpp17ForwardIterator requirements.

As noted by Billy O'Neal in one of the links I posted in the question, a sensible implementation of ranges::iota_view::iterator is very unlikely to meet the "equal iterators reference the same object" forward iterator requirement [iterator.cpp17]. Therefore, in my understanding, ranges::iota_view::iterator won't satisfy the Cpp17ForwardIterator requirements, and the same goes for e.g. boost::counting_iterator.

However, in practice I would expect that implementations will use std::iterator_traits::iterator_category to dispatch the appropriate overload of the algorithm, as PSTL seems to do. Therefore, I believe that the example code in the OP would work as intended. The reason that cmcstl2 doesn't work is probably that the used iterator_category belong to the __stl2 namespace instead of being the std ones.

`std::ranges::iota_view::iterator` is not even an input iterator: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93651. — Pilar Latiesa, Feb 14 '20 at 14:18

score 2 · Answer 2 · answered Oct 27 '20 at 11:20

In C++20 there is a std::views::common which adapts a range to the standard pair-of-iterators-accepting algorithms. After converting the input range to std::ranges::common_range use the std::ranges::begin and std::ranges::end functions to get a pair of iterators for std::transform or whatever algorithm you use.

Here is a sample program which assumes a C++20 compiler (this is not a ranges-v3-based implementation). The only one I have tested (as of October 2020) is G++ version 10.

#include <algorithm>
#include <numeric>
#include <execution>
#include <iostream>
#include <vector>
#include <ranges>

int main()
{
    // A "large" number of elements (limited to ten for a reasonably small std::cout output)
    constexpr int N = 10;

    // Some range with a finite number of values (views::take at the end)
    auto very_long_input_range = std::views::iota(0) | std::views::take(N);

    // Source range converted to common_range (which supports std::begin & std::end)
    auto input_range = std::ranges::common_view(very_long_input_range);

    // Element processing function. E.g., if 'i' is a file name and this lambda parses it, it might be a big time-saver
    auto some_complex_function = [](auto i){ return i * i; };

    // Declare and allocate an output array (maybe range_value_t is an overkill here, but still)
    // Using std::ranges::size(input_range) instead of N can also help generalize this code,
    // but input_range must satisfy the std::ranges::sized_range concept
    std::vector< std::ranges::range_value_t<decltype(input_range)> > output_array( N );

    // Use C++17 std::execution::par with a pair of C++20 iterators from std::ranges
    std::transform(std::execution::par,
            std::ranges::begin(input_range),
            std::ranges::end(input_range),
            output_array.begin(),
            some_complex_function);

    // Test the output
    for (auto p: output_array)
            std::cout << p << std::endl;
}

The command line for G++10 (Ubuntu 20.20) is

g++-10 -std=c++2a -o ptest ptest.cpp -ltbb -lstdc++

The problem is not only that `begin` and `end` may have different types, but that those range adaptors' iterators are not required to model Cpp17ForwardIterator (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93651). For example, this doesn't run in parallel: https://godbolt.org/z/nPfexe, whereas this does run in parallel: https://godbolt.org/z/1oEo3c. — metalfox, Oct 27 '20 at 20:01
@metalfox: I agree, there are many obvious problems with the current Ranges library, starting from the fact that most samples are given for Ranges-v3 and not the standard library, and the standard library is only implemented in G++10, where there are less view variants then in Ranges-v3. This answer is inspired by my limited exposure to ranges and an excitement that at least _some_ variant of the code can run in parallel. The best use case for me right now is to generate a 2D combination of iotas (temporary grid) and run some ray tracer for that range - this should run parallel. — Viktor Latypov, Oct 28 '20 at 10:09
I love the ranges library. I'd say that the problem is in the old components that weren't accomodated to play well with the ranges. From what I've tested so far, `iota_view` iterators are not recognized as random access by libstd++ on x86_64 because its difference type is 128 bit wide. `iota_view` works apparently fine with the parallel algorithms. — metalfox, Oct 28 '20 at 19:34
Let's just sum this discussion up by declaring "Ranges are in the C++20 standard in an incomplete state, but this is just to ensure that in future versions of the standard people would think how to make old containers/algorithms/exec_policies compatible with ranges and not about including the ranges" ) — Viktor Latypov, Oct 28 '20 at 21:33

score 0 · Answer 3 · answered Feb 14 '22 at 19:34

I came across this myself; I believe in essence the underlying issue is that both std::ranges and range-v3 now require the concept of snetinels to work correctly - see here for a post on this. I have added as a separate answer, as this concept is the theoretical reason why the code can't compile. Thanks to the other answers for pointing towards the right workaround of then converting these to a common range.

As you mention, the old std algorithms have iterators of the same type, such that it is possible to perform a comparison i != end in order to control the loop structure. The overloads for various standard-library functions rely on the two iterators' having the same deduced type. This is now referred to as a common range, because the iterators share a common type. This worked, however, for ranges, the concept of our data is a bit more generic, and the end may not be known until it is reached. Thus, like before, the 'end' marker signifies that no more data is available to be consumed, but this behaviour is now codified into a separate type (a sentinel), rather than relying on the comparison, which cannot be performed for e.g. an infinite sequence. Range algorithms thus now need to overload on two separate template parameters - the iterator and the sentinel, in order to behave correctly. Without an adaptor, the range can't be used in traditional iterator algorithms, but this is exactly what views::common is there to do for us.

Iterators and sentinels not sharing the same type is not the only problem. See: https://godbolt.org/z/nPfexe and http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2408r4.html Hopefully we'll have this fixed for C++23. — metalfox, Mar 01 '22 at 12:43

Using ranges::view::iota in parallel algorithms

3 Answers3

Linked