10

g++ does implement __restrict__ for pointers, but I could not find anything about iterators. My overall intent is to encourage the compiler to vectorize stl loops.

Edit:

Even if the compiler is unable to vectorize, the __restrict__ keyword should be able to tell the compiler that no unnecessary reloads are necessary inside a loop.

Community
  • 1
  • 1
srean
  • 2,578
  • 18
  • 18
  • `__restrict__` is officially not specified in the C++ standard AFAIK, although many compilers support it. Here is a paper on this issue: [N3635](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3635.pdf) – laszlo.endre Dec 19 '14 at 19:30

2 Answers2

6

I don't know the answer to your direct question. However, the compiler would only ever be able to vectorize a loop for std::vector, as it's the only container (I think) that has contiguous storage, and no dependencies between successive storage locations (unlike e.g. std::list). I don't know how to make it do so, though.

Update

After some experimentation (which may or may not be relevant to the overall goal), I discovered that in ICC, the following does not vectorise:

typedef std::vector<float> V;

V vec(4096);

for (V::iterator it = vec.begin(); it != vec.end(); ++it)
{
    *it *= *it;
}

whereas the following does:

V vec(4096);

V::iterator it2 = vec.end();
for (V::iterator it = vec.begin(); it != it2; ++it)
{
    *it *= *it;
}

So apparently, the problem is not so much iterators, but the call to vec.end() inside the loop construct, which apparently cannot be factored out, even though it's clear that the loop body doesn't affect the vector bounds.

In GCC, I couldn't get anything to vectorise. This isn't surprising, because GCC is much worse than ICC at spotting SSE opportunities.

Oliver Charlesworth
  • 267,707
  • 33
  • 569
  • 680
  • Yes definitely. Given that iterators are an abstraction of pointers I would think restrict applies to pointers as much as they do to iterators (into vectors). But I am not sure. – srean Feb 13 '11 at 11:03
  • 1
    Vectorization of random access iterators on contiguous storage would already be a lot of help, actually. – Konrad Rudolph Feb 13 '11 at 11:05
  • 1
    @Konrad: It seems that even given a trivial `for (std::vector::iterator ...)` loop, even ICC (which is normally much better than GCC at vectorisation) isn't able to vectorise. – Oliver Charlesworth Feb 13 '11 at 11:08
  • @Konrad,@Oli I guess the v[1] object may refer to v[2] as a data member. So unless there is a mechanism to tell the compiler that no such aliasing takes place, the compiler wouldn't be able to vectorize, except may be if the functor that for takes is a constant functor. – srean Feb 13 '11 at 11:16
  • 1
    @srean: See my comment reply to @Konrad. Even with a `vector` (i.e. a vector of primitives where dependencies are impossible), the compiler was unable to vectorise ("`loop was not vectorized: unsupported loop structure.`"). – Oliver Charlesworth Feb 13 '11 at 11:19
  • @Oli Oh! I did not notice they were vector of floats. That is indeed strange. – srean Feb 13 '11 at 11:23
  • @Oli: that’s my point though. It *would* be nice if the compiler did this and in principle it should be possible if we can make the compiler realize that the iterators here resolve to a simple pointer (or index or however you implement them) and are never aliased. – Konrad Rudolph Feb 13 '11 at 11:26
  • @Konrad: It does seem to realise this, as the generated assembler for an iterator-based loop (for a vector) is essentially identical to that for a (non-vectorised) C-style loop. I guess the problem is that the vectorisation pass of the compiler has been heuristically tuned to only work with a small subset of possible loop constructs, as the "rules" for the general case are probably horrifically complex! – Oliver Charlesworth Feb 13 '11 at 11:29
  • @Oli: “It does seem to realise this” – *exactly*. I believe vectorization fails for the sole reason that srean has pointed out, i.e. aliasing. – Konrad Rudolph Feb 13 '11 at 11:31
  • @Konrad: At least in the case of ICC, I don't think that aliasing is the reason (well, at least not the *only* reason). The vectorisation report simply states "unsupported loop structure"; the vectorisation optimiser simply hasn't been written to recognise anything other than very simple loop constructs. – Oliver Charlesworth Feb 13 '11 at 11:38
  • @Oli: hm. But once the code is optimized it *is* a simple loop structure. Of course, it’s possible that the vectorization optimization is applied before the iterator is optimized away. But I would assume that vectorization is an optimization that is applied very late, and inlining the iterator class should be applied very early. – Konrad Rudolph Feb 13 '11 at 11:54
  • 3
    @Konrad: After some more experimentation, I've found that this loop construct will not vectorise: `for (vec_t::iterator it = vec.begin(); it != vec.end(); ++it)`, whereas this one will: `vec_t::iterator itEnd = vec.end(); for (vec_t::iterator it = vec.begin(); it != itEnd; ++it)`. So the problem seem to be not so much iterators, but simply the call to `vector::end()` that for whatever reason it cannot optimise away. – Oliver Charlesworth Feb 13 '11 at 12:10
  • 1
    @Oli: interesting. Thanks for making the effort. Now put it in the answer so we can vote this up. – Konrad Rudolph Feb 13 '11 at 12:12
  • @Oli: I always read as a guideline that recomputing `end()` on each loop iteration was **BAD** (tm) but I thought (naively) that compilers would be able to recognize it as a loop invariant and hoist it out. Apparently it is not so... – Matthieu M. Feb 13 '11 at 12:50
  • @Oli that makes sense, because the vec.end() could change because of addition or deletion in the loop. But STL algorithm, like foreach etc should vectorize because they have no way of changing the number of elements in the container. I totally second @Konrad's suggestion of editing the answer. I am curious of something else, does it do any loop-unrolling when it does not vectorize. Unrolling and avoiding unnecessary reloads is still better than nothing. Thanks for all your experiments. – srean Feb 13 '11 at 12:55
  • @srean: Answer updated! Looking at the resulting assembler, I didn't observe any loop unrolling, unfortunately. Having thought about the overall problem some more, I can now see that your question about `restrict` is a pertinent one, and unfortunately I don't believe there is an equivalent for iterators. – Oliver Charlesworth Feb 13 '11 at 13:17
  • 3
    When you say that GCC sucks in vectorizing loops, did you even try with the option enabled `gcc -O2 -ftree-vectorize -msse[2]`? – rubenvb Feb 13 '11 at 13:30
  • @rubenvb: That's a good point; I'd forgotten that GCC doesn't do this by default! You're right, with this flag enabled, I get vectorised assembler, albeit considerably more bloated than ICC's attempt. – Oliver Charlesworth Feb 13 '11 at 13:38
  • @rubenvb: Nevertheless, I stand by my assertion that GCC takes fewer SSE opportunities than ICC does... – Oliver Charlesworth Feb 13 '11 at 13:53
  • @OliverCharlesworth, presumable this means to that C++11's range-for loop (for `std::vector`) will vectorize, because that is equivalent to store the `end()` iterator. – alfC Aug 09 '17 at 07:12
-2

Take this C++20-solution:

#pragma once
#include <cstddef>
#include <memory>
#include <iterator>
#include <type_traits>
#include <concepts>

template<typename Iterator>
concept restrict_it_concept = requires() { { *Iterator() }; } || requires() { { (Iterator())[(std::ptrdiff_t)1] }; };

template<restrict_it_concept Iterator>
struct restrict_it final
{
    using value_type = std::remove_reference_t<std::conditional_t<requires() { { *Iterator() }; }, decltype(*Iterator()), decltype((Iterator())[(ptrdiff_t)1])>>;
    constexpr restrict_it() noexcept
        requires requires() { { Iterator() }; };
    constexpr restrict_it( Iterator it ) noexcept
        requires requires( Iterator it ) { { Iterator( it ) }; };
    constexpr restrict_it( restrict_it const &other ) noexcept
        requires requires( Iterator it ) { { Iterator( it ) }; };
    constexpr restrict_it &operator =( Iterator it ) noexcept
        requires requires( Iterator it ) { { it = it }; };
    constexpr restrict_it &operator =( restrict_it const &other ) noexcept
        requires requires( Iterator it ) { { it = it }; };
    constexpr bool operator <( restrict_it const &other ) const noexcept
        requires requires( Iterator it ) { { it < it } -> std::convertible_to<bool>; };
    constexpr bool operator <=( restrict_it const &other ) const noexcept
        requires requires( Iterator it ) { { it <= it } -> std::convertible_to<bool>; };
    constexpr bool operator >( restrict_it const &other ) const noexcept
        requires requires( Iterator it ) { { it > it } -> std::convertible_to<bool>; };
    constexpr bool operator >=( restrict_it const &other ) const noexcept
        requires requires( Iterator it ) { { it >= it } -> std::convertible_to<bool>; };
    constexpr bool operator ==( restrict_it const &other ) const noexcept
        requires requires( Iterator it ) { { it == it } -> std::convertible_to<bool>; };
    constexpr bool operator !=( restrict_it const &other ) const noexcept
        requires requires( Iterator it ) { { it != it } -> std::convertible_to<bool>; };
    constexpr std::ptrdiff_t operator -( restrict_it const &other ) const noexcept
        requires requires( Iterator it ) { { it - it } -> std::convertible_to<std::ptrdiff_t>; };
    constexpr restrict_it &operator ++() noexcept
        requires requires( Iterator it ) { { ++it }; };
    constexpr restrict_it &operator --() noexcept
        requires requires( Iterator it ) { { --it }; };
    constexpr restrict_it operator ++( int ) const noexcept
        requires requires( Iterator it ) { { it++ } -> std::convertible_to<Iterator>; };
    constexpr restrict_it operator --( int ) const noexcept
        requires requires( Iterator it ) { { it-- } -> std::convertible_to<Iterator>; };
    constexpr restrict_it operator +( std::ptrdiff_t offset ) const noexcept
        requires requires( Iterator it, std::ptrdiff_t offset ) { { it + offset } -> std::convertible_to<Iterator>; };
    friend constexpr restrict_it operator +( std::ptrdiff_t offset, restrict_it it ) noexcept
        requires requires( std::ptrdiff_t offset, Iterator it ) { { offset + it } -> std::convertible_to<Iterator>; }
    {
        return restrict_it( offset + it.m_it );
    }
    constexpr restrict_it operator -( std::ptrdiff_t offset ) const noexcept
        requires requires( Iterator it, std::ptrdiff_t offset ) { { it - offset } -> std::convertible_to<Iterator>; };
    constexpr value_type &__restrict operator *() const noexcept
        requires requires( Iterator it ) { { *it } -> std::convertible_to<value_type &>; };
    constexpr value_type &__restrict operator []( std::ptrdiff_t index ) const noexcept
        requires requires( Iterator it, std::ptrdiff_t index ) { { it[index] } -> std::convertible_to<value_type &>; };
    constexpr value_type *__restrict operator ->() const noexcept
        requires requires( Iterator it ) { { it.operator ->() } -> std::convertible_to<value_type *>; };
    constexpr restrict_it &operator +=( std::ptrdiff_t offset ) const noexcept
        requires requires( Iterator it, std::ptrdiff_t offset ) { { it += offset }; };
    constexpr restrict_it &operator -=( std::ptrdiff_t offset ) const noexcept
        requires requires( Iterator it, std::ptrdiff_t offset ) { { it -= offset }; };
private:
    Iterator m_it;
};

template<restrict_it_concept Iterator>
constexpr restrict_it<Iterator>::restrict_it() noexcept
    requires requires() { { Iterator() }; }
    : m_it()
{
}

template<restrict_it_concept Iterator>
constexpr restrict_it<Iterator>::restrict_it( Iterator it ) noexcept
    requires requires( Iterator it ) { { Iterator( it ) }; }
    : m_it( it )
{
}

template<restrict_it_concept Iterator>
constexpr restrict_it<Iterator>::restrict_it( restrict_it const &other ) noexcept
    requires requires( Iterator it ) { { Iterator( it ) }; }
    : m_it( other.m_it )
{
}

template<restrict_it_concept Iterator>
constexpr restrict_it<Iterator> &restrict_it<Iterator>::operator =( Iterator it ) noexcept
    requires requires( Iterator it ) { { it = it }; }
{
    m_it = it;
    return *this;
}

template<restrict_it_concept Iterator>
constexpr restrict_it<Iterator> &restrict_it<Iterator>::operator =( restrict_it const &other ) noexcept
    requires requires( Iterator it ) { { it = it }; }
{
    m_it = other.m_it;
    return *this;
}

template<restrict_it_concept Iterator>
constexpr bool restrict_it<Iterator>::operator <( restrict_it const &other ) const noexcept
    requires requires( Iterator it ) { { it < it } -> std::convertible_to<bool>; }
{
    return m_it < other.m_it;
}

template<restrict_it_concept Iterator>
constexpr bool restrict_it<Iterator>::operator <=( restrict_it const &other ) const noexcept
    requires requires( Iterator it ) { { it <= it } -> std::convertible_to<bool>; }
{
    return m_it <= other.m_it;
}

template<restrict_it_concept Iterator>
constexpr bool restrict_it<Iterator>::operator >( restrict_it const &other ) const noexcept
    requires requires( Iterator it ) { { it > it } -> std::convertible_to<bool>; }
{
    return m_it > other.m_it;
}

template<restrict_it_concept Iterator>
constexpr bool restrict_it<Iterator>::operator >=( restrict_it const &other ) const noexcept
    requires requires( Iterator it ) { { it >= it } -> std::convertible_to<bool>; }
{
    return m_it >= other.m_it;
}

template<restrict_it_concept Iterator>
constexpr bool restrict_it<Iterator>::operator ==( restrict_it const &other ) const noexcept
    requires requires( Iterator it ) { { it == it } -> std::convertible_to<bool>; }
{
    return m_it == other.m_it;
}

template<restrict_it_concept Iterator>
constexpr bool restrict_it<Iterator>::operator !=( restrict_it const &other ) const noexcept
    requires requires( Iterator it ) { { it != it } -> std::convertible_to<bool>; }
{
    return m_it != other.m_it;
}

template<restrict_it_concept Iterator>
constexpr std::ptrdiff_t restrict_it<Iterator>::operator -( restrict_it const &other ) const noexcept
    requires requires( Iterator it ) { { it - it } -> std::convertible_to<std::ptrdiff_t>; }
{
    return m_it - other.m_it;
}

template<restrict_it_concept Iterator>
constexpr restrict_it<Iterator> &restrict_it<Iterator>::operator ++() noexcept
    requires requires( Iterator it ) { { ++it }; }
{
    ++m_it;
    return *this;
}

template<restrict_it_concept Iterator>
constexpr restrict_it<Iterator> &restrict_it<Iterator>::operator --() noexcept
    requires requires( Iterator it ) { { --it }; }
{
    --m_it;
    return *this;
}

template<restrict_it_concept Iterator>
constexpr restrict_it<Iterator> restrict_it<Iterator>::operator ++( int ) const noexcept
    requires requires( Iterator it ) { { it++ } -> std::convertible_to<Iterator>; }
{
    return restrict_it( m_it++ );
}

template<restrict_it_concept Iterator>
constexpr restrict_it<Iterator> restrict_it<Iterator>::operator --( int ) const noexcept
    requires requires( Iterator it ) { { it-- } -> std::convertible_to<Iterator>; }
{
    return restrict_it( m_it-- );
}

template<restrict_it_concept Iterator>
constexpr restrict_it<Iterator> restrict_it<Iterator>::operator +( std::ptrdiff_t offset ) const noexcept
    requires requires( Iterator it, std::ptrdiff_t offset ) { { it + offset } -> std::convertible_to<Iterator>; }
{
    return restrict_it( m_it + offset );
}

template<restrict_it_concept Iterator>
constexpr restrict_it<Iterator> restrict_it<Iterator>::operator -( std::ptrdiff_t offset ) const noexcept
    requires requires( Iterator it, std::ptrdiff_t offset ) { { it - offset } -> std::convertible_to<Iterator>; }
{
    return restrict_it( m_it - offset );
}

template<restrict_it_concept Iterator>
constexpr typename restrict_it<Iterator>::value_type &__restrict restrict_it<Iterator>::operator *() const noexcept
    requires requires( Iterator it ) { { *it } -> std::convertible_to<value_type &>; }
{
    return *m_it;
}

template<restrict_it_concept Iterator>
constexpr typename restrict_it<Iterator>::value_type &__restrict restrict_it<Iterator>::operator []( std::ptrdiff_t index ) const noexcept
    requires requires( Iterator it, std::ptrdiff_t index ) { { it[index] } -> std::convertible_to<value_type &>; }
{
    return m_it[index];
}

template<restrict_it_concept Iterator>
constexpr typename restrict_it<Iterator>::value_type *__restrict restrict_it<Iterator>::operator ->() const noexcept
    requires requires( Iterator it ) { { it.operator ->() } -> std::convertible_to<value_type *>; }
{
    return m_it.operator ->();
}

template<restrict_it_concept Iterator>
constexpr restrict_it<Iterator> &restrict_it<Iterator>::operator +=( std::ptrdiff_t offset ) const noexcept
    requires requires( Iterator it, std::ptrdiff_t offset ) { { it += offset }; }
{
    m_it += offset;
    return *this;
}

template<restrict_it_concept Iterator>
constexpr restrict_it<Iterator> &restrict_it<Iterator>::operator -=( std::ptrdiff_t offset ) const noexcept
    requires requires( Iterator it, std::ptrdiff_t offset ) { { it -= offset }; }
{
    m_it -= offset;
    return *this;
}

#if !defined(NDEBUG)
#include <vector>
template<>
struct restrict_it<typename std::vector<char>::const_iterator>;
#endif

With that you could simply write:

restrict_it it( vec.cbegin() );

Works with any container.

Bonita Montero
  • 2,817
  • 9
  • 22
  • Perhaps you could explain what this actually does. – n. m. could be an AI Jul 02 '22 at 09:58
  • It helps you to have __restrict-semantics with any type of iterator. – Bonita Montero Jul 03 '22 at 10:02
  • Hmm I see the code uses \_\_restrict on function return types. Does it really work? "Restrict semantics apply to lvalue expressions only; for example, a cast to restrict-qualified pointer or a function call returning a restrict-qualified pointer are not lvalues and the qualifier has no effect." (cppteference) – n. m. could be an AI Jul 03 '22 at 10:28
  • The above code doesn't compile with clang++ 13 because the concepts implementation has mistakes. MSVC doesn't honor the __restrict-qualfiers correctly, but g++ 11.1.0 does. – Bonita Montero Jul 03 '22 at 16:44
  • There is no such thing as "the" correct handling of `__restrict`. It is a non-standard extension, therefore anything implementation does is correct by (its own) definition. – n. m. could be an AI Jul 03 '22 at 16:47
  • Sorry wanted to ask a question but got distracted. The question is, does this code result in optimisations being done (by any compiler) that are normally not done? – n. m. could be an AI Jul 03 '22 at 16:57
  • clang 13 does Optimizations according to &__restrict references, g++ also, but clang 13 is too stupid to compile the above code. MSVC compiles the above code but doesn't honor the __restrict with that. – Bonita Montero Jul 05 '22 at 16:17