What could be a "least bad implementation" for an iterator over a proxied container?

Question

Context

I was trying to implement an nD-array-like container: something that would wrap an underlying sequence container and allow it to be processed as a container of containers (of...). arr[i][j][k] should be a (possibly const) reference to _arr[(((i * dim2) + j) * dim3) + k].
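For illustration, the index arithmetic above can be folded over any number of dimensions. This is only a sketch: flat_index is a hypothetical helper name, and it assumes row-major layout with one index per dimension.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: row-major flat index, folding the formula
// ((i * dim2) + j) * dim3 + k over N dimensions.
template <std::size_t N>
constexpr std::size_t flat_index(const std::array<std::size_t, N>& dims,
                                 const std::array<std::size_t, N>& idx) {
    std::size_t flat = 0;
    for (std::size_t d = 0; d < N; ++d) {
        flat = flat * dims[d] + idx[d];  // the first dimension's size never matters
    }
    return flat;
}

// Same value as the hand-written formula for dims (2, 3, 4), index (1, 2, 3).
static_assert(flat_index<3>({2, 3, 4}, {1, 2, 3}) == ((1 * 3) + 2) * 4 + 3);
```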

So far so good: arr[i] just has to be a wrapper class over the subarray...

But when I tried to implement iterators, I suddenly realized that there were dragons everywhere:

my container is not a standard-compliant container because operator[] returns a proxy or wrapper instead of a true reference (When Is a Container Not a Container?)
this causes the iterator to be either a stashing iterator, which is known to be bad (Reference invalidation after applying reverse_iterator on a custom made iterator and its accepted answer)
... or a proxy iterator which is not necessarily better (To Be or Not to Be (an Iterator))

The real problem is that as soon as you have a proxied container, no iterator can respect the following requirement for a forward iterator:

Forward iterators [forward.iterators]
...
⁶ If a and b are both dereferenceable, then a == b if and only if *a and *b are bound to the same object.

Examples come from the standard library itself:

vector<bool> is known not to respect all the requirements for containers because it returns proxies instead of references:

Class vector<bool> [vector.bool]
...
³ There is no requirement that the data be stored as a contiguous allocation of bool values. A space-optimized representation of bits is recommended instead.
⁴ reference is a class that simulates the behavior of references of a single bit in vector<bool>.

filesystem path iterator is known to be a stashing iterator:

path iterators [fs.path.itr]
...
² A path::iterator is a constant iterator satisfying all the requirements of a bidirectional iterator (27.2.6) except that, for dereferenceable iterators a and b of type path::iterator with a == b, there is no requirement that *a and *b are bound to the same object.

and from cppreference:

Notes: std::reverse_iterator does not work with iterators that return a reference to a member object (so-called "stashing iterators"). An example of stashing iterator is std::filesystem::path::iterator.

Question

So far I have found plenty of references about why proxied containers are not true containers and why it would be nice if proxied containers and iterators were allowed by the standard. But I still have not understood what the best that can be done is, and what the real limitations are.

So my question is why proxy iterators are really better than stashing ones, and which algorithms are allowed for each of them. If possible, I would really love to find a reference implementation for such an iterator.

For reference, a current implementation of my code has been submitted on Code Review. It contains a stashing iterator (which broke immediately when I tried to use std::reverse_iterator).

I know it is a *give me ze code* question, but there is at least some research and I think that such an example implementation would be nice on SO. — Serge Ballesta, Jun 26 '18 at 15:51
Please ignore the forward iterator requirement of returning a reference, and read "bound to the same object" in a very informal way. — Marc Glisse, Jun 26 '18 at 16:00
A "least bad implementation" towards what purpose? As with any concepts, what matters is the expectation of the code consuming it. There's nothing wrong with a proxy iterator as long as everyone who uses it *knows that it is a proxy iterator* and treats it accordingly. Same goes for a "stashing" iterator. The problem with these is when you want to pass them to algorithms (like in the standard library) that have expectations that proxy/stashing iterators cannot satisfy. — Nicol Bolas, Jun 26 '18 at 16:07
@MarcGlisse: "*Please ignore the forward iterator requirement of returning a reference, and read "bound to the same object" in a very informal way.*" You can't ignore that. It's part of the standard, and code written that accepts ForwardIterators is entitled to do `value_type_t &val = *it;`, where `value_type_t` gets the value type from the iterator. — Nicol Bolas, Jun 26 '18 at 16:08
@NicolBolas Too bad for you if your code does that. If you insist, you can use boost traversal categories... — Marc Glisse, Jun 26 '18 at 16:11
@MarcGlisse: "*Too bad for you if your code does that.*" I'm not talking about my code. I mean any code that follows the C++ standard library ForwardIterator concept. Which includes the entire C++ container/algorithm library. You *cannot* pass a proxy iterator to such things and be guaranteed that they will work. — Nicol Bolas, Jun 26 '18 at 16:12
@NicolBolas: Maybe I am *too broad* here. But as I am trying to build a general-use utility class, I would prefer to reverse the logic and say: this can be done with that tool but that cannot. Ideally I would like to be able to apply standard algorithms, but I suppose that many won't accept proxy or stashing iterators. The standard just requires forward, bidirectional or random access iterators. Most of the requirements are satisfied except for [forward.iterators] 6. So strictly according to the standard almost nothing can be done. But I would bet a coin that some could work... — Serge Ballesta, Jun 26 '18 at 16:16
... And I can hardly imagine that I am the first to knock my head on this question, and would like to know how experienced programmers have solved it. — Serge Ballesta, Jun 26 '18 at 16:18
@SergeBallesta: "*But I would bet a coin that some could work...*" What good is that? You don't know which will work. You don't know which will stop working tomorrow. You don't know if it will work on libc++ or libstdc++ or msvc's standard library. Remember: something as innocuous as `auto &val = *it;` breaks a proxy iterator. So what good is "working" with no guarantees? — Nicol Bolas, Jun 26 '18 at 16:18
@NicolBolas: This did not prevent the committee from standardizing `vector<bool>` and `filesystem::path::iterator`... Ok implementors use undocumented pieces and you cannot do everything with them. But I do not want more. — Serge Ballesta, Jun 26 '18 at 16:22
@NicolBolas That's theory. In practice (several implementations, but obviously not all of the std functions), I've never had any issue. And I expect that as QoI most implementations would be willing to tweak their code to avoid breaking this use. Anyway, we have digressed too much, and I can perfectly understand preferring to adhere strictly to the standard. Hopefully the iterator categories will be fixed some day... (I didn't check what ranges do about it) — Marc Glisse, Jun 26 '18 at 17:59

Nicol Bolas · Accepted Answer · 2023-01-19T05:57:50.627

OK, we have two similar but distinct concepts. So let's lay them out.

But first, I need to make a distinction between the named requirements of pre-C++20 and the actual in-language concepts created for the Ranges TS and included in C++20. They're both called "concepts", but they're defined differently. As such, when I talk about concept-with-a-lowercase-c, I mean the pre-C++20 requirements. When I talk about Concept-with-a-capital-C, I mean the C++20 stuff.

Proxy Iterators

Proxy iterators are iterators whose reference is not value_type&, but some other type that behaves like a reference to value_type. In this case, *it returns a prvalue of that reference type.
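As a quick illustration, here is a sketch (assuming a C++17 compiler) that contrasts vector<int>, whose iterators yield a true reference, with vector<bool>, whose iterators yield a proxy object. The alias names and the function proxy_writes_through are made up for this example.

```cpp
#include <iterator>
#include <type_traits>
#include <vector>

using int_ref  = std::iterator_traits<std::vector<int>::iterator>::reference;
using bool_ref = std::iterator_traits<std::vector<bool>::iterator>::reference;

static_assert(std::is_same_v<int_ref, int&>);        // a true reference
static_assert(!std::is_reference_v<bool_ref>);       // a class type: the proxy
static_assert(std::is_convertible_v<bool_ref, bool>);

// The proxy behaves like a reference: assigning through a copy of it
// changes the element stored in the container.
inline bool proxy_writes_through() {
    std::vector<bool> bits{false, false};
    auto ref = *bits.begin();  // a proxy object, not a bool
    ref = true;
    return bits[0];
}
```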

The InputIterator concept imposes no requirement on reference, other than that it is convertible to value_type. However, the ForwardIterator concept makes the explicit statement that "reference is a reference to T".

Therefore, a proxy iterator cannot fit the ForwardIterator concept. But it can still be an InputIterator. So you can safely pass a proxy iterator to any function that only requires InputIterators.

So, the problem with vector<bool>'s iterators is not that they're proxy iterators. It's that they promise to fulfill the RandomAccessIterator concept (through the use of the appropriate tag), when they're really only InputIterators and OutputIterators.

The Ranges proposal (mostly) adopted into C++20 makes changes to the iterator Concepts which allow all iterators to be proxy iterators. So under Ranges, vector<bool>::iterator really fulfills the RandomAccessIterator Concept. Therefore, if you have code written against the Ranges concepts, then you can use proxy iterators of all kinds.

This is very useful for dealing with things like counting ranges. You can have reference and value_type be the same type, so you're just dealing with integers either way.

And of course, if you have control over the code consuming the iterator, you can make it do whatever you want, so long as you don't violate the concept your iterator is written against.

Stashing Iterators

Stashing iterators are iterators where reference is (directly or indirectly) a reference to an object stored in the iterator. Therefore, if you make a copy of an iterator, the copy will return a reference to a different object than the original, even though they refer to the same element. And when you increment the iterator, previous references are no longer valid.
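Both effects can be seen in a small sketch. squares_iterator is a hypothetical stashing iterator that stores the value it yields inside itself, and the two helper functions are named only for this example.

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical stashing iterator: yields i * i, stored in the iterator itself.
class squares_iterator {
public:
    using value_type        = int;
    using reference         = const int&;
    using pointer           = const int*;
    using difference_type   = std::ptrdiff_t;
    using iterator_category = std::input_iterator_tag;

    explicit squares_iterator(int i = 0) : i_(i), stash_(i * i) {}

    reference operator*() const { return stash_; }  // refers to a member
    squares_iterator& operator++() { ++i_; stash_ = i_ * i_; return *this; }
    squares_iterator operator++(int) { auto old = *this; ++*this; return old; }

    friend bool operator==(const squares_iterator& a, const squares_iterator& b) {
        return a.i_ == b.i_;
    }
    friend bool operator!=(const squares_iterator& a, const squares_iterator& b) {
        return !(a == b);
    }

private:
    int i_;
    int stash_;
};

// Equal iterators, yet *a and *b are bound to different objects.
inline bool equal_copies_share_object() {
    squares_iterator a(3);
    squares_iterator b = a;
    return a == b && &*a == &*b;
}

// A reference taken before ++it no longer reads the old element afterwards.
inline bool reference_survives_increment() {
    squares_iterator it(3);
    const int& r = *it;  // reads 9 at this point
    ++it;                // overwrites the stash with 16
    return r == 9;
}
```

Both helpers return false, which is exactly the behavior a ForwardIterator is not allowed to have.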

Stashing iterators are usually implemented because computing the value you want to return is expensive. Maybe it would involve a memory allocation (such as path::iterator) or maybe it would involve a possibly-complex operation that should only be done once (such as regex_iterator). So you only want to do it when necessary.

One of the foundations of ForwardIterator as a concept (or Concept) is that a range of these iterators represents a range over values which exist independently of their iterators. This permits multipass operation, but it also makes other useful things possible: you can store references to items in the range, and then iterate elsewhere.

If you need an iterator to be a ForwardIterator or higher, you should never make it a stashing iterator. Of course, the C++ standard library is not always consistent with itself. But it usually calls out its inconsistencies.

path::iterator is a stashing iterator. The standard says that it is a BidirectionalIterator; however, it also gives this type an exception to the reference/pointer preservation rule. This means that you cannot pass path::iterator to any code that might rely on that preservation rule.

Now, this doesn't mean you can't pass it to anything. Any algorithm which requires only InputIterator will be able to take such an iterator, since such code cannot rely on that rule. And of course, any code which you write or which specifically states in its documentation that it doesn't rely on that rule can be used. But there's no guarantee that you can use reverse_iterator on it, even though it says that it is a BidirectionalIterator.
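As a sketch of the safe kind of usage (assuming C++17 and a hypothetical helper named components), the loop below only reads each element before moving on and copies what it needs, so it never depends on the preservation rule.

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Copies each component out while iterating, so nothing depends on a
// reference staying valid after the iterator has moved on.
inline std::vector<std::string> components(const std::filesystem::path& p) {
    std::vector<std::string> parts;
    for (const auto& part : p) {
        parts.push_back(part.generic_string());
    }
    return parts;
}
```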

regex_iterators are even worse in this regard. Their tag says that they are ForwardIterators, but the standard never says that they actually are ForwardIterators (unlike path::iterator). And because they are specified with reference being an actual reference to a member object, they cannot be true ForwardIterators.

Note that I made no distinction between the pre-C++20 concept and the Ranges Concept. That's because the std::forward_iterator Concept still forbids stashing iterators. This is by design.

Usage

Now obviously, you can do whatever you want in your code. But code you don't control will be under the domain of its owners. They will be writing against the old concepts, the new Concepts, or some other c/Concept or requirement that they specify. So your iterators need to be able to be compatible with their needs.

The algorithms that Ranges introduces use the new Concepts, so you can always rely on them to work with proxy iterators. However, as I understand it, the Ranges Concepts are not back-ported into the older algorithms.

Personally, I would suggest avoiding stashing iterator implementations entirely. By providing complete support for proxy iterators, most stashing iterators can be rewritten to return values rather than references to objects.

For example, if there were a path_view type, path::iterator could have returned that instead of a full-fledged path. That way, if you want to do the expensive copy operation, you can. Similarly, the regex_iterators could have returned copies of the match object. The new Concepts make it possible to work that way by supporting proxy iterators.

Now, stashing iterators handle caching in a useful way; iterators can cache their results so that repeated *it usage only does the expensive operation once. But remember the problem with stashing iterators: returning a reference to their contents. You don't need to do that just to get caching. You can cache the results in an optional<T> (which you invalidate when the iterator is incremented or decremented). So you can still return a value. It may involve an additional copy, but reference shouldn't be a complex type.
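A minimal sketch of that caching scheme, assuming C++17: label_iterator is a hypothetical iterator whose "expensive" operation is just building a string, and the computations() counter exists only to make the caching observable.

```cpp
#include <cstddef>
#include <iterator>
#include <optional>
#include <string>

class label_iterator {
public:
    using value_type        = std::string;
    using reference         = std::string;  // returned by value, not by reference
    using difference_type   = std::ptrdiff_t;
    using iterator_category = std::input_iterator_tag;

    explicit label_iterator(int i = 0) : i_(i) {}

    reference operator*() const {
        if (!cache_) {  // do the "expensive" work at most once per position
            cache_ = "item-" + std::to_string(i_);
            ++computations_;
        }
        return *cache_;  // the caller receives a copy of the cached value
    }
    label_iterator& operator++() { ++i_; cache_.reset(); return *this; }
    label_iterator operator++(int) { auto old = *this; ++*this; return old; }

    friend bool operator==(const label_iterator& a, const label_iterator& b) {
        return a.i_ == b.i_;
    }
    friend bool operator!=(const label_iterator& a, const label_iterator& b) {
        return !(a == b);
    }

    int computations() const { return computations_; }

private:
    int i_;
    mutable std::optional<std::string> cache_;
    mutable int computations_ = 0;
};
```

Because operator* hands out a copy, a value obtained before ++it stays intact afterwards, at the cost of one extra copy per dereference.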

Of course, all of this means that auto &val = *it; isn't legal code anymore. However, auto &&val = *it; will always work. This is actually a big part of the Ranges TS version of iterators.
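A short sketch of the difference, using a hypothetical generic helper named assign_all: with auto&& the same loop compiles for both a true-reference container and a proxied one, whereas auto& would be rejected for vector<bool>.

```cpp
#include <vector>

// Works whether *it is a true reference (vector<int>) or a prvalue proxy
// (vector<bool>). With `auto& element` the vector<bool> case would not
// compile, because a non-const lvalue reference cannot bind to a prvalue.
template <class Range, class T>
void assign_all(Range& range, const T& value) {
    for (auto&& element : range) {
        element = value;
    }
}
```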

Very nice. Just a nitpick: Even if an algorithm only needs a, let's say ForwardIterator, if it gets handed a RandomAccessIterator it might (and really should) take shortcuts based on the additional capabilities. So being a true ForwardIterator but a fake RandomAccessIterator might cause an algorithm only requiring a ForwardIterator to stumble anyway. Moral: Lying is dangerous, even if you aren't always caught. — Deduplicator, Jun 26 '18 at 19:26
Thank you for this nice and detailed answer. I had not immediately accepted it because I hoped for an implementation example but it would be unfair to ask more. — Serge Ballesta, Jun 29 '18 at 07:18