18

This is a C++ / D cross-over question. The D programming language has ranges that -in contrast to C++ libraries such as Boost.Range- are not based on iterator pairs. The official C++ Ranges Study Group seems to have been bogged down in nailing a technical specification.

Question: does the current C++11 or the upcoming C++14 Standard have any obstacles that prevent adopting D ranges -as well as a suitably rangefied version of <algorithm>- wholesale?

I don't know D or its ranges well enough, but they seem lazy and composable as well as capable of providing a superset of the STL's algorithms. Given their claim to success for D, it would seem very nice to have as a library for C++. I wonder how essential D's unique features (e.g. string mixins, uniform function call syntax) were for implementing its ranges, and whether C++ could mimic that without too much effort (e.g. C++14 constexpr seems quite similar to D compile-time function evaluation)

Note: I am seeking technical answers, not opinions whether D ranges are the right design to have as a C++ library.

TemplateRex
  • 69,038
  • 19
  • 164
  • 304
  • What feature is lacking? Remember that an end iterator is just an object that when compared to another iterator, tells you if you are done. Lazy C++ iterator based ranges are not hard. For fancy situations, like lazy ranges, some extra boilerplate is needed: for simple situations, like a range over packed contiguous memory, I cannot imagine D style ranges can be as fast. – Yakk - Adam Nevraumont Aug 31 '13 at 16:35
  • 2
    Have you looked at boost transform iterators and the like? In particular, give a concrete problem that is cheap in D but expensive/impossible in C++. – Yakk - Adam Nevraumont Aug 31 '13 at 16:38
  • You believe that because why? I believe you are wrong, or using words in ways that I am unfamiliar with. In any event, concrete problem, please. – Yakk - Adam Nevraumont Aug 31 '13 at 16:44
  • 1
    @Yakk Cartesian product of `N` ranges, lazily, meaning that it returns a range of `N`-tuples and increment does the computation of the next element – TemplateRex Aug 31 '13 at 16:47
  • @TemplateRex that is pretty trivial to do with C++ iterators. Is that really something you think C++ iterators cannot do? There are a few free choices: does it end when any end, all end? Are the returned elements elements, or optionally missing? Any reasonable choice is easy there. The core is a simple n tuple of iterators stored in an iterator fascade that returns an n tuple of references. It will be as lazy as its components. Next concrete problem? – Yakk - Adam Nevraumont Sep 01 '13 at 05:14
  • 2
    http://ideone.com/ZRDp4r needs another 30 lines of [`boost::iterator_fascade`](http://www.boost.org/doc/libs/1_38_0/libs/iterator/doc/iterator_facade.html#a-basic-iterator-using-iterator-facade) and testing and polish to be production quality. But I doubt answering a concrete problem will satisfy! – Yakk - Adam Nevraumont Sep 01 '13 at 14:40
  • @Yakk nice code! I think that my gripe is that this style of code (as well as Boost.Iterator) tightly couples an algorithm (Cartesian product, transform, filter, zip etc.) with iteration. I'd like to see a generic lazy iterator that stores the algorithmic part inside a function object to be called by `operator++`. – TemplateRex Sep 01 '13 at 16:16
  • 1
    @TemplateRex: `function_input_iterator` stores a function (object) that is invoked on increment. Now you just need to go to town with Boost.Coroutine for the algorithm part, but that means you tie the algorithm to be a range itself. – Xeo Sep 01 '13 at 17:05
  • 2
    @Yakk After some more studying of your code, I have to admit that Boost range adaptors / custom iterators are quite expressive. The D style ranges without iterators still have an intuitive appeal, but you may be right that they are not necessary to get lazy / composable algorithms. In any case, that was outside the scope of my Q, but thanks for showing the cartesian product code! – TemplateRex Sep 01 '13 at 21:14

2 Answers2

9

I don't think there is any inherent technical limitation in C++ which would make it impossible to define a system of D-style ranges and corresponding algorithms in C++. The biggest language level problem would be that C++ range-based for-loops require that begin() and end() can be used on the ranges but assuming we would go to the length of defining a library using D-style ranges, extending range-based for-loops to deal with them seems a marginal change.

The main technical problem I have encountered when experimenting with algorithms on D-style ranges in C++ was that I couldn't make the algorithms as fast as my iterator (actually, cursor) based implementations. Of course, this could just be my algorithm implementations but I haven't seen anybody providing a reasonable set of D-style range based algorithms in C++ which I could profile against. Performance is important and the C++ standard library shall provide, at least, weakly efficient implementations of algorithms (a generic implementation of an algorithm is called weakly efficient if it is at least as fast when applied to a data structure as a custom implementation of the same algorithm using the same data structure using the same programming language). I wasn't able to create weakly efficient algorithms based on D-style ranges and my objective are actually strongly efficient algorithms (similar to weakly efficient but allowing any programming language and only assuming the same underlying hardware).

When experimenting with D-style range based algorithms I found the algorithms a lot harder to implement than iterator-based algorithms and found it necessary to deal with kludges to work around some of their limitations. Of course, not everything in the current way algorithms are specified in C++ is perfect either. A rough outline of how I want to change the algorithms and the abstractions they work with is on may STL 2.0 page. This page doesn't really deal much with ranges, however, as this is a related but somewhat different topic. I would rather envision iterator (well, really cursor) based ranges than D-style ranges but the question wasn't about that.

One technical problem all range abstractions in C++ do face is having to deal with temporary objects in a reasonable way. For example, consider this expression:

auto result = ranges::unique(ranges::sort(std::vector<int>{ read_integers() }));

In dependent of whether ranges::sort() or ranges::unique() are lazy or not, the representation of the temporary range needs to be dealt with. Merely providing a view of the source range isn't an option for either of these algorithms because the temporary object will go away at the end of the expression. One possibility could be to move the range if it comes in as r-value, requiring different result for both ranges::sort() and ranges::unique() to distinguish the cases of the actual argument being either a temporary object or an object kept alive independently. D doesn't have this particular problem because it is garbage collected and the source range would, thus, be kept alive in either case.

The above example also shows one of the problems with possibly lazy evaluated algorithm: since any type, including types which can't be spelled out otherwise, can be deduced by auto variables or templated functions, there is nothing forcing the lazy evaluation at the end of an expression. Thus, the results from the expression templates can be obtained and the algorithm isn't really executed. That is, if an l-value is passed to an algorithm, it needs to be made sure that the expression is actually evaluated to obtain the actual effect. For example, any sort() algorithm mutating the entire sequence clearly does the mutation in-place (if you want a version doesn't do it in-place just copy the container and apply the in-place version; if you only have a non-in-place version you can't avoid the extra sequence which may be an immediate problem, e.g., for gigantic sequences). Assuming it is lazy in some way the l-value access to the original sequence provides a peak into the current status which is almost certainly a bad thing. This may imply that lazy evaluation of mutating algorithms isn't such a great idea anyway.

In any case, there are some aspects of C++ which make it impossible to immediately adopt the D-sytle ranges although the same considerations also apply to other range abstractions. I'd think these considerations are, thus, somewhat out of scope for the question, too. Also, the obvious "solution" to the first of the problems (add garbage collection) is unlikely to happen. I don't know if there is a solution to the second problem in D. There may emerge a solution to the second problem (tentatively dubbed operator auto) but I'm not aware of a concrete proposal or how such a feature would actually look like.

BTW, the Ranges Study Group isn't really bogged down by any technical details. So far, we merely tried to find out what problems we are actually trying to solve and to scope out, to some extend, the solution space. Also, groups generally don't get any work done, at all! The actual work is always done by individuals, often by very few individuals. Since a major part of the work is actually designing a set of abstractions I would expect that the foundations of any results of the Ranges Study Group is done by 1 to 3 individuals who have some vision of what is needed and how it should look like.

Dietmar Kühl
  • 150,225
  • 13
  • 225
  • 380
  • +1 for the very detailed answer about your own experiences. Two points that spring to mind: 1) I thought that D ranges kept a reference to the input range as well as storing the composed function object of all the intermediate algorithms, so that the temporary problem doesn't exist. Or am I missing something? 2) AFAIK, in D calling result.front() or pop-front() would evaluate the stored function object on the current input element / find the next one. – TemplateRex Aug 31 '13 at 23:03
  • @TemplateRex: About your first point: I would assume that D ranges keep a reference to the original and that there is no problem with temporary input ranges in _D_ as they are kept around long enough. However, in C++ the same cannot be done because even if a reference is kept to input range, the object will be destroyed at the end of the expression. Unlike D, C++ doesn't have garbage collection. However, the question what the technical problems are in _C++_. I don't claim that the same problem do exist in D. – Dietmar Kühl Aug 31 '13 at 23:13
  • When I write generator iterators that do not know in advance if they end, my end iterator is just an empty iterator with a flag saying 'I am the omega': you do not need to find the end to have an end iterator. So I am not sure why `end()` based `for` loops is a problem? – Yakk - Adam Nevraumont Sep 01 '13 at 05:27
  • 1
    @Yakk The fact that you actually need iterators in the first place is the problem, not `end()` specifically. – Xeo Sep 01 '13 at 06:03
  • Not sure if temporaries in D live on the heap (I would be surprised if they were GC'ed), but in C++ you would do some rvalue specific stuff, agreed. – TemplateRex Sep 01 '13 at 06:24
  • The problem of temporaries exists for C++ ranges as well. In [Linq](http://pfultz2.github.io/Linq/), the equivalent(`auto result = std::vector{ read_integers() } | linq::orderby(linq::detail::identity_selector()) | linq::distinct;`) is a compiler error, since there really is not a good solution to this, other than require the user to pass an lvalue. Linq uses an [`is_bindable_range`](https://github.com/pfultz2/Linq/blob/master/linq/traits.h#L118) trait to detect if its an rvalue, but excludes ranges that only store iterators(such as `boost::iterator_range`) – Paul Fultz II Sep 01 '13 at 15:58
  • 1
    @LucDanton: The problem isn't so much returning passed through objects but rather efficiently dealing with clinging to suitable objects when necessary in the internal range objects used for lazy evaluation. Also note that I'm _not_ saying that dealing with temporaries isn't doable, just that it is a consideration which needs to be taken into account when dealing with ranges (at least in C++; I don't know if it is gracefully dealt with in D) and I don't think the solution will be entirely trivial. – Dietmar Kühl Sep 01 '13 at 17:42
  • @DietmarKühl regarding your "STL 2.0" document I think you should edit it for clarity, particularly the introduction. Each introductory paragraph should be followed by an example that shows precisely what problem exists. Right now it's not obvious what you're talking about. Also, the sentences that start with "The approach to..." should instead read "I suggest that to..." to make it clear that you are proposing a solution, rather than some other meaning (e.g. telling us how other people work around the problem). – Qwertie Oct 22 '13 at 17:43
  • @Qwertie: I know that the page needs a lot of work. I'm working on writing up proper proposals for the Ranges Study Group but they are not, yet, quite ready. Each one is meant on a specific change (the one which is most progressed is for using inhomogeneous types for begin and end of single-pass or multi-pass sequences). – Dietmar Kühl Oct 22 '13 at 21:27
  • Hi guys, my own limited experience suggests that ranges should be basically the same as iterators (a range is a single iterator that is smart enough to know its end). Therefore, the range/iterator is passed by value, while maintaining a reference to the original data. Temporaries can then be owned by the range. You might ask what happens if you copy such an "owning"-range. My rule is that owning ranges cannot be copied, as it would raise the question of whether to copy the underlying temporary. They can't be copied, but they can be moved, and therefore can be passed to functions such as `zip`. – Aaron McDaid Aug 16 '15 at 09:38
  • ... (continued), but I must admit that trying to do sort, in place, on such a temporary range wouldn't work in general. So maybe I have nothing to add after all :( – Aaron McDaid Aug 16 '15 at 10:37
8

My C++11 knowledge is much more limited than I'd like it to be, so there may be newer features which improve things that I'm not aware of yet, but there are three areas that I can think of at the moment which are at least problematic: template constraints, static if, and type introspection.

In D, a range-based function will usually have a template constraint on it indicating which type of ranges it accepts (e.g. forward range vs random-access range). For instance, here's a simplified signature for std.algorithm.sort:

auto sort(alias less = "a < b", Range)(Range r)
    if(isRandomAccessRange!Range &&
       hasSlicing!Range &&
       hasLength!Range)
{...}

It checks that the type being passed in is a random-access range, that it can be sliced, and that it has a length property. Any type which does not satisfy those requirements will not compile with sort, and when the template constraint fails, it makes it clear to the programmer why their type won't work with sort (rather than just giving a nasty compiler error from in the middle of the templated function when it fails to compile with the given type).

Now, while that may just seem like a usability improvement over just giving a compilation error when sort fails to compile because the type doesn't have the right operations, it actually has a large impact on function overloading as well as type introspection. For instance, here are two of std.algorithm.find's overloads:

R find(alias pred = "a == b", R, E)(R haystack, E needle)
    if(isInputRange!R &&
       is(typeof(binaryFun!pred(haystack.front, needle)) : bool))
{...}


R1 find(alias pred = "a == b", R1, R2)(R1 haystack, R2 needle)
    if(isForwardRange!R1 && isForwardRange!R2 &&
       is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool) &&
       !isRandomAccessRange!R1)
{...}

The first one accepts a needle which is only a single element, whereas the second accepts a needle which is a forward range. The two are able to have different parameter types based purely on the template constraints and can have drastically different code internally. Without something like template constraints, you can't have templated functions which are overloaded on attributes of their arguments (as opposed to being overloaded on the specific types themselves), which makes it much harder (if not impossible) to have different implementations based on the genre of range being used (e.g. input range vs forward range) or other attributes of the types being used. Some work has been being done in this area in C++ with concepts and similar ideas, but AFAIK, C++ is still seriously lacking in the features necessary to overload templates (be they templated functions or templated types) based on the attributes of their argument types rather than specializing on specific argument types (as occurs with template specialization).

A related feature would be static if. It's the same as if, except that its condition is evaluated at compile time, and whether it's true or false will actually determine which branch is compiled in as opposed to which branch is run. It allows you to branch code based on conditions known at compile time. e.g.

static if(isDynamicArray!T)
{}
else
{}

or

static if(isRandomAccessRange!Range)
{}
else static if(isBidirectionalRange!Range)
{}
else static if(isForwardRange!Range)
{}
else static if(isInputRange!Range)
{}
else
    static assert(0, Range.stringof ~ " is not a valid range!");

static if can to some extent obviate the need for template constraints, as you can essentially put the overloads for a templated function within a single function. e.g.

R find(alias pred = "a == b", R, E)(R haystack, E needle)
{
    static if(isInputRange!R &&
       is(typeof(binaryFun!pred(haystack.front, needle)) : bool))
    {...}
    else static if(isForwardRange!R1 && isForwardRange!R2 &&
       is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool) &&
       !isRandomAccessRange!R1)
    {...}
}

but that still results in nastier errors when compilation fails and actually makes it so that you can't overload the template (at least with D's implementation), because overloading is determined before the template is instantiated. So, you can use static if to specialize pieces of a template implementation, but it doesn't quite get you enough of what template constraints get you to not need template constraints (or something similar).

Rather, static if is excellent for doing stuff like specializing only a piece of your function's implementation or for making it so that a range type can properly inherit the attributes of the range type that it's wrapping. For instance, if you call std.algorithm.map on an array of integers, the resultant range can have slicing (because the source range does), whereas if you called map on a range which didn't have slicing (e.g. the ranges returned by std.algorithm.filter can't have slicing), then the resultant ranges won't have slicing. In order to do that, map uses static if to compile in opSlice only when the source range supports it. Currently, map 's code that does this looks like

static if (hasSlicing!R)
{
    static if (is(typeof(_input[ulong.max .. ulong.max])))
        private alias opSlice_t = ulong;
    else
        private alias opSlice_t = uint;

    static if (hasLength!R)
    {
        auto opSlice(opSlice_t low, opSlice_t high)
        {
            return typeof(this)(_input[low .. high]);
        }
    }
    else static if (is(typeof(_input[opSlice_t.max .. $])))
    {
        struct DollarToken{}
        enum opDollar = DollarToken.init;
        auto opSlice(opSlice_t low, DollarToken)
        {
            return typeof(this)(_input[low .. $]);
        }

        auto opSlice(opSlice_t low, opSlice_t high)
        {
            return this[low .. $].take(high - low);
        }
    }
}

This is code in the type definition of map's return type, and whether that code is compiled in or not depends entirely on the results of the static ifs, none of which could be replaced with template specializations based on specific types without having to write a new specialized template for map for every new type that you use with it (which obviously isn't tenable). In order to compile in code based on attributes of types rather than with specific types, you really need something like static if (which C++ does not currently have).

The third major item which C++ is lacking (and which I've more or less touched on throughout) is type introspection. The fact that you can do something like is(typeof(binaryFun!pred(haystack.front, needle)) : bool) or isForwardRange!Range is crucial. Without the ability to check whether a particular type has a particular set of attributes or that a particular piece of code compiles, you can't even write the conditions which template constraints and static if use. For instance, std.range.isInputRange looks something like this

template isInputRange(R)
{
    enum bool isInputRange = is(typeof(
    {
        R r = void;       // can define a range object
        if (r.empty) {}   // can test for empty
        r.popFront();     // can invoke popFront()
        auto h = r.front; // can get the front of the range
    }));
}

It checks that a particular piece of code compiles for the given type. If it does, then that type can be used as an input range. If it doesn't, then it can't. AFAIK, it's impossible to do anything even vaguely like this in C++. But to sanely implement ranges, you really need to be able to do stuff like have isInputRange or test whether a particular type compiles with sort - is(typeof(sort(myRange))). Without that, you can't specialize implementations based on what types of operations a particular range supports, you can't properly forward the attributes of a range when wrapping it (and range functions wrap their arguments in new ranges all the time), and you can't even properly protect your function against being compiled with types which won't work with it. And, of course, the results of static if and template constraints also affect the type introspection (as they affect what will and won't compile), so the three features are very much interconnected.

Really, the main reasons that ranges don't work very well in C++ are the some reasons that metaprogramming in C++ is primitive in comparison to metaprogramming in D. AFAIK, there's no reason that these features (or similar ones) couldn't be added to C++ and fix the problem, but until C++ has metaprogramming capabilities similar to those of D, ranges in C++ are going to be seriously impaired.

Other features such as mixins and Uniform Function Call Syntax would also help, but they're nowhere near as fundamental. Mixins would help primarily with reducing code duplication, and UFCS helps primarily with making it so that generic code can just call all functions as if they were member functions so that if a type happens to define a particular function (e.g. find) then that would be used instead of the more general, free function version (and the code still works if no such member function is declared, because then the free function is used). UFCS is not fundamentally required, and you could even go the opposite direction and favor free functions for everything (like C++11 did with begin and end), though to do that well, it essentially requires that the free functions be able to test for the existence of the member function and then call the member function internally rather than using their own implementations. So, again you need type introspection along with static if and/or template constraints.

As much as I love ranges, at this point, I've pretty much given up on attempting to do anything with them in C++, because the features to make them sane just aren't there. But if other folks can figure out how to do it, all the more power to them. Regardless of ranges though, I'd love to see C++ gain features such as template constraints, static if, and type introspection, because without them, metaprogramming is way less pleasant, to the point that while I do it all the time in D, I almost never do it in C++.

Community
  • 1
  • 1
Jonathan M Davis
  • 37,181
  • 17
  • 72
  • 102
  • 4
    Actually, none of the above are any obstacle for C++ using ranges! Although the technology used to implement similar features in C++ may be more involved, they do work and are used. – Dietmar Kühl Aug 31 '13 at 21:19
  • +1 for the very detailed answer that shows which D features are instrumental to its ranges/algorithms. However, given your explanation, I'm not sure whether C++ could not match D's range interface. For sure, sfinae is more verbose than `static if` but does it really make some valid D functionality impossible to mimic? And upcoming Concepts Lite should closely match your `IsInputRange` grouping of valid expressions. – TemplateRex Aug 31 '13 at 21:33
  • @DietmarKühl Could you expand your comment into an answer from the c++ perspective? I realize these inter-language questions are hard. – TemplateRex Aug 31 '13 at 21:37
  • I have to agree with Dietmar Kühl. I have implemented D-style ranges myself and the metaprogramming wasn't a problem, *at all*. (Although I'm in the camp that would rather keep using the kludges we have right now while waiting for concepts rather than bring in `static if` and the like into the language.) I do have some issues regarding code duplication, so that does strike home. Unfortunately I don't know enough about D mixins to tell whether they would help or not. – Luc Danton Aug 31 '13 at 21:54
  • 3
    Well, if someone has figured out how to work around the lack of template constraints, `static if`, and type introspection in C++ and implement usable ranges in C++, all the more power to them. I clearly don't have good enough C++ metaprogramming-foo to pull it off. But I'd be very surprised if it were done in a way that your average C++ programmer could write a range-based function, and unless ranges and range-based functions can be written by your average programmer, I seriously question that they have much viability. D metaprogramming is actually accessible to the average programmer. – Jonathan M Davis Aug 31 '13 at 23:51
  • 1
    For the record Boost.TypeTraits and Boost.CallTraits were released in 2000, `boost::enable_if` in 2004. I doubt they were the first efforts of the sort either. – Luc Danton Sep 01 '13 at 06:08