(Re)Using std::algorithms with non-standard containers

Question

I have a "column" container type:

struct MyColumnType { 
  // Data: Each row represents a member of an object.
  vector<double> a;   // All vectors are guaranteed to have always
  vector<string> b;   // the same length.
  vector<int> c;

  void copy(int from_pos, int to_pos); // The column type provides an interface
  void swap(int pos_a, int pos_b);     // for copying, swapping, ...

  void push_back();      // And for resizing the container.
  void pop_back();
  void insert(int pos);
  void remove(int pos);
  // The interface can be extended/modified if required
};

Usage:

// If table is a constructed container with elements stored 
// To acces the members of the object stored at the 4th position:
table.a[4] = 4.0;
table.b[4] = "4th";
table.c[4] = 4;

Question: How can I create a standard-compliant random access iterator (and probably a required proxy reference type) for this kind of container?

I want to be able to use std::algorithms for random access iterators with my type, e.g. sort (note: for sorting the comparison would be provided by an user-defined functor, e.g. a lambda).

In particular the iterator should provide an interface similar to

struct {
  double& a;
  string& b;
  int& c;
};

Note 0: C++11/C++14 is allowed.

Note 1: There is an old paper http://hci.iwr.uni-heidelberg.de/vigra/documents/DataAccessors.ps where a similar attempt is undertaken. However, I haven't been able to get their approach working with sort. Requirements like defaultConstructible are hard to satisfy using a proxy type approach (why does std::sort require types to be default constructible instead of swappable is beyond my understanding).

Note 2: I cannot do the following:

struct MyType {
  double a;
  string b;
  int c;
};

std::vector<MyType> v;

and then use std::algorithm.

Motivation: Performance. A cache-line is usually 64bytes, i.e. 8 doubles. In this simple struct if you iterate over the doubles, you are polluting a cache-line with a string an an int. In other cases, you might get only 1 double transfered per cache-line. That is, you end up using 1/8-th of the memory bandwith available. If you need to iterate over a couple of Gb of doubles, this simple decision improves your application performance by a factor of 6-7x. And no, I cannot give that up.

Bonus: the answer should be as generic as possible. Think about adding/removing fields to the container type as adding/removing members to a struct. You don't want to change a lot of code every time you add a new member.

Very similar question: http://stackoverflow.com/questions/839958/custom-iterator-in-c — jogojapan, Jun 01 '13 at 15:23
@jogojapan In that question he wants to go over his members sequentially (more like join/chaining of containers). Here it is more like a zip operation. I want to traverse the members sequentially, but at the same time. — gnzlbg, Jun 01 '13 at 15:30
Yes, but various tools that make the creation of new iterator types easier are listed there. They will be useful. — jogojapan, Jun 01 '13 at 15:31
@jogojapan I dont really think boost iterator facade was designed to solve this problem. The amount of hackery required probably will force you to define all of the iterator member functions and traits manually anyway. — gnzlbg, Jun 01 '13 at 15:43
How is your comparison function (e.g. `<`) for `MyColumnType` / an element of this container defined? You need a strict total ordering for `std::sort`. — dyp, Jun 03 '13 at 19:00
The motivation indicates you want to iterate e.g. over the `double`s only, and the other members are irrelevant? Or do you always need all elements (`double`, `string`, `int`) when iterating? — dyp, Jun 03 '13 at 19:09
Have you specialized `std::swap` when `std::sort` asked for default constructor? — maxim1000, Jun 03 '13 at 19:51
In C++11, `std::sort` does not require DefaultConstructible. — dyp, Jun 03 '13 at 20:00
@DyP the comparison function should be provided by the user using functor (e.g. a lambda). You can assume that a valid one is provided. — gnzlbg, Jun 03 '13 at 22:49
@maxim1000 yes I did that. The thing is, `std::sort` is not required to use `std::swap`. That is implementation defined. For example, for large arrays it might use quick sort (which might use `std::swap`), but for smaller ones it uses merge sort, and for very small ones insertion sort. Actually, for very large arrays libc++ uses quicksort recursively until the array size is under some threshold, and then merge sort/insertion sort are used. — gnzlbg, Jun 03 '13 at 22:51
@DyP Yes, in C++11 it requires at least MoveConstructible, although DefaultConstructible is also ok. — gnzlbg, Jun 03 '13 at 22:52
I've updated the question to specify an user-defined functor for the comparisons in the example of the sort algorithm. I've also added that C++11/14 are both allowed. And the bonus that one should think about the cost of adding/removing fields to the column type in the design of the iterator. — gnzlbg, Jun 03 '13 at 23:09

dyp · Accepted Answer · 2013-06-03T21:27:16.453

I think something like this could be Standard-compliant. It uses some C++11 features to simplify the syntax, but could as well be changed to comply C++03 AFAIK.

Tested and works with clang++3.2

Prelude:

#include <vector>
#include <string>
#include <utility>  // for std::swap
#include <iterator>

using std::vector;
using std::string;


// didn't want to insert all those types as nested classes of MyColumnType
namespace MyColumnType_iterator
{
    struct all_copy;
    struct all_reference;
    struct all_iterator;
}


// just provided `begin` and `end` member functions
struct MyColumnType {
    // Data: Each row represents a member of an object.
    vector<double> a;   // All vectors are guaranteed to have always
    vector<string> b;   // the same length.
    vector<int> c;

    void copy(int from_pos, int to_pos); // The column type provides an itface
    void swap(int pos_a, int pos_b);     // for copying, swapping, ...

    void push_back();      // And for resizing the container.
    void pop_back();
    void insert(int pos);
    void remove(int pos);
    // The interface can be extended/modified if required


    using iterator = MyColumnType_iterator::all_iterator;
    iterator begin();
    iterator end();
};

The iterator classes: a value_type (all_copy), a reference type (all_reference) and the iterator type (all_iterator). Iterating is done by keeping and updating three iterators (one to each vector). I don't know if that's the most performant option, though.

How it works: std::iterator_traits defines several associated types for an iterator: [iterator.traits]/1

iterator_traits<Iterator>::difference_type
iterator_traits<Iterator>::value_type
iterator_traits<Iterator>::iterator_category
be defined as the iterator’s difference type, value type and iterator category, respectively. In addition, the types
iterator_traits<Iterator>::reference
iterator_traits<Iterator>::pointer
shall be defined as the iterator’s reference and pointer types, that is, for an iterator object a, the same type as the type of *a and a->, respectively

Therefore, you can introduce a struct (all_reference) keeping three references as reference type. This type is the return value of *a, where a is of the iterator type (possibly const-qualified). There needs to be a different value_type because some Standard Library algorithms such as sort might want to create a local variable temporarily storing the value of *a (by copy or move into the local variable). In this case, all_copy provides this functionality.

You're not required to use it (all_copy) in you own loops, where it could affect performance.

namespace MyColumnType_iterator
{
    struct all_copy;

    struct all_reference
    {
        double& a;
        string& b;
        int& c;

        all_reference() = delete;
        // not required for std::sort, but stream output is simpler to write
        // with this
        all_reference(all_reference const&) = default;
        all_reference(double& pa, string& pb, int& pc)
            : a{pa}
            , b{pb}
            , c{pc}
        {}

        // MoveConstructible required for std::sort
        all_reference(all_reference&& other) = default;
        // MoveAssignable required for std::sort
        all_reference& operator= (all_reference&& other)
        {
            a = std::move(other.a);
            b = std::move(other.b);
            c = std::move(other.c);

            return *this;
        }

        // swappable required for std::sort
        friend void swap(all_reference p0, all_reference p1)
        {
            std::swap(p0.a, p1.a);
            std::swap(p0.b, p1.b);
            std::swap(p0.c, p1.c);
        }

        all_reference& operator= (all_copy const& p) = default;
        all_reference& operator= (all_copy&& p) = default;

        // strict total ordering required for std::sort
        friend bool operator< (all_reference const& lhs,
                               all_reference const& rhs);
        friend bool operator< (all_reference const& lhs, all_copy const& rhs);
        friend bool operator< (all_copy const& lhs, all_reference const& rhs);
    };

    struct all_copy
    {
        double a;
        string b;
        int c;

        all_copy(all_reference const& p)
            : a{p.a}
            , b{p.b}
            , c{p.c}
        {}
        all_copy(all_reference&& p)
            : a{ std::move(p.a) }
            , b{ std::move(p.b) }
            , c{ std::move(p.c) }
        {}
    };

There needs to be a comparison function for std::sort. For some reason we have to provide all three.

    bool operator< (all_reference const& lhs, all_reference const& rhs)
    {
        return lhs.c < rhs.c;
    }
    bool operator< (all_reference const& lhs, all_copy const& rhs)
    {
        return lhs.c < rhs.c;
    }
    bool operator< (all_copy const& lhs, all_reference const& rhs)
    {
        return lhs.c < rhs.c;
    }

Now, the iterator class:

    struct all_iterator
        : public std::iterator < std::random_access_iterator_tag, all_copy >
    {
        //+ specific to implementation
        private:
            using ItA = std::vector<double>::iterator;
            using ItB = std::vector<std::string>::iterator;
            using ItC = std::vector<int>::iterator;
            ItA iA;
            ItB iB;
            ItC iC;

        public:
            all_iterator(ItA a, ItB b, ItC c)
                : iA(a)
                , iB(b)
                , iC(c)
            {}
        //- specific to implementation


        //+ for iterator_traits
            using reference = all_reference;
            using pointer = all_reference;
        //- for iterator_traits


        //+ iterator requirement [iterator.iterators]/1
            all_iterator(all_iterator const&) = default;            // CopyConstructible
            all_iterator& operator=(all_iterator const&) = default; // CopyAssignable
            ~all_iterator() = default;                              // Destructible

            void swap(all_iterator& other)                          // lvalues are swappable
            {
                std::swap(iA, other.iA);
                std::swap(iB, other.iB);
                std::swap(iC, other.iC);
            }
        //- iterator requirements [iterator.iterators]/1
        //+ iterator requirement [iterator.iterators]/2
            all_reference operator*()
            {
                return {*iA, *iB, *iC};
            }
            all_iterator& operator++()
            {
                ++iA;
                ++iB;
                ++iC;
                return *this;
            }
        //- iterator requirement [iterator.iterators]/2

        //+ input iterator requirements [input.iterators]/1
            bool operator==(all_iterator const& other) const        // EqualityComparable
            {
                return iA == other.iA;  // should be sufficient (?)
            }
        //- input iterator requirements [input.iterators]/1
        //+ input iterator requirements [input.iterators]/2
            bool operator!=(all_iterator const& other) const        // "UnEqualityComparable"
            {
                return iA != other.iA;  // should be sufficient (?)
            }

            all_reference const operator*() const                   // *a
            {
                return {*iA, *iB, *iC};
            }

            all_reference operator->()                              // a->m
            {
                return {*iA, *iB, *iC};
            }
            all_reference const operator->() const                  // a->m
            {
                return {*iA, *iB, *iC};
            }

            // ++r already satisfied

            all_iterator operator++(int)                            // *++r
            {
                all_iterator temp(*this);
                ++(*this);
                return temp;
            }
        //- input iterator requirements [input.iterators]/2

        //+ output iterator requirements [output.iterators]/1
            // *r = o already satisfied
            // ++r already satisfied
            // r++ already satisfied
            // *r++ = o already satisfied
        //- output iterator requirements [output.iterators]/1

        //+ forward iterator requirements [forward.iterators]/1
            all_iterator() = default;                               // DefaultConstructible
            // r++ already satisfied
            // *r++ already satisfied
            // multi-pass must be guaranteed
        //- forward iterator requirements [forward.iterators]/1

        //+ bidirectional iterator requirements [bidirectional.iterators]/1
            all_iterator& operator--()                              // --r
            {
                --iA;
                --iB;
                --iC;
                return *this;
            }
            all_iterator operator--(int)                            // r--
            {
                all_iterator temp(*this);
                --(*this);
                return temp;
            }
            // *r-- already satisfied
        //- bidirectional iterator requirements [bidirectional.iterators]/1

        //+ random access iterator requirements [random.access.iterators]/1
            all_iterator& operator+=(difference_type p)             // r += n
            {
                iA += p;
                iB += p;
                iC += p;
                return *this;
            }
            all_iterator operator+(difference_type p) const         // a + n
            {
                all_iterator temp(*this);
                temp += p;
                return temp;
            }
            // doesn't have to be a friend function, but this way,
            // we can define it here
            friend all_iterator operator+(difference_type p,
                                         all_iterator temp)         // n + a
            {
                temp += p;
                return temp;
            }

            all_iterator& operator-=(difference_type p)             // r -= n
            {
                iA -= p;
                iB -= p;
                iC -= p;
                return *this;
            }
            all_iterator operator-(difference_type p) const         // a - n
            {
                all_iterator temp(*this);
                temp -= p;
                return temp;
            }

            difference_type operator-(all_iterator const& p)        // b - a
            {
                return iA - p.iA;   // should be sufficient (?)
            }

            all_reference operator[](difference_type p)             // a[n]
            {
                return *(*this + p);
            }
            all_reference const operator[](difference_type p) const // a[n]
            {
                return *(*this + p);
            }

            bool operator<(all_iterator const& p) const             // a < b
            {
                return iA < p.iA;   // should be sufficient (?)
            }
            bool operator>(all_iterator const& p) const             // a > b
            {
                return iA > p.iA;   // should be sufficient (?)
            }
            bool operator>=(all_iterator const& p) const            // a >= b
            {
                return iA >= p.iA;  // should be sufficient (?)
            }
            bool operator<=(all_iterator const& p) const            // a >= b
            {
                return iA <= p.iA;  // should be sufficient (?)
            }
        //- random access iterator requirements [random.access.iterators]/1
    };
}//- namespace MyColumnType_iterator


MyColumnType::iterator MyColumnType::begin()
{
    return { a.begin(), b.begin(), c.begin() };
}
MyColumnType::iterator MyColumnType::end()
{
    return { a.end(), b.end(), c.end() };
}

Usage example:

#include <iostream>
#include <cstddef>
#include <algorithm>


namespace MyColumnType_iterator
{
    template < typename char_type, typename char_traits >
    std::basic_ostream < char_type, char_traits >&
    operator<< (std::basic_ostream < char_type, char_traits >& o,
                std::iterator_traits<MyColumnType::iterator>::reference p)
    {
        return o << p.a << ";" << p.b << ";" << p.c;
    }
}

int main()
{
    using std::cout;

    MyColumnType mct =
    {
          {1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1}
        , {"j", "i", "h", "g", "f", "e", "d", "c", "b", "a"}
        , {10,    9,   8,   7,   6,   5,   4,   3,   2,   1}
    };

    using ref = std::iterator_traits<MyColumnType::iterator>::reference;
    std::copy(mct.begin(), mct.end(), std::ostream_iterator<ref>(cout, ", "));
    std::cout << std::endl;

    std::sort(mct.begin(), mct.end());
    std::copy(mct.begin(), mct.end(), std::ostream_iterator<ref>(cout, ", "));
    std::cout << std::endl;
}

Output:

1;j;10, 0.9;i;9, 0.8;h;8, 0.7;g;7, 0.6;f;6, 0.5;e;5, 0.4;d;4, 0.3;c;3, 0.2;b;2, 0.1;a;1,
0.1;a;1, 0.2;b;2, 0.3;c;3, 0.4;d;4, 0.5;e;5, 0.6;f;6, 0.7;g;7, 0.8;h;8, 0.9;i;9, 1;j;10,

In an effort to simplify your design, I ended up with something almost identical. :( I don't think it gets much simpler than what you have. http://coliru.stacked-crooked.com/view?id=914e2aa9209e363c1c0ab2b703990b7a-e54ee7a04e4b807da0930236d4cc94dc — Mooing Duck, Jun 03 '13 at 22:28
@MooingDuck Yeah I also though about the pointer+int as iterator, but I remembered some compilers sometimes have issues optimizing that in loops (`pointer[int]` might be slower than `*pointer; ++pointer` because of optimizations around that code). E.g. with a simple vector on MSVC10 or boost matrices. Here it's probably a trade-off of fast incrementation vs. fast dereferencing. — dyp, Jun 03 '13 at 22:38
@DyP I like your answer. I'll try it tomorrow. Two things worry me. First, if I add a new field to my column type, I have to change _a lot_ of code. I'll think about making it more generic. Then there are all those comparison functions. Have you tested it by passing an user-defined comparison e.g. via a lambda? Defining all those comparison functions seems strange (I also don't know why they are required). In case a C++11 lambda doesn't work, a C++14 lambda might work. I'll test that tomorrow too. Finally, thanks for such a great answer! I really appreciate it! — gnzlbg, Jun 03 '13 at 23:04
@gnzlbg You could use a tuple and some generic programming to simplify changing the fields of the column type, yes. Mooing Duck's approach using only one iterator also has much less trouble when changing those fields. The comparison functions are weird, indeed. They're required because `std::sort` has to compare `*a < *b`, but I don't know exactly what's the problem with providing only one (relying on implicit conversions). — dyp, Jun 03 '13 at 23:08
@DyP: Yeah, my iterators are probably slower for deferencing, but they're smaller and advance faster. I avoided looking at your code to try to see if I came up with something different, and haven't yet compared them. Its interesting that you have three `operator<`, implicit conversions worked with my comparison. — Mooing Duck, Jun 03 '13 at 23:13
@DyP I've experimented with the tuple approach but then you loose the names in the iterator, i.e. you'll use get<0> for a, get<1> for b, etc... the alternative is to use a boost::fusion::map for naming the elements of the "tuple"... I end up writing this type of containers fairly often and haven't really found a generic, maintainable solution with minimum boiler plate. Your answer is pretty nice tho, i'll try it tomorrow. — gnzlbg, Jun 03 '13 at 23:15
@MooingDuck I've accepted this as answer but at the end decided to follow an approach very similar to yours because with two Facade CRTP classes for value and reference the user can specify their own Value/Reference types with minimum boilerplate. If there is interest in hacking together on a general solution to this problem I can set up a github repository. I think the scope of the problem is large enough that a library for removing boilerplate would be generally useful (one could even think of boost review at some point). — gnzlbg, Jun 06 '13 at 09:52
@gnzlbg I'm interested in your solution and would like to take a look at a repo in order to see if I might contribute to reducing complexity of changing the column type. — dyp, Jun 06 '13 at 14:38
@DyP Ok, I'll let you know as soon as I commit it. I should have time for it this weekend. — gnzlbg, Jun 10 '13 at 12:49

willj · Answer 2 · 2013-06-04T09:26:27.860

0

If you're really concerned about performance and you want to sort your container with std::sort, use the overload that allows you to provide a custom comparison object:

template <class RandomAccessIterator, class Compare>
void sort (RandomAccessIterator first, RandomAccessIterator last, Compare comp);

.. and sort an array of indices into the container. Here's how:

You'll need the following members in your container:

struct MyColumnType { 
    ...

    int size() const;

    // swaps columns
    void swap(int l, int r);

    // returns true if column l is less than column r
    bool less(int l, int r) const;

    ...
};

Then define the following comparison object:

struct MyColumnTypeLess
{
    const MyColumnType* container;
    MyColumnTypeLess(const MyColumnType* container)
        : container(container)
    {
    }
    bool operator()(int l, int r) const
    {
        return container->less(l, r);
    }
};

And use it to sort an array of indices:

void sortMyColumnType(MyColumnType& container)
{
    std::vector<int> indices;
    indices.reserve(container.size());
    // fill with [0, n)
    for(int i = 0; i != container.size(); ++i)
    {
        indices.push_back(i);
    }
    // sort the indices
    std::sort(indices.begin(), indices.end(), MyColumnTypeLess(&container));
}

The 'less' member of the container controls which order to sort in:

bool MyColumnType::less(int l, int r) const
{
    // sort first by a, then b, then c
    return a[l] != a[r] ? a[l] < a[r]
        : b[l] != b[r] ? b[l] < b[r]
        : c[l] < c[r];
}

The sorted array of indices can be used in further algorithms - you can avoid copying the actual data around until you need to.

All std algorithms that work with RandomAccessIterators have overloads that allow you to specify custom comparison objects, so they can also be used with this technique.

edited Jun 04 '13 at 09:26

answered Jun 03 '13 at 21:46

willj

2,991
12
24

1

This does not answer the question which is how to use _all_ algorithms that work on random access iterators. Furthermore, this requires `O(N)` extra storage. In-place sorting (as specified for `std::sort`) is defined as sorting with less than `O(N)` extra storage (e.g. `O(log N)` would be allowed). So note that what you propose allows you to use the `std::sort` function, but without the guarantees provided by it if you consider the whole container as the thing being sorted, and not just the temporary array of indices that you just created. – gnzlbg Jun 03 '13 at 22:45
To find the real cost of `std::sort` with your container, you must multiply the number of operations it performs by the cost of those operations. Depending on your standard library implementation, `std::sort` may end up copying elements - copying `std::string` is an order of magnitude more expensive than copying `int`. If you're as concerned about performance as you say, you would be unwise to use algorithms such as `std::sort` directly on a container with very complex elements. – willj Jun 04 '13 at 09:24
You are right in that copying strings is more expensive than copying ints. They are not much more expensive to copy if your strings are small because of SSO. However, most HPC applications don't need them, but need other complex type (mine needs Eigen3 matrices which have good locality). I would recommend to rethink your design if you have strings, or to replace them with a stack allocated string if you know the maximum size. If you don't know a maximum size and you really need them, you will have to live with it. However they will cause random access to memory killing your performance. – gnzlbg Jun 04 '13 at 10:33
However, you are not really right about the cost. The best way to reduce the real cost, is to have a predictable memory access patter (advance linear in memory) and perfect spatial and temporal memory locality. Your approach has very bad temporal locality and requires extra bandwithd for the array of ints. Worst case you need to move the whole container plus an array of ints twice from main memory to the CPU: the first time to permute the int array, the second time to apply those permutations to the rest of the elements. Sort copies elements close to each other (probably in cache) -> is fast! – gnzlbg Jun 04 '13 at 10:35
Elaborating on the worst case: the first time you need to permute the array of int. And the comparison function might need to access all elements of the container. The second time you permute the container, and your permutation has to access the permuted array of ints. Those two passes are the performance killer. – gnzlbg Jun 04 '13 at 10:39
Note also that if your container is small (such that everything fits in cache), your approach _might_ be actually faster, because it requires less copying of more complex types. It might be slower, because it requires two passes. Just remember that if your container is large enough it will be slower. And that if you have trivially copyable types, it will also be slower. However, the case for which is faster might exist. If you have such a case, its a hard trade-off decision to make (flexibility, maintainability, scalability,...). – gnzlbg Jun 04 '13 at 10:42
If you're really concerned with the real world performance of a specific algorithm with a specific element type, then you can profile the performance and specialise the algorithm for that specific case. The standard library algorithms are by nature generic, and while many implementations do a very good job of automatically specialising themselves for different scenarios, a generic implementation may not be optimal for your use case. The specification for `std::sort` provides no guarantee that an implementation has good temporal or spatial locality. – willj Jun 04 '13 at 10:46
There is no such requirement, but reimplementing sort yourself and getting better performance than libstdc++ and libc++ is _really_ hard. However, duplicating the memory bandwith is unnecessary overhead. – gnzlbg Jun 04 '13 at 11:07
The trade off between performance and rolling your own implementation is usually not big enough to warrant the former ;). In the end, your choice of implementation is heavily dependent on the size of your container and the size/types of its elements - so a single generic solution may not be what you need. – willj Jun 04 '13 at 11:28
Indeed, i was just wishing there was something better :( – gnzlbg Jun 04 '13 at 11:30
One way to optimise your sort algorithm for this specific container, in a scenario where you don't have many duplicate values: First sort the container by comparing only the 'primary' key. Now you have grouped together all subsequences that share a primary key; move on to the secondary key and (modifying only the latter two arrays) sort each subsequence that has more than one element. Repeat for the last array. – willj Jun 04 '13 at 11:58

(Re)Using std::algorithms with non-standard containers

2 Answers2

Linked